Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Eric Dumazet @ 2013-10-28 15:27 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Thomas Graf
In-Reply-To: <526E7556.1060902@redhat.com>

On Mon, 2013-10-28 at 15:31 +0100, Daniel Borkmann wrote:

> I thought about this, I think this can partially be resolved by the
> user implementing one BPF program to match all possible flows for a
> class instead of implementing multiple BPF programs as mentioned in
> the commit, iow that's up to the user space. And the case you suggest,
> would be another option to further improve this, but would come with
> some difficulties in contrast to the notion 0: mismatch, !0: match.
> I think, this would need additional walk-through through all 'ret'
> opcodes to see if those classes actually exist. Then, we would need
> refcounting and call tcf_{un,}bind_filter() for each class that is
> related to this filter, and tcf_exts_exec() would either need to be
> i) understood in a "per-filter" notion, so as long as something
> matches (!0) exec the very same action/policy (which might not be
> what we want) or ii) could just not be implemented as multiple
> user-defined filters could be defined in iproute2 with different
> actions each, but in some paths return the same flowid. So I think
> this here seems the better trade-off.

I have no idea why you want to do all this.

classid are not checked at filter installation time, but at run time.

All you need to do is change :

+       list_for_each_entry(prog, &head->plist, link) {
+               if (SK_RUN_FILTER(prog->filter, skb) == 0)
+                       continue;
+
+               *res = prog->res;
+               ret = tcf_exts_exec(skb, &prog->exts, res);
+               if (ret < 0)
+                       continue;
+
+               return ret;
+       }

to

	list_for_each_entry(prog, &head->plist, link) {
		int filter_res = SK_RUN_FILTER(prog->filter, skb);

		if (!filter_res)
			continue;


               *res = prog->res;
		if (filter_res != -1)
			res->classid = filter_res;

               ret = tcf_exts_exec(skb, &prog->exts, res);
               if (ret < 0)
                       continue;

               return ret;
       }

(Reserving filter returns codes :
 0 : No match
 -1: select classid given in the "tc filter ...." command)
other values : flowid, overiding the default one.

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Dan Williams @ 2013-10-28 15:29 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Johannes Berg, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <201310281604.48542@pali>

On Mon, 2013-10-28 at 16:04 +0100, Pali Rohár wrote:
> On Monday 28 October 2013 15:56:55 Johannes Berg wrote:
> > On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> > > If the device doesn't actually *have* a permanent MAC
> > > address, then it shouldn't be returning one via ethtool,
> > > and should return an error for ETHTOOL_GPERMADDR.
> > 
> > There's currently no provision for that, but I agree it would
> > be better to do so.
> > 
> > johannes
> 
> So what to do with devices which has MAC address stored in some 
> obscure place and there is userspace binary which can read it?
> 
> I think that there should be some way to tell kernel that *this* 
> is the permanent address and it should use it.
> 
> This is reason why I proposed patch which adding sysfs entry for 
> setting permanent address from userspace in wl1251 driver.

Ok, so the N900's MAC is stored in the "cal partition", which is a
region of flash exposed to the OS as /dev/mtd1.  That also stores the
regulatory domain.  Typically userspace binaries are used to pull out
this and other data (see
https://dev.openwrt.org/browser/packages/utils/calvaria/files/src/calvaria.c ) which is then used to initialize the device.

Any idea what kernel driver is used to expose the CAL partition
as /dev/mtd1?  If it's nothing special maybe a small EEPROM-style driver
could be written to read the relevant areas (and since they don't ever
change, don't need to worry about something else writing at the same
time).

Dan

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Stephen Hemminger @ 2013-10-28 15:33 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dan Williams, Pali Rohár, Luciano Coelho, John W. Linville,
	David S. Miller, linux-wireless, netdev, linux-kernel,
	freemangordon, aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382972215.17956.30.camel@jlt4.sipsolutions.net>

On Mon, 28 Oct 2013 15:56:55 +0100
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> 
> > If the device doesn't actually *have* a permanent MAC address, then it
> > shouldn't be returning one via ethtool, and should return an error for
> > ETHTOOL_GPERMADDR.
> 
> There's currently no provision for that, but I agree it would be better
> to do so.
> 
> johannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

The current netdevice API handle the case of a device without a permanent
MAC address, slightly differently.

If device does not have a permanent address,
then:
  1. dev->addr_assign_type should not be NET_ADDR_PERM
  2. when device is registered dev->perm_addr will not be set
     and retain all zeros value
  4. when ethtool gets address it will return all zeros which
     is not a valid address.

This case doesn't seem to be handled threo mac80211 API's

^ permalink raw reply

* [PATCH net-next v2] net: sched: cls_bpf: add BPF-based classifier
From: Daniel Borkmann @ 2013-10-28 15:43 UTC (permalink / raw)
  To: davem; +Cc: eric.dumazet, netdev, Thomas Graf
In-Reply-To: <cover.1382974535.git.dborkman@redhat.com>

This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 v1->v2:
  - Ok, applied Eric Dumazet's feedback with possibility to overwrite
    default flowid

 iproute2 part: http://patchwork.ozlabs.org/patch/286496/

 include/uapi/linux/pkt_cls.h |  14 ++
 net/sched/Kconfig            |  10 ++
 net/sched/Makefile           |   1 +
 net/sched/cls_bpf.c          | 385 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 410 insertions(+)
 create mode 100644 net/sched/cls_bpf.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03a32a..ad1f1d8 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -443,6 +443,16 @@ config NET_CLS_CGROUP
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_cgroup.
 
+config NET_CLS_BPF
+	tristate "BPF-based classifier"
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets based on
+	  programmable BPF (JIT'ed) filters as an alternative to ematches.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called cls_bpf.
+
 config NET_EMATCH
 	bool "Extended Matches"
 	select NET_CLS
diff --git a/net/sched/Makefile b/net/sched/Makefile
index e5f9abe..35fa47a 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_NET_CLS_RSVP6)	+= cls_rsvp6.o
 obj-$(CONFIG_NET_CLS_BASIC)	+= cls_basic.o
 obj-$(CONFIG_NET_CLS_FLOW)	+= cls_flow.o
 obj-$(CONFIG_NET_CLS_CGROUP)	+= cls_cgroup.o
+obj-$(CONFIG_NET_CLS_BPF)	+= cls_bpf.o
 obj-$(CONFIG_NET_EMATCH)	+= ematch.o
 obj-$(CONFIG_NET_EMATCH_CMP)	+= em_cmp.o
 obj-$(CONFIG_NET_EMATCH_NBYTE)	+= em_nbyte.o
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
new file mode 100644
index 0000000..1002a82
--- /dev/null
+++ b/net/sched/cls_bpf.c
@@ -0,0 +1,385 @@
+/*
+ * Berkeley Packet Filter based traffic classifier
+ *
+ * Might be used to classify traffic through flexible, user-defined and
+ * possibly JIT-ed BPF filters for traffic control as an alternative to
+ * ematches.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/filter.h>
+#include <net/rtnetlink.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("TC BPF based classifier");
+
+struct cls_bpf_head {
+	struct list_head plist;
+	u32 hgen;
+};
+
+struct cls_bpf_prog {
+	struct sk_filter *filter;
+	struct sock_filter *bpf_ops;
+	struct tcf_exts exts;
+	struct tcf_result res;
+	struct list_head link;
+	u32 handle;
+	u16 bpf_len;
+};
+
+static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
+	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_BPF_OPS]		= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static const struct tcf_ext_map bpf_ext_map = {
+	.action = TCA_BPF_ACT,
+	.police = TCA_BPF_POLICE,
+};
+
+static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			    struct tcf_result *res)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	int ret;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		int filter_res = SK_RUN_FILTER(prog->filter, skb);
+
+		if (filter_res == 0)
+			continue;
+
+		*res = prog->res;
+		if (filter_res != -1)
+			res->classid = filter_res;
+
+		ret = tcf_exts_exec(skb, &prog->exts, res);
+		if (ret < 0)
+			continue;
+
+		return ret;
+	}
+
+	return -1;
+}
+
+static int cls_bpf_init(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (head == NULL)
+		return -ENOBUFS;
+
+	INIT_LIST_HEAD(&head->plist);
+	tp->root = head;
+
+	return 0;
+}
+
+static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
+{
+	tcf_unbind_filter(tp, &prog->res);
+	tcf_exts_destroy(tp, &prog->exts);
+
+	sk_unattached_filter_destroy(prog->filter);
+
+	kfree(prog->bpf_ops);
+	kfree(prog);
+}
+
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog == todel) {
+			tcf_tree_lock(tp);
+			list_del(&prog->link);
+			tcf_tree_unlock(tp);
+
+			cls_bpf_delete_prog(tp, prog);
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static void cls_bpf_destroy(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *tmp;
+
+	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+		list_del(&prog->link);
+		cls_bpf_delete_prog(tp, prog);
+	}
+
+	kfree(head);
+}
+
+static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	unsigned long ret = 0UL;
+
+	if (head == NULL)
+		return 0UL;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog->handle == handle) {
+			ret = (unsigned long) prog;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void cls_bpf_put(struct tcf_proto *tp, unsigned long f)
+{
+}
+
+static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
+				   struct cls_bpf_prog *prog,
+				   unsigned long base, struct nlattr **tb,
+				   struct nlattr *est)
+{
+	struct sock_filter *bpf_ops, *bpf_old;
+	struct tcf_exts exts;
+	struct sock_fprog tmp;
+	struct sk_filter *fp, *fp_old;
+	u16 bpf_size, bpf_len;
+	u32 classid;
+	int ret;
+
+	if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID])
+		return -EINVAL;
+
+	ret = tcf_exts_validate(net, tp, tb, est, &exts, &bpf_ext_map);
+	if (ret < 0)
+		return ret;
+
+	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
+	bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
+	if (bpf_len > BPF_MAXINSNS || bpf_len == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (bpf_ops == NULL) {
+		ret = -ENOMEM;
+		goto errout;
+	}
+
+	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_len;
+	tmp.filter = (struct sock_filter __user *) bpf_ops;
+
+	ret = sk_unattached_filter_create(&fp, &tmp);
+	if (ret)
+		goto errout_free;
+
+	tcf_tree_lock(tp);
+	fp_old = prog->filter;
+	bpf_old = prog->bpf_ops;
+
+	prog->bpf_len = bpf_len;
+	prog->bpf_ops = bpf_ops;
+	prog->filter = fp;
+	prog->res.classid = classid;
+	tcf_tree_unlock(tp);
+
+	tcf_bind_filter(tp, &prog->res, base);
+	tcf_exts_change(tp, &prog->exts, &exts);
+
+	if (fp_old)
+		sk_unattached_filter_destroy(fp_old);
+	if (bpf_old)
+		kfree(bpf_old);
+
+	return 0;
+
+errout_free:
+	kfree(bpf_ops);
+errout:
+	tcf_exts_destroy(tp, &exts);
+	return ret;
+}
+
+static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
+				   struct cls_bpf_head *head)
+{
+	unsigned int i = 0x80000000;
+
+	do {
+		if (++head->hgen == 0x7FFFFFFF)
+			head->hgen = 1;
+	} while (--i > 0 && cls_bpf_get(tp, head->hgen));
+	if (i == 0)
+		pr_err("Insufficient number of handles\n");
+
+	return i;
+}
+
+static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
+			  struct tcf_proto *tp, unsigned long base,
+			  u32 handle, struct nlattr **tca,
+			  unsigned long *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct nlattr *tb[TCA_BPF_MAX + 1];
+	int ret;
+
+	if (tca[TCA_OPTIONS] == NULL)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_BPF_MAX, tca[TCA_OPTIONS], bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (prog != NULL) {
+		if (handle && prog->handle != handle)
+			return -EINVAL;
+		return cls_bpf_modify_existing(net, tp, prog, base, tb,
+					       tca[TCA_RATE]);
+	}
+
+	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
+	if (prog == NULL)
+		return -ENOBUFS;
+
+	if (handle == 0)
+		prog->handle = cls_bpf_grab_new_handle(tp, head);
+	else
+		prog->handle = handle;
+	if (prog->handle == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	ret = cls_bpf_modify_existing(net, tp, prog, base, tb, tca[TCA_RATE]);
+	if (ret < 0)
+		goto errout;
+
+	tcf_tree_lock(tp);
+	list_add(&prog->link, &head->plist);
+	tcf_tree_unlock(tp);
+
+	*arg = (unsigned long) prog;
+
+	return 0;
+errout:
+	if (*arg == 0UL && prog)
+		kfree(prog);
+
+	return ret;
+}
+
+static int cls_bpf_dump(struct tcf_proto *tp, unsigned long fh,
+			struct sk_buff *skb, struct tcmsg *tm)
+{
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+	struct nlattr *nest, *nla;
+
+	if (prog == NULL)
+		return skb->len;
+
+	tm->tcm_handle = prog->handle;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
+		goto nla_put_failure;
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len *
+			  sizeof(struct sock_filter));
+	if (nla == NULL)
+		goto nla_put_failure;
+
+        memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
+
+	if (tcf_exts_dump(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	if (tcf_exts_dump_stats(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (arg->count < arg->skip)
+			goto skip;
+		if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+skip:
+		arg->count++;
+	}
+}
+
+static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
+	.kind		=	"bpf",
+	.owner		=	THIS_MODULE,
+	.classify	=	cls_bpf_classify,
+	.init		=	cls_bpf_init,
+	.destroy	=	cls_bpf_destroy,
+	.get		=	cls_bpf_get,
+	.put		=	cls_bpf_put,
+	.change		=	cls_bpf_change,
+	.delete		=	cls_bpf_delete,
+	.walk		=	cls_bpf_walk,
+	.dump		=	cls_bpf_dump,
+};
+
+static int __init cls_bpf_init_mod(void)
+{
+	return register_tcf_proto_ops(&cls_bpf_ops);
+}
+
+static void __exit cls_bpf_exit_mod(void)
+{
+	unregister_tcf_proto_ops(&cls_bpf_ops);
+}
+
+module_init(cls_bpf_init_mod);
+module_exit(cls_bpf_exit_mod);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-28 16:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131017084121.GC22705@gmail.com>



Ingo, et al.-
	Ok, sorry for the delay, here are the test results you've been asking
for.


First, some information about what I did.  I attached the module that I ran this
test with at the bottom of this email.  You'll note that I started using a
module parameter write patch to trigger the csum rather than the module load
path.  The latter seemed to be giving me lots of variance in my run times, which
I wanted to eliminate.  I attributed it to the module load mechanism itself, and
by using the parameter write path, I was able to get more consistent results.

First, the run time tests:

I ran this command:
for i in `seq 0 1 3`
do
	 echo $i > /sys/module/csum_test/parameters/module_test_mode
	 perf stat --repeat 20 --null echo 1 > echo 1 > /sys/module/csum_test/parameters/test_fire
done

The for loop allows me to chagne the module_test_mode, which is tied to a switch
statement in do_csum that selects which checksumming method we use
(base/prefetch/parallel alu/both).  The results are:


Base:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.093269042 seconds time elapsed                                          ( +-  2.24% )

Prefetch (5x64):
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.079440009 seconds time elapsed                                          ( +-  2.29% )

Parallel ALU:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.087666677 seconds time elapsed                                          ( +-  4.01% )

Prefetch + Parallel ALU:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.080758702 seconds time elapsed                                          ( +-  2.34% )

So we can see here that we get about a 1% speedup between the base and the both
(Prefetch + Parallel ALU) case, with prefetch accounting for most of that
speedup.

Looking at the specific cpu counters we get this:


Base:
     Total time: 0.179 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1571.304618 task-clock                #    5.213 CPUs utilized            ( +-  0.45% )
            14,423 context-switches          #    0.009 M/sec                    ( +-  4.28% )
             2,710 cpu-migrations            #    0.002 M/sec                    ( +-  2.83% )
            75,402 page-faults               #    0.048 M/sec                    ( +-  0.07% )
     1,597,349,326 cycles                    #    1.017 GHz                      ( +-  1.74% ) [40.51%]
       104,882,858 stalled-cycles-frontend   #    6.57% frontend cycles idle     ( +-  1.25% ) [40.33%]
     1,043,429,984 stalled-cycles-backend    #   65.32% backend  cycles idle     ( +-  1.25% ) [39.73%]
       868,372,132 instructions              #    0.54  insns per cycle        
                                             #    1.20  stalled cycles per insn  ( +-  1.43% ) [39.88%]
       161,143,820 branches                  #  102.554 M/sec                    ( +-  1.49% ) [39.76%]
         4,348,075 branch-misses             #    2.70% of all branches          ( +-  1.43% ) [39.99%]
       457,042,576 L1-dcache-loads           #  290.868 M/sec                    ( +-  1.25% ) [40.63%]
         8,928,240 L1-dcache-load-misses     #    1.95% of all L1-dcache hits    ( +-  1.26% ) [41.17%]
        15,821,051 LLC-loads                 #   10.069 M/sec                    ( +-  1.56% ) [41.20%]
         4,902,576 LLC-load-misses           #   30.99% of all LL-cache hits     ( +-  1.51% ) [41.36%]
       235,775,688 L1-icache-loads           #  150.051 M/sec                    ( +-  1.39% ) [41.10%]
         3,116,106 L1-icache-load-misses     #    1.32% of all L1-icache hits    ( +-  3.43% ) [40.96%]
       461,315,416 dTLB-loads                #  293.588 M/sec                    ( +-  1.43% ) [41.18%]
           140,280 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.30% ) [40.96%]
       236,127,031 iTLB-loads                #  150.275 M/sec                    ( +-  1.63% ) [41.43%]
            46,173 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  3.40% ) [41.11%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.82%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.37%]

       0.301414024 seconds time elapsed                                          ( +-  0.47% )

Prefetch (5x64):
     Total time: 0.172 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1565.797128 task-clock                #    5.238 CPUs utilized            ( +-  0.46% )
            13,845 context-switches          #    0.009 M/sec                    ( +-  4.20% )
             2,624 cpu-migrations            #    0.002 M/sec                    ( +-  2.72% )
            75,452 page-faults               #    0.048 M/sec                    ( +-  0.08% )
     1,642,106,355 cycles                    #    1.049 GHz                      ( +-  1.33% ) [40.17%]
       107,786,666 stalled-cycles-frontend   #    6.56% frontend cycles idle     ( +-  1.37% ) [39.90%]
     1,065,286,880 stalled-cycles-backend    #   64.87% backend  cycles idle     ( +-  1.59% ) [39.14%]
       888,815,001 instructions              #    0.54  insns per cycle        
                                             #    1.20  stalled cycles per insn  ( +-  1.29% ) [38.92%]
       163,106,907 branches                  #  104.169 M/sec                    ( +-  1.32% ) [38.93%]
         4,333,456 branch-misses             #    2.66% of all branches          ( +-  1.94% ) [39.77%]
       459,779,806 L1-dcache-loads           #  293.639 M/sec                    ( +-  1.60% ) [40.23%]
         8,827,680 L1-dcache-load-misses     #    1.92% of all L1-dcache hits    ( +-  1.77% ) [41.38%]
        15,556,816 LLC-loads                 #    9.935 M/sec                    ( +-  1.76% ) [41.16%]
         4,885,618 LLC-load-misses           #   31.40% of all LL-cache hits     ( +-  1.40% ) [40.84%]
       236,131,778 L1-icache-loads           #  150.806 M/sec                    ( +-  1.32% ) [40.59%]
         3,037,537 L1-icache-load-misses     #    1.29% of all L1-icache hits    ( +-  2.23% ) [41.13%]
       454,835,028 dTLB-loads                #  290.481 M/sec                    ( +-  1.23% ) [41.34%]
           139,907 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.18% ) [41.21%]
       236,357,655 iTLB-loads                #  150.950 M/sec                    ( +-  1.31% ) [41.29%]
            46,633 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  2.74% ) [40.67%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.16%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.09%]

       0.298948767 seconds time elapsed                                          ( +-  0.36% )

Here it appears everything between the two runs is about the same.  We reduced
the number of dcache misses by a small amount (0.03 percentage points), which is
nice, but I'm not sure would account for the speedup we see in the run time.

Parallel ALU:
     Total time: 0.182 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1553.544876 task-clock                #    5.217 CPUs utilized            ( +-  0.42% )
            14,066 context-switches          #    0.009 M/sec                    ( +-  6.24% )
             2,831 cpu-migrations            #    0.002 M/sec                    ( +-  3.33% )
            75,432 page-faults               #    0.049 M/sec                    ( +-  0.08% )
     1,659,509,743 cycles                    #    1.068 GHz                      ( +-  1.27% ) [40.10%]
       106,466,680 stalled-cycles-frontend   #    6.42% frontend cycles idle     ( +-  1.50% ) [39.98%]
     1,035,481,957 stalled-cycles-backend    #   62.40% backend  cycles idle     ( +-  1.23% ) [39.38%]
       875,104,201 instructions              #    0.53  insns per cycle        
                                             #    1.18  stalled cycles per insn  ( +-  1.30% ) [38.66%]
       160,553,275 branches                  #  103.346 M/sec                    ( +-  1.32% ) [38.85%]
         4,329,119 branch-misses             #    2.70% of all branches          ( +-  1.39% ) [39.59%]
       448,195,116 L1-dcache-loads           #  288.498 M/sec                    ( +-  1.91% ) [41.07%]
         8,632,347 L1-dcache-load-misses     #    1.93% of all L1-dcache hits    ( +-  1.90% ) [41.56%]
        15,143,145 LLC-loads                 #    9.747 M/sec                    ( +-  1.89% ) [41.05%]
         4,698,204 LLC-load-misses           #   31.03% of all LL-cache hits     ( +-  1.03% ) [41.23%]
       224,316,468 L1-icache-loads           #  144.390 M/sec                    ( +-  1.27% ) [41.39%]
         2,902,842 L1-icache-load-misses     #    1.29% of all L1-icache hits    ( +-  2.65% ) [42.60%]
       433,914,588 dTLB-loads                #  279.306 M/sec                    ( +-  1.75% ) [43.07%]
           132,090 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.15% ) [43.12%]
       230,701,361 iTLB-loads                #  148.500 M/sec                    ( +-  1.77% ) [43.47%]
            45,562 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  3.76% ) [42.88%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [42.29%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [41.32%]

       0.297758185 seconds time elapsed                                          ( +-  0.40% )

Here It seems the major advantage was backend stall cycles saved (which makes
sense to me).  Since we split the instruction path into two units that could run
independently of each other we spent less time waiting for prior instructions to
retire.  As a result we dropped two percentage points in our stall number.

Prefetch + Parallel ALU:
     Total time: 0.182 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1549.171283 task-clock                #    5.231 CPUs utilized            ( +-  0.50% )
            13,717 context-switches          #    0.009 M/sec                    ( +-  4.32% )
             2,721 cpu-migrations            #    0.002 M/sec                    ( +-  2.47% )
            75,432 page-faults               #    0.049 M/sec                    ( +-  0.07% )
     1,579,140,244 cycles                    #    1.019 GHz                      ( +-  1.71% ) [40.06%]
       103,803,034 stalled-cycles-frontend   #    6.57% frontend cycles idle     ( +-  1.74% ) [39.60%]
     1,016,582,613 stalled-cycles-backend    #   64.38% backend  cycles idle     ( +-  1.79% ) [39.57%]
       881,036,653 instructions              #    0.56  insns per cycle        
                                             #    1.15  stalled cycles per insn  ( +-  1.61% ) [39.29%]
       164,333,010 branches                  #  106.078 M/sec                    ( +-  1.51% ) [39.38%]
         4,385,459 branch-misses             #    2.67% of all branches          ( +-  1.62% ) [40.29%]
       463,987,526 L1-dcache-loads           #  299.507 M/sec                    ( +-  1.52% ) [40.20%]
         8,739,535 L1-dcache-load-misses     #    1.88% of all L1-dcache hits    ( +-  1.95% ) [40.37%]
        15,318,497 LLC-loads                 #    9.888 M/sec                    ( +-  1.80% ) [40.43%]
         4,846,148 LLC-load-misses           #   31.64% of all LL-cache hits     ( +-  1.68% ) [40.59%]
       231,982,874 L1-icache-loads           #  149.746 M/sec                    ( +-  1.43% ) [41.25%]
         3,141,106 L1-icache-load-misses     #    1.35% of all L1-icache hits    ( +-  2.32% ) [41.76%]
       459,688,615 dTLB-loads                #  296.732 M/sec                    ( +-  1.75% ) [41.87%]
           138,667 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  1.97% ) [42.31%]
       235,629,204 iTLB-loads                #  152.100 M/sec                    ( +-  1.40% ) [42.04%]
            46,038 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  2.75% ) [41.20%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.77%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.27%]

       0.296173305 seconds time elapsed                                          ( +-  0.44% )
Here, with both optimizations, we've reduced both our backend stall cycles, and
our dcache miss rate (though our load misses here is higher than it was when we
are just doing parallel ALU execution.  I wonder if the separation of the adcx
path is leading to multiple load requests before the prefetch completes.  I'll
try messing with the stride a bit more to see if I can get some more insight
there.

So there you have it.  I think, looking at this, I can say that its not as big a
win as my initial measurements were indicating, but still a win.

Thoughts?

Regards
Neil

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/rtnetlink.h>
#include <net/rtnetlink.h>
#include <linux/u64_stats_sync.h>

#define BUFSIZ 2*1024*1024
#define NBPAGES 16

extern int csum_mode;
int module_test_mode = 0;
int test_fire = 0;

static int __init csum_init_module(void)
{
        return 0;
}

static void __exit csum_cleanup_module(void)
{
        return;
}

static int set_param_str(const char *val, const struct kernel_param *kp)
{
        int i;
        __wsum sum = 0;
        /*u64 start, end;*/
        void *base, *addrs[NBPAGES];
        u32 rnd, offset;

	
        memset(addrs, 0, sizeof(addrs));
        for (i = 0; i < NBPAGES; i++) {
                addrs[i] = kmalloc_node(BUFSIZ, GFP_KERNEL, 0);
                if (!addrs[i])
                        goto out;
        }

	csum_mode = module_test_mode;

	local_bh_disable();
        /*pr_err("STARTING ITERATIONS on cpu %d\n", smp_processor_id());*/
        /*start = ktime_to_ns(ktime_get());*/

        for (i = 0; i < 100000; i++) {
                rnd = prandom_u32();
                base = addrs[rnd % NBPAGES];
                rnd /= NBPAGES;
                offset = rnd % (BUFSIZ - 1500);
                offset &= ~1U;
                sum = csum_partial(base + offset, 1500, sum);
        }
        /*end = ktime_to_ns(ktime_get());*/
        local_bh_enable();

	/*pr_err("COMPLETED 100000 iterations of csum %x in %llu nanosec\n", sum, end - start);*/

	csum_mode = 0;
out:
        for (i = 0; i < NBPAGES; i++)
                kfree(addrs[i]);

        return 0;
}

static int get_param_str(char *buffer, const struct kernel_param *kp)
{
	return sprintf(buffer, "%d\n", test_fire);
}

static struct kernel_param_ops param_ops_str = {
	.set = set_param_str,
	.get = get_param_str,
};

module_param_named(module_test_mode, module_test_mode, int, 0644);
MODULE_PARM_DESC(module_test_mode, "csum test mode");
module_param_cb(test_fire, &param_ops_str, &test_fire, 0644);
module_init(csum_init_module);
module_exit(csum_cleanup_module);
MODULE_LICENSE("GPL");

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-28 16:20 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028160131.GA31048@hmsreliant.think-freely.org>

* Neil Horman <nhorman@tuxdriver.com> wrote:

> Base:
>        0.093269042 seconds time elapsed                                          ( +-  2.24% )
> Prefetch (5x64):
>        0.079440009 seconds time elapsed                                          ( +-  2.29% )
> Parallel ALU:
>        0.087666677 seconds time elapsed                                          ( +-  4.01% )
> Prefetch + Parallel ALU:
>        0.080758702 seconds time elapsed                                          ( +-  2.34% )
> 
> So we can see here that we get about a 1% speedup between the base 
> and the both (Prefetch + Parallel ALU) case, with prefetch 
> accounting for most of that speedup.

Hm, there's still something strange about these results. So the 
range of the results is 790-930 nsecs. The noise of the measurements 
is 2%-4%, i.e. 20-40 nsecs.

The prefetch-only result itself is the fastest of all - 
statistically equivalent to the prefetch+parallel-ALU result, within 
the noise range.

So if prefetch is enabled, turning on parallel-ALU has no measurable 
effect - which is counter-intuitive. Do you have an 
theory/explanation for that?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Pali Rohár @ 2013-10-28 16:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Johannes Berg, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382974161.1542.18.camel@dcbw.foobar.com>

[-- Attachment #1: Type: Text/Plain, Size: 1983 bytes --]

On Monday 28 October 2013 16:29:21 Dan Williams wrote:
> On Mon, 2013-10-28 at 16:04 +0100, Pali Rohár wrote:
> > On Monday 28 October 2013 15:56:55 Johannes Berg wrote:
> > > On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> > > > If the device doesn't actually *have* a permanent MAC
> > > > address, then it shouldn't be returning one via ethtool,
> > > > and should return an error for ETHTOOL_GPERMADDR.
> > > 
> > > There's currently no provision for that, but I agree it
> > > would be better to do so.
> > > 
> > > johannes
> > 
> > So what to do with devices which has MAC address stored in
> > some obscure place and there is userspace binary which can
> > read it?
> > 
> > I think that there should be some way to tell kernel that
> > *this* is the permanent address and it should use it.
> > 
> > This is reason why I proposed patch which adding sysfs entry
> > for setting permanent address from userspace in wl1251
> > driver.
> 
> Ok, so the N900's MAC is stored in the "cal partition", which
> is a region of flash exposed to the OS as /dev/mtd1.  That
> also stores the regulatory domain.  Typically userspace
> binaries are used to pull out this and other data (see
> https://dev.openwrt.org/browser/packages/utils/calvaria/files/
> src/calvaria.c ) which is then used to initialize the device.
> 
> Any idea what kernel driver is used to expose the CAL
> partition as /dev/mtd1?  If it's nothing special maybe a
> small EEPROM-style driver could be written to read the
> relevant areas (and since they don't ever change, don't need
> to worry about something else writing at the same time).
> 
> Dan

/dev/mtd1 is second (0 is first) OneNand partition. Nothing 
special. Kernel driver for OnaNand is already in mainline kernel.

That partition also has NVS device data for wl1251 chip.

And you are right calvaria has GPL v2 code for parsing data in 
that partition.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-28 16:24 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028160131.GA31048@hmsreliant.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> Looking at the specific cpu counters we get this:
> 
> Base:
>      Total time: 0.179 [sec]
> 
>  Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
> 
>        1571.304618 task-clock                #    5.213 CPUs utilized            ( +-  0.45% )
>             14,423 context-switches          #    0.009 M/sec                    ( +-  4.28% )
>              2,710 cpu-migrations            #    0.002 M/sec                    ( +-  2.83% )

Hm, for these second round of measurements were you using 'perf stat 
-a -C ...'?

The most accurate method of measurement for such single-threaded 
workloads is something like:

	taskset 0x1 perf stat -a -C 1 --repeat 20 ...

this will bind your workload to CPU#0, and will do PMU measurements 
only there - without mixing in other CPUs or workloads.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Ahern @ 2013-10-28 16:49 UTC (permalink / raw)
  To: Ingo Molnar, Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028162438.GB14350@gmail.com>

On 10/28/13 10:24 AM, Ingo Molnar wrote:

> The most accurate method of measurement for such single-threaded
> workloads is something like:
>
> 	taskset 0x1 perf stat -a -C 1 --repeat 20 ...
>
> this will bind your workload to CPU#0, and will do PMU measurements
> only there - without mixing in other CPUs or workloads.

you can drop the -a if you only want a specific CPU (-C arg). And -C in 
perf is cpu number starting with 0, so in your example above -C 1 means 
cpu1, not cpu0.

David

^ permalink raw reply

* [PATCH] can: c_can: Speed up rx_poll function
From: Markus Pargmann @ 2013-10-28 16:59 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger
  Cc: linux-can, netdev, linux-kernel, kernel, Markus Pargmann

This patch speeds up the rx_poll function by reducing the number of
register reads.

Replace the 32bit register read by a 16bit register read. Currently
the 32bit register read is implemented by using 2 16bit reads. This is
inefficient as we only use the lower 16bit in rx_poll.

The for loops reads the pending interrupts in every iteration. This
leads up to 16 reads of pending interrupts. The patch introduces a new
outer loop to read the pending interrupts as long as 'quota' is above 0.
This reduces the total number of reads.

The third change is to replace the for-loop by a find_next_bit loop.

Tested on AM335x. I removed all 'static' and 'inline' from c_can.c to
see the timings for all functions. I used the function tracer with
trace_stats.

125kbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
With patch:
  c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us

1Mbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
With patch:
  c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
---
 drivers/net/can/c_can/c_can.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index a668cd4..31af033 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -259,6 +259,12 @@ static u32 c_can_read_reg32(struct c_can_priv *priv, enum reg index)
 	return val;
 }
 
+u16 c_can_read_reg16(struct c_can_priv *priv, enum reg index)
+{
+	u16 val = priv->read_reg(priv, index);
+	return val;
+}
+
 static void c_can_enable_all_interrupts(struct c_can_priv *priv,
 						int enable)
 {
@@ -798,17 +804,21 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
 	u32 num_rx_pkts = 0;
 	unsigned int msg_obj, msg_ctrl_save;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
+	unsigned long val = c_can_read_reg16(priv, C_CAN_INTPND1_REG);
+
+	/*
+	 * It is faster to read only one 16bit register. This is only possible
+	 * for a maximum number of 16 objects.
+	 */
+	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
+			"Implementation does not support more message objects than 16");
+
+	while (quota > 0 && (val = c_can_read_reg16(priv, C_CAN_INTPND1_REG))) {
+		msg_obj = 0;
+		while ((msg_obj = find_next_bit(&val, 16, msg_obj)) < 16 &&
+				quota > 0) {
+			++msg_obj;
 
-	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
-			msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
-			val = c_can_read_reg32(priv, C_CAN_INTPND1_REG),
-			msg_obj++) {
-		/*
-		 * as interrupt pending register's bit n-1 corresponds to
-		 * message object n, we need to handle the same properly.
-		 */
-		if (val & (1 << (msg_obj - 1))) {
 			c_can_object_get(dev, 0, msg_obj, IF_COMM_ALL &
 					~IF_COMM_TXRQST);
 			msg_ctrl_save = priv->read_reg(priv,
-- 
1.8.4.rc3


^ permalink raw reply related

* RE: transmit lockup using smsc95xx ethernet on usb3
From: David Laight @ 2013-10-28 17:09 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Xenia Ragiadakou
In-Reply-To: <20131017000758.GL7082@xanatos>

> > We are seeing complete lockups of the transmit side when using
> > the smsc95xx driver connected to a USB3 port on an i7 (Ivybridge) cpu.
> > These errors are very intermittent - less than once a day, and
> > it isn't actually clear that they are related to traffic load.
...
> I would suggest you try with the latest stable kernel, with
> CONFIG_USB_DEBUG enabled.

I've finally got the same failure with a 3.12-0-rc5 kernel.
The only change I made was to comment out the 'short packet' trace.

The kernel trace contains:
(I've shortened the lines by removing the fixed part of the header.
I've got the full trace, and the matching usbmon trace, but they
are a bit large to post to the mailing lists!

All from xhci_hcd 0000:00:14.0
[38704.116936] Transfer error on endpoint
[38704.116941] Cleaning up stalled endpoint ring
[38704.116944] Finding segment containing stopped TRB.
[38704.116946] Finding endpoint context
[38704.116948] Finding segment containing last TRB in TD.
[38704.116950] Cycle state = 0x0
[38704.116952] New dequeue segment = ffff880214565660 (virtual)
[38704.116954] New dequeue pointer = 0x210cf4ad0 (DMA)
[38704.116956] Queueing new dequeue state
[38704.116959] Set TR Deq Ptr cmd, new deq seg = ffff880214565660 (0x210cf4800 dma), new deq ptr = ffff880210cf4ad0 (0x210cf4ad0 dma), new cycle = 0
[38704.116962] // Ding dong!
[38704.116966] Giveback URB ffff8800d6841d80, len = 0, expected = 18944, status = -71
[38704.116970] Transfer error on endpoint
[38704.116973] Cleaning up stalled endpoint ring
[38704.116974] Finding segment containing stopped TRB.
[38704.116976] Finding endpoint context
[38704.116978] Finding segment containing last TRB in TD.
[38704.116980] Cycle state = 0x1
[38704.116982] New dequeue segment = ffff880214565680 (virtual)
[38704.116984] New dequeue pointer = 0x210cf47e0 (DMA)
[38704.116986] Queueing new dequeue state
[38704.116989] Set TR Deq Ptr cmd, new deq seg = ffff880214565680 (0x210cf4400 dma), new deq ptr = ffff880210cf47e0 (0x210cf47e0 dma), new cycle = 1
[38704.116991] // Ding dong!
[38704.116995] Giveback URB ffff8802132169c0, len = 1024, expected = 1526, status = -71
[38704.116998] Ignoring reset ep completion code of 1
[38704.117001] Successful Set TR Deq Ptr cmd, deq = @210cf4ad0
[38704.117004] Ignoring reset ep completion code of 1
[38704.117006] Successful Set TR Deq Ptr cmd, deq = @210cf47e1
[38704.240067] Stalled endpoint
[38704.240075] Giveback URB ffff8800d4438900, len = 0, expected = 1526, status = -32
[38704.240112] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4030
[38704.240118] // Ding dong!
[38704.240124] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4040
[38704.240129] Removing canceled TD starting at 0x210cf4030 (dma).
[38704.240132] TRB to noop at offset 0x210cf4030
[38704.240134] Removing canceled TD starting at 0x210cf4040 (dma).
[38704.240136] TRB to noop at offset 0x210cf4040
[38704.240155] WARN halted endpoint, queueing URB anyway.
[38704.240163] WARN halted endpoint, queueing URB anyway.
[38704.240170] WARN halted endpoint, queueing URB anyway.
[38704.240177] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf4050
[38704.240179] // Ding dong!
[38704.240184] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf4060
[38704.240189] Removing canceled TD starting at 0x210cf4050 (dma).
[38704.240191] TRB to noop at offset 0x210cf4050
[38704.240193] Removing canceled TD starting at 0x210cf4060 (dma).
[38704.240195] TRB to noop at offset 0x210cf4060
[38704.240205] WARN halted endpoint, queueing URB anyway.
[38704.240210] WARN halted endpoint, queueing URB anyway.
[38704.240216] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4070
[38704.240219] // Ding dong!
[38704.240224] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4080
[38704.240228] Removing canceled TD starting at 0x210cf4070 (dma).
[38704.240230] TRB to noop at offset 0x210cf4070
[38704.240232] Removing canceled TD starting at 0x210cf4080 (dma).
[38704.240234] TRB to noop at offset 0x210cf4080
[38704.240244] WARN halted endpoint, queueing URB anyway.
[38704.240249] WARN halted endpoint, queueing URB anyway.
[38704.240255] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4090
[38704.240257] // Ding dong!
[38704.240262] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf40a0
[38704.240266] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf40b0
[38704.240271] Removing canceled TD starting at 0x210cf4090 (dma).
[38704.240273] TRB to noop at offset 0x210cf4090
[38704.240275] Removing canceled TD starting at 0x210cf40a0 (dma).
[38704.240277] TRB to noop at offset 0x210cf40a0
[38704.240279] Removing canceled TD starting at 0x210cf40b0 (dma).
[38704.240281] TRB to noop at offset 0x210cf40b0
[38704.240292] WARN halted endpoint, queueing URB anyway.
[38704.240296] WARN halted endpoint, queueing URB anyway.
[38704.240301] WARN halted endpoint, queueing URB anyway.
[38704.240307] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf40c0
[38704.240309] // Ding dong!
[38704.240314] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf40d0
[38704.240319] Removing canceled TD starting at 0x210cf40c0 (dma).
[38704.240321] TRB to noop at offset 0x210cf40c0
[38704.240323] Removing canceled TD starting at 0x210cf40d0 (dma).
[38704.240325] TRB to noop at offset 0x210cf40d0
[38704.240335] WARN halted endpoint, queueing URB anyway.
[38704.240340] WARN halted endpoint, queueing URB anyway.
[38704.240346] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf40e0
[38704.240348] // Ding dong!
[38704.240353] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf40f0
[38704.240357] Removing canceled TD starting at 0x210cf40e0 (dma).
[38704.240359] TRB to noop at offset 0x210cf40e0
[38704.240361] Removing canceled TD starting at 0x210cf40f0 (dma).
[38704.240363] TRB to noop at offset 0x210cf40f0
[38704.240373] WARN halted endpoint, queueing URB anyway.
[38704.240378] WARN halted endpoint, queueing URB anyway.
[38704.240384] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4100
[38704.240386] // Ding dong!
[38704.240391] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4110
[38704.240395] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf4120
[38704.240400] Removing canceled TD starting at 0x210cf4100 (dma).
[38704.240402] TRB to noop at offset 0x210cf4100
[38704.240404] Removing canceled TD starting at 0x210cf4110 (dma).
[38704.240406] TRB to noop at offset 0x210cf4110
[38704.240408] Removing canceled TD starting at 0x210cf4120 (dma).
[38704.240410] TRB to noop at offset 0x210cf4120
[38704.240420] WARN halted endpoint, queueing URB anyway.
[38704.240425] WARN halted endpoint, queueing URB anyway.
[38704.240430] WARN halted endpoint, queueing URB anyway.
[38704.240436] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf4130
[38704.240438] // Ding dong!
[38704.240443] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4140
[38704.240447] Removing canceled TD starting at 0x210cf4130 (dma).
[38704.240449] TRB to noop at offset 0x210cf4130
[38704.240451] Removing canceled TD starting at 0x210cf4140 (dma).
[38704.240453] TRB to noop at offset 0x210cf4140
[38704.240463] WARN halted endpoint, queueing URB anyway.
[38704.240468] WARN halted endpoint, queueing URB anyway.
[38704.240474] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4150
[38704.240476] // Ding dong!
[38704.240481] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf4160
[38704.240486] Removing canceled TD starting at 0x210cf4150 (dma).
[38704.240488] TRB to noop at offset 0x210cf4150
[38704.240490] Removing canceled TD starting at 0x210cf4160 (dma).
[38704.240492] TRB to noop at offset 0x210cf4160
[38704.240501] WARN halted endpoint, queueing URB anyway.
[38704.240506] WARN halted endpoint, queueing URB anyway.
[38704.240512] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf4170
[38704.240514] // Ding dong!
[38704.240519] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf4180
[38704.240524] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4190
[38704.240528] Removing canceled TD starting at 0x210cf4170 (dma).
[38704.240530] TRB to noop at offset 0x210cf4170
[38704.240532] Removing canceled TD starting at 0x210cf4180 (dma).
[38704.240535] TRB to noop at offset 0x210cf4180
[38704.240537] Removing canceled TD starting at 0x210cf4190 (dma).
[38704.240539] TRB to noop at offset 0x210cf4190
[38704.240549] WARN halted endpoint, queueing URB anyway.
[38704.240554] WARN halted endpoint, queueing URB anyway.
[38704.240559] WARN halted endpoint, queueing URB anyway.
[38704.240565] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf41a0
[38704.240567] // Ding dong!
[38704.240572] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf41b0
[38704.240577] Removing canceled TD starting at 0x210cf41a0 (dma).
[38704.240579] TRB to noop at offset 0x210cf41a0
[38704.240581] Removing canceled TD starting at 0x210cf41b0 (dma).
[38704.240583] TRB to noop at offset 0x210cf41b0
[38704.240592] WARN halted endpoint, queueing URB anyway.
[38704.240598] WARN halted endpoint, queueing URB anyway.
[38704.240603] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf41c0
[38704.240605] // Ding dong!
[38704.240610] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf41d0
[38704.240614] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf41e0
[38704.240618] Removing canceled TD starting at 0x210cf41c0 (dma).
[38704.240620] TRB to noop at offset 0x210cf41c0
[38704.240623] Removing canceled TD starting at 0x210cf41d0 (dma).
[38704.240625] TRB to noop at offset 0x210cf41d0
[38704.240627] Removing canceled TD starting at 0x210cf41e0 (dma).
[38704.240629] TRB to noop at offset 0x210cf41e0
[38704.240639] WARN halted endpoint, queueing URB anyway.
[38704.240644] WARN halted endpoint, queueing URB anyway.
[38704.240648] WARN halted endpoint, queueing URB anyway.
[38704.240654] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf41f0
[38704.240656] // Ding dong!
[38704.240661] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4200
[38704.240665] Removing canceled TD starting at 0x210cf41f0 (dma).
[38704.240667] TRB to noop at offset 0x210cf41f0
[38704.240669] Removing canceled TD starting at 0x210cf4200 (dma).
[38704.240671] TRB to noop at offset 0x210cf4200
[38704.240681] WARN halted endpoint, queueing URB anyway.
[38704.240687] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf4210
[38704.240689] // Ding dong!
[38704.240694] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf4220
[38704.240699] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf4230
[38704.240703] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4240
[38704.240707] Removing canceled TD starting at 0x210cf4210 (dma).
[38704.240709] TRB to noop at offset 0x210cf4210
[38704.240711] Removing canceled TD starting at 0x210cf4220 (dma).
[38704.240713] TRB to noop at offset 0x210cf4220
[38704.240716] Removing canceled TD starting at 0x210cf4230 (dma).
[38704.240718] TRB to noop at offset 0x210cf4230
[38704.240720] Removing canceled TD starting at 0x210cf4240 (dma).
[38704.240722] TRB to noop at offset 0x210cf4240
[38704.240733] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4250
[38704.240735] // Ding dong!
[38704.240740] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf4260
[38704.240745] Removing canceled TD starting at 0x210cf4250 (dma).
[38704.240747] TRB to noop at offset 0x210cf4250
[38704.240749] Removing canceled TD starting at 0x210cf4260 (dma).
[38704.240751] TRB to noop at offset 0x210cf4260
[38704.240764] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf4270
[38704.240766] // Ding dong!
[38704.240771] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4280
[38704.240776] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4290
[38704.240787] Removing canceled TD starting at 0x210cf4270 (dma).
[38704.240789] TRB to noop at offset 0x210cf4270
[38704.240791] Removing canceled TD starting at 0x210cf4280 (dma).
[38704.240793] TRB to noop at offset 0x210cf4280
[38704.240796] Removing canceled TD starting at 0x210cf4290 (dma).
[38704.240798] TRB to noop at offset 0x210cf4290
[38704.240812] Queueing reset endpoint command
[38704.240814] Cleaning up stalled endpoint ring
[38704.240816] Finding segment containing stopped TRB.
[38704.240818] Finding endpoint context
[38704.240820] Finding segment containing last TRB in TD.
[38704.240822] Cycle state = 0x0
[38704.240825] New dequeue segment = ffff8802145657a0 (virtual)
[38704.240827] New dequeue pointer = 0x210cf4030 (DMA)
[38704.240829] Queueing new dequeue state
[38704.240832] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf4030 (0x210cf4030 dma), new cycle = 0
[38704.240834] // Ding dong!
[38704.240843] Ignoring reset ep completion code of 1
[38704.240846] Successful Set TR Deq Ptr cmd, deq = @210cf4030
[38704.332072] Stalled endpoint
[38704.332080] Giveback URB ffff8800d4438480, len = 0, expected = 1526, status = -32
[38704.332135] Queueing reset endpoint command
[38704.332138] Cleaning up stalled endpoint ring
[38704.332140] Finding segment containing stopped TRB.
[38704.332142] Finding endpoint context
[38704.332144] Finding segment containing last TRB in TD.
[38704.332146] Cycle state = 0x0
[38704.332149] New dequeue segment = ffff8802145657a0 (virtual)
[38704.332151] New dequeue pointer = 0x210cf42b0 (DMA)
[38704.332153] Queueing new dequeue state
[38704.332156] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf42b0 (0x210cf42b0 dma), new cycle = 0
[38704.332158] // Ding dong!
[38704.332170] Ignoring reset ep completion code of 1
[38704.332175] Successful Set TR Deq Ptr cmd, deq = @210cf42b0

>From then on this gets traced every few seconds - probably whenever
dhclient tries to send a packet.

[38704.747916] Stalled endpoint
[38704.747924] Giveback URB ffff8800d4438480, len = 0, expected = 1526, status = -32
[38704.747972] Queueing reset endpoint command
[38704.747982] Cleaning up stalled endpoint ring
[38704.747984] Finding segment containing stopped TRB.
[38704.747986] Finding endpoint context
[38704.747988] Finding segment containing last TRB in TD.
[38704.747990] Cycle state = 0x0
[38704.747992] New dequeue segment = ffff8802145657a0 (virtual)
[38704.747994] New dequeue pointer = 0x210cf42c0 (DMA)
[38704.747996] Queueing new dequeue state
[38704.747999] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf42c0 (0x210cf42c0 dma), new cycle = 0
[38704.748002] // Ding dong!
[38704.748010] Ignoring reset ep completion code of 1
[38704.748018] Successful Set TR Deq Ptr cmd, deq = @210cf42c0

The only way to recover is to unplug and replug.

	David



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] can: c_can: Speed up rx_poll function
From: Joe Perches @ 2013-10-28 17:12 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel
In-Reply-To: <1382979582-10352-1-git-send-email-mpa@pengutronix.de>

On Mon, 2013-10-28 at 17:59 +0100, Markus Pargmann wrote:
> This patch speeds up the rx_poll function by reducing the number of
> register reads.

trivial notes:

> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
[]
> @@ -259,6 +259,12 @@ static u32 c_can_read_reg32(struct c_can_priv *priv, enum reg index)
> +u16 c_can_read_reg16(struct c_can_priv *priv, enum reg index)
> +{
> +	u16 val = priv->read_reg(priv, index);
> +	return val;
> +}

This function doesn't seem useful at all.
It's not exported and it's not static.

Why not use an in-place priv->read_reg(priv, index)?

> +
>  static void c_can_enable_all_interrupts(struct c_can_priv *priv,
>  						int enable)
>  {
> @@ -798,17 +804,21 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
>  	u32 num_rx_pkts = 0;
>  	unsigned int msg_obj, msg_ctrl_save;
>  	struct c_can_priv *priv = netdev_priv(dev);
> -	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
> +	unsigned long val = c_can_read_reg16(priv, C_CAN_INTPND1_REG);

Probably better as a u16 as detailed below.

> +	/*
> +	 * It is faster to read only one 16bit register. This is only possible
> +	 * for a maximum number of 16 objects.
> +	 */
> +	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
> +			"Implementation does not support more message objects than 16");
> +
> +	while (quota > 0 && (val = c_can_read_reg16(priv, C_CAN_INTPND1_REG))) {
> +		msg_obj = 0;
> +		while ((msg_obj = find_next_bit(&val, 16, msg_obj)) < 16 &&

Using ffs instead of find_next_bit would be more standard
and probably faster too.



^ permalink raw reply

* Re: [PATCH 4/4] wl1251: spi: add device tree support
From: Grazvydas Ignotas @ 2013-10-28 17:15 UTC (permalink / raw)
  To: Kumar Gala
  Cc: Sebastian Reichel, Mark Rutland, linux-doc, Tony Lindgren,
	Russell King, Sachin Kamat, Ian Campbell, Sebastian Reichel,
	Luciano Coelho, devicetree, Pawel Moll, Stephen Warren,
	John W. Linville, Rob Herring, Bill Pemberton,
	linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Greg Kroah-Hartman, linux-wireless@vger.kernel.org,
	linux-kernel@vger.kernel.org, Felipe 
In-Reply-To: <FF34C626-4A49-43B1-B0AD-DC6146ABBB11@codeaurora.org>

On Mon, Oct 28, 2013 at 8:37 AM, Kumar Gala <galak@codeaurora.org> wrote:
> On Oct 27, 2013, at 11:14 AM, Sebastian Reichel wrote:
> > +++ b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt
> > @@ -0,0 +1,36 @@
> > +* Texas Instruments wl1251 controller
> > +
> > +The wl1251 chip can be connected via SPI or via SDIO. The linux
> > +kernel currently only supports device tree for the SPI variant.
> > +
>
> From the binding I have no idea what this chip actually does, also we don't normally reference linux kernel support in bindings specs (so please remove it).
>
> However, what would expect the SDIO binding to look like?  Or more specifically, how would you distinguish the SPI vs SDIO binding/connection?  I'm wondering if the compatible should be something like "ti,wl1251-spi" and than the sdio can be "ti,wl1251-sdio"

When connected to SDIO, it doesn't act as standard SDIO device and
can't be probed (standard SDIO registers missing), so information has
to come some other way. That used to partially come through
platform_data and partially through a callback from mmc subsystem (see
pandora_wl1251_init_card() in
arch/arm/mach-omap2/board-omap3pandora.c). I don't know much about DT,
but maybe the information that comes from SDIO registers on "normal"
SDIO devices should come through DT in this case too? I don't really
know how that should be integrated with mmc subsystem though..

> > +Required properties:
> > +- compatible : Should be "ti,wl1251"
>
> reg is not listed as a required prop.
>
> > +- interrupts : Should contain interrupt line
> > +- interrupt-parent : Should be the phandle for the interrupt
> > +  controller that services interrupts for this device
> > +- vio-supply : phandle to regulator providing VIO
> > +- power-gpio : GPIO connected to chip's PMEN pin
>
> should be vendor prefixed: ti,power-gpio
>
> > +- For additional required properties on SPI, please consult
> > +  Documentation/devicetree/bindings/spi/spi-bus.txt
> > +
> > +Optional properties:
> > +- ti,use-eeprom : If found, configuration will be loaded from eeprom.
>
> can you be a bit more specific on what cfg will be loaded.  Also, is this property a boolean, if so how do I know which eeprom the cfg is loaded from (is it one that is directly connected to the wl1251?

wl1251 is a wifi chip that can have an optional eeprom connected to it
to store things like calibration stuff and MAC address, and that
eeprom is usually inside a single module with some additional radio
related chips. If the eeprom is connected (like the module on pandora
board has), the driver can issue command to the firmware running on
chip to load that data on it's startup, alternatively the driver can
load calibration from other storage (like it happens on N900).

Gražvydas

^ permalink raw reply

* Re: [PATCH 2/4] wl1251: move power GPIO handling into the driver
From: Grazvydas Ignotas @ 2013-10-28 17:29 UTC (permalink / raw)
  To: Alexander Shiyan, Luciano Coelho, Mark Rutland, devicetree,
	Russell King, Pawel Moll, Ian Campbell, Tony Lindgren,
	Greg Kroah-Hartman, Stephen Warren, linux-doc, John W. Linville,
	Rob Herring, linux-kernel@vger.kernel.org, Sachin Kamat,
	Bill Pemberton, Felipe Balbi, Rob Landley, netdev,
	linux-wireless@vger.kernel.org, linux-omap@vger.kernel.org,
	"linux-arm-kernel@lists.infradead.org"
In-Reply-To: <20131027201218.GA4414@earth.universe>

On Sun, Oct 27, 2013 at 10:12 PM, Sebastian Reichel <sre@debian.org> wrote:
> On Sun, Oct 27, 2013 at 08:24:16PM +0400, Alexander Shiyan wrote:
>> > Move the power GPIO handling from the board code into
>> > the driver. This is a dependency for device tree support.
>> >
>> > Signed-off-by: Sebastian Reichel <sre@debian.org>
>> > ---
>> >  arch/arm/mach-omap2/board-omap3pandora.c     |  2 ++
>> >  arch/arm/mach-omap2/board-rx51-peripherals.c | 11 ++--------
>> >  drivers/net/wireless/ti/wl1251/sdio.c        | 21 +++++++++++++-----
>> >  drivers/net/wireless/ti/wl1251/spi.c         | 33 ++++++++++++++++++----------
>> >  drivers/net/wireless/ti/wl1251/wl1251.h      |  2 +-
>> >  include/linux/wl12xx.h                       |  2 +-
>> >  6 files changed, 43 insertions(+), 28 deletions(-)
>> ...
>> > diff --git a/include/linux/wl12xx.h b/include/linux/wl12xx.h
>> > index b516b4f..a9c723b 100644
>> > --- a/include/linux/wl12xx.h
>> > +++ b/include/linux/wl12xx.h
>> > @@ -49,7 +49,7 @@ enum {
>> >  };
>> >
>> >  struct wl1251_platform_data {
>> > -   void (*set_power)(bool enable);
>> > +   int power_gpio;
>> >     /* SDIO only: IRQ number if WLAN_IRQ line is used, 0 for SDIO IRQs */
>> >     int irq;
>> >     bool use_eeprom;
>> > --
>>
>> What a reason for not using regulator API here with GPIO-based
>> regulator?
>
> I think this pin is not used as power supply, but like an enable pin
> for low power states. Of course the regulator API could still be
> (mis?)used for this, but I think it would be the first linux device
> driver doing this.
>
> Note: I don't have wl1251 documentation.

When wl12xx family of chips is connected through SDIO, we already have
that pin set up as a regulator controlled with the help of mmc
subsystem. When time comes to communicate with the chip, mmc subsystem
sees this as yet another SD card and looks for associated regulator
for it, and the board file has that set up as a fixed regulator
controlling that pin (see pandora_vmmc3 in
arch/arm/mach-omap2/board-omap3pandora.c). To prevent poweroff after
first SDIO communications are over, pm_runtime calls are used in
drivers/net/wireless/ti/wl1251/sdio.c .

I don't know if something similar can be done done in SPI case, but
I'm sure this is not the first your-so-called regulator misuse.

Gražvydas

^ permalink raw reply

* [PATCH] net: x25: Fix dead URLs in Kconfig
From: Michael Drüing @ 2013-10-28 17:33 UTC (permalink / raw)
  To: davem; +Cc: andrew.hendry, linux-x25, netdev, linux-kernel,
	Michael Drüing

Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com

Signed-off-by: Michael Drüing <michael@drueing.de>
---
 net/x25/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/x25/Kconfig b/net/x25/Kconfig
index c959312c..e2fa133 100644
--- a/net/x25/Kconfig
+++ b/net/x25/Kconfig
@@ -16,8 +16,8 @@ config X25
 	  if you want that) and the lower level data link layer protocol LAPB
 	  (say Y to "LAPB Data Link Driver" below if you want that).
 
-	  You can read more about X.25 at <http://www.sangoma.com/x25.htm> and
-	  <http://www.cisco.com/univercd/cc/td/doc/product/software/ios11/cbook/cx25.htm>.
+	  You can read more about X.25 at <http://www.sangoma.com/tutorials/x25/> and
+	  <http://docwiki.cisco.com/wiki/X.25>.
 	  Information about X.25 for Linux is contained in the files
 	  <file:Documentation/networking/x25.txt> and
 	  <file:Documentation/networking/x25-iface.txt>.
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH stable 3.10] ip_gre: Fix WCCPv2 header parsing.
From: Pravin B Shelar @ 2013-10-28 17:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, peter.schmitt82, eric.dumazet, Pravin B Shelar

In case of WCCPv2 GRE header has extra four bytes.  Following
patch pull those extra four bytes so that skb offsets are set
correctly.

CC: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Peter Schmitt <peter.schmitt82@yahoo.de>
Tested-by: Peter Schmitt <peter.schmitt82@yahoo.de>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/net/ip_tunnels.h |    2 +-
 net/ipv4/ip_gre.c        |    2 +-
 net/ipv4/ip_tunnel.c     |    4 ++--
 net/ipv4/ipip.c          |    2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 09b1360..79cf958 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -113,7 +113,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 				   __be32 key);
 
 int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
-		  const struct tnl_ptk_info *tpi, bool log_ecn_error);
+		  const struct tnl_ptk_info *tpi, int hdr_len, bool log_ecn_error);
 int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
 			 struct ip_tunnel_parm *p);
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 855004f..3ddc8df 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -335,7 +335,7 @@ static int ipgre_rcv(struct sk_buff *skb)
 				  iph->saddr, iph->daddr, tpi.key);
 
 	if (tunnel) {
-		ip_tunnel_rcv(tunnel, skb, &tpi, log_ecn_error);
+		ip_tunnel_rcv(tunnel, skb, &tpi, hdr_len, log_ecn_error);
 		return 0;
 	}
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index cbfc37f..49fb608 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -402,7 +402,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
 }
 
 int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
-		  const struct tnl_ptk_info *tpi, bool log_ecn_error)
+		  const struct tnl_ptk_info *tpi, int hdr_len, bool log_ecn_error)
 {
 	struct pcpu_tstats *tstats;
 	const struct iphdr *iph = ip_hdr(skb);
@@ -413,7 +413,7 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 	skb->protocol = tpi->proto;
 
 	skb->mac_header = skb->network_header;
-	__pskb_pull(skb, tunnel->hlen);
+	__pskb_pull(skb, hdr_len);
 	skb_postpull_rcsum(skb, skb_transport_header(skb), tunnel->hlen);
 #ifdef CONFIG_NET_IPGRE_BROADCAST
 	if (ipv4_is_multicast(iph->daddr)) {
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 7cfc456..f5cc7b3 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -195,7 +195,7 @@ static int ipip_rcv(struct sk_buff *skb)
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 			goto drop;
-		return ip_tunnel_rcv(tunnel, skb, &tpi, log_ecn_error);
+		return ip_tunnel_rcv(tunnel, skb, &tpi, 0, log_ecn_error);
 	}
 
 	return -1;
-- 
1.7.1

^ permalink raw reply related

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Pravin Shelar @ 2013-10-28 17:42 UTC (permalink / raw)
  To: Peter Schmitt; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <1382955011.46218.YahooMailNeo@web172005.mail.ir2.yahoo.com>

Thanks for testing it. I sent patch for stable kernel.

On Mon, Oct 28, 2013 at 3:10 AM, Peter Schmitt <peter.schmitt82@yahoo.de> wrote:
> Hi Pravin,
>
> I tested this patch and it works fine with 3.10.17! Now I get correct packets from the GRE-device and can surf again using WCCP.
>
> Thank you very much!
>
> Best regards,
> Peter
>
>
>
>
>
>> On Friday, October 25, 2013 5:25 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> > Hi Peter,
>> I think its easy to fix this case, Can you try attached patch?
>> If it works I will send formal patch for stable.
>>
>> Thanks,
>> Pravin.
>>
>>
>> On Fri, Oct 18, 2013 at 4:39 AM, Peter Schmitt <peter.schmitt82@yahoo.de>
>> wrote:
>>>  Hi,
>>>
>>>
>>>
>>>>  On Thursday, October 17, 2013 3:44 PM, Eric Dumazet
>> <eric.dumazet@gmail.com> wrote:
>>>>  > On Thu, 2013-10-17 at 11:53 +0100, Peter Schmitt wrote:
>>>>>   Hi,
>>>>>
>>>>>
>>>>>   >
>>>>>   > 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the
>> probable
>>>>>   > fix :
>>>>>   >
>>>>>   > commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
>>>>>   > ("ip_tunnel: push generic protocol handling to ip_tunnel
>>>>  module.")
>>>>>   >
>>>>>   > The problem is 3.10 do not pull the extra 4 bytes of WCCP
>>>>>   >
>>>>>   > This is now properly done in iptunnel_pull_header()
>>>>>   >
>>>>>
>>>>>   First of all, many thanks for your help on this.
>>>>>
>>>>>   I tried to apply the fix to the 3.10.16 sources, as I would like
>> to
>>>>>   stay with the long-term line. Unfortunately it does not apply, as
>>>>>   there are a lot of dependencies on other patches.
>>>>>
>>>>>   Do you think there will be a fix for the long-time 3.10 kernel
>> line?
>>>>>   Or can you guide me on how to apply this fix to the long-term
>> kernel?
>>>>
>>>>  3.10 is right in the middle of GRE refactoring, and many bugs are in
>> it.
>>>>
>>>>  It might be very painful to get a complete list of patches to backport.
>>>
>>>  thanks again. Yes I saw that there were lots of changes.
>>>
>>>  Do you think there will be a fix for all the 3.10.x stable long term kernel
>> users? Many of them might not be able to use some newer kernel easily or will
>> this be a "won't fix"?
>>>
>>>  Best regards,
>>>  Peter
>>>
>>

^ permalink raw reply

* [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
From: Rafał Miłecki @ 2013-10-28 17:42 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: openwrt-devel, Rafał Miłecki

Copying whole packets with skb_copy_from_linear_data_offset is a pretty
bad idea. CPU was spending time in __copy_user_common and network
performance was lower. With the new solution iperf-measured speed
increased from 116Mb/s to 134Mb/s.

Another way to improve performance could be switching to build_skb. It
is cache-specific, so will require testing of various devices.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
 drivers/net/ethernet/broadcom/bgmac.c |   71 ++++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 6b7541f..fde9a11 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -307,7 +307,6 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
 		struct device *dma_dev = bgmac->core->dma_dev;
 		struct bgmac_slot_info *slot = &ring->slots[ring->start];
 		struct sk_buff *skb = slot->skb;
-		struct sk_buff *new_skb;
 		struct bgmac_rx_header *rx;
 		u16 len, flags;
 
@@ -320,38 +319,56 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
 		len = le16_to_cpu(rx->len);
 		flags = le16_to_cpu(rx->flags);
 
-		/* Check for poison and drop or pass the packet */
-		if (len == 0xdead && flags == 0xbeef) {
-			bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
-				  ring->start);
-		} else {
+		do {
+			struct sk_buff *old_skb = slot->skb;
+			dma_addr_t old_dma_addr = slot->dma_addr;
+			int err;
+
+			/* Check for poison and drop or pass the packet */
+			if (len == 0xdead && flags == 0xbeef) {
+				bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
+					  ring->start);
+				dma_sync_single_for_device(dma_dev,
+							   slot->dma_addr,
+							   BGMAC_RX_BUF_SIZE,
+							   DMA_FROM_DEVICE);
+				break;
+			}
+
 			/* Omit CRC. */
 			len -= ETH_FCS_LEN;
 
-			new_skb = netdev_alloc_skb_ip_align(bgmac->net_dev, len);
-			if (new_skb) {
-				skb_put(new_skb, len);
-				skb_copy_from_linear_data_offset(skb, BGMAC_RX_FRAME_OFFSET,
-								 new_skb->data,
-								 len);
-				skb_checksum_none_assert(skb);
-				new_skb->protocol =
-					eth_type_trans(new_skb, bgmac->net_dev);
-				netif_receive_skb(new_skb);
-				handled++;
-			} else {
-				bgmac->net_dev->stats.rx_dropped++;
-				bgmac_err(bgmac, "Allocation of skb for copying packet failed!\n");
+			/* Prepare new skb as replacement */
+			err = bgmac_dma_rx_skb_for_slot(bgmac, slot);
+			if (err) {
+				slot->skb = old_skb;
+				slot->dma_addr = old_dma_addr;
+
+				/* Poison the old skb */
+				rx->len = cpu_to_le16(0xdead);
+				rx->flags = cpu_to_le16(0xbeef);
+
+				dma_sync_single_for_device(dma_dev,
+							   slot->dma_addr,
+							   BGMAC_RX_BUF_SIZE,
+							   DMA_FROM_DEVICE);
+				break;
 			}
+			bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
 
-			/* Poison the old skb */
-			rx->len = cpu_to_le16(0xdead);
-			rx->flags = cpu_to_le16(0xbeef);
-		}
+			/* Unmap old skb, we'll pass it to the netfif */
+			dma_unmap_single(dma_dev, old_dma_addr,
+						BGMAC_RX_BUF_SIZE,
+						DMA_FROM_DEVICE);
+
+			skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
+			skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
 
-		/* Make it back accessible to the hardware */
-		dma_sync_single_for_device(dma_dev, slot->dma_addr,
-					   BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+			skb_checksum_none_assert(skb);
+			skb->protocol = eth_type_trans(skb, bgmac->net_dev);
+			netif_receive_skb(skb);
+			handled++;
+		} while (0);
 
 		if (++ring->start >= BGMAC_RX_RING_SLOTS)
 			ring->start = 0;
-- 
1.7.10.4

^ permalink raw reply related

* Re: extending ndo_add_rx_vxlan_port
From: Joseph Gasparakis @ 2013-10-28 17:59 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Joseph Gasparakis, Or Gerlitz, John Fastabend, Yan Burman, netdev,
	Stephen Hemminger
In-Reply-To: <CAJZOPZLAg+5dYS=bUCkDMyYq8EmA_uvEv1Zi+drq6Qqe5onrzQ@mail.gmail.com>



On Mon, 28 Oct 2013, Or Gerlitz wrote:

> On Mon, Oct 28, 2013 at 1:44 AM, Joseph Gasparakis wrote:
> 
> > VXLAN implementation is not done like VLAN. VLANs are stacked on top of
> > real interfaces and what you are saying makes sense. VXLAN however is
> > using ip[6]_tunnel_xmit, and this is why we need to notify all the
> 
> As the name of the ndo you added ndo_add_rx_vxlan_port suggests -- HW
> needs that for RX offloads. So basically, what I am saying is: in a
> similar manner that we already program the NIC "over which" the vxlan
> instance is set to listen on the multicast address which is associated
> with that vxlan segement, lets give them a hint that packets arriving
> on this group are vxlan ones, so they can use it for programming
> steering rules.

Why don't you write up some code so everyone has something to look at?
Then we can see what makes sense to do in terms of the existing or new 
ndos.

> 
> 
> > netdevs in the system that implement the add/del vxlan port ndo (i.e. are
> > capable of offloading inner csums). In effect all physical "real" netdevs
> > are candidates to receive/transmit VXLAN traffic, subject to routing
> > tables changing or not.
> 

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-28 17:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028162438.GB14350@gmail.com>

On Mon, Oct 28, 2013 at 05:24:38PM +0100, Ingo Molnar wrote:
> 
> * Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Looking at the specific cpu counters we get this:
> > 
> > Base:
> >      Total time: 0.179 [sec]
> > 
> >  Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
> > 
> >        1571.304618 task-clock                #    5.213 CPUs utilized            ( +-  0.45% )
> >             14,423 context-switches          #    0.009 M/sec                    ( +-  4.28% )
> >              2,710 cpu-migrations            #    0.002 M/sec                    ( +-  2.83% )
> 
> Hm, for these second round of measurements were you using 'perf stat 
> -a -C ...'?
> 
> The most accurate method of measurement for such single-threaded 
> workloads is something like:
> 
> 	taskset 0x1 perf stat -a -C 1 --repeat 20 ...
> 
> this will bind your workload to CPU#0, and will do PMU measurements 
> only there - without mixing in other CPUs or workloads.
> 
> Thanks,
> 
> 	Ingo
I wasn't, but I will...
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-28 17:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028162044.GA14350@gmail.com>

On Mon, Oct 28, 2013 at 05:20:45PM +0100, Ingo Molnar wrote:
> 
> * Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Base:
> >        0.093269042 seconds time elapsed                                          ( +-  2.24% )
> > Prefetch (5x64):
> >        0.079440009 seconds time elapsed                                          ( +-  2.29% )
> > Parallel ALU:
> >        0.087666677 seconds time elapsed                                          ( +-  4.01% )
> > Prefetch + Parallel ALU:
> >        0.080758702 seconds time elapsed                                          ( +-  2.34% )
> > 
> > So we can see here that we get about a 1% speedup between the base 
> > and the both (Prefetch + Parallel ALU) case, with prefetch 
> > accounting for most of that speedup.
> 
> Hm, there's still something strange about these results. So the 
> range of the results is 790-930 nsecs. The noise of the measurements 
> is 2%-4%, i.e. 20-40 nsecs.
> 
> The prefetch-only result itself is the fastest of all - 
> statistically equivalent to the prefetch+parallel-ALU result, within 
> the noise range.
> 
> So if prefetch is enabled, turning on parallel-ALU has no measurable 
> effect - which is counter-intuitive. Do you have an 
> theory/explanation for that?
> 
> Thanks,
 I mentioned it farther down, loosely theorizing that running with parallel
alu's in conjunction with a prefetch, puts more pressure on the load/store unit
causing stalls while both alu's wait for the L1 cache to fill.  Not sure if that
makes sense, but I did note that in the both (prefetch+alu case) our data cache
hit rate was somewhat degraded, so I was going to play with the prefetch stride
to see if that fixed the situation.  Regardless I agree, the lack of improvement
in the both case is definately counter-intuitive.

Neil

> 
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply

* Re: [gpio:for-next 67/67] pch_gbe_main.c:undefined reference to `devm_gpio_request_one'
From: David Cohen @ 2013-10-28 18:01 UTC (permalink / raw)
  To: Darren Hart
  Cc: Linus Walleij, David S. Miller, netdev@vger.kernel.org,
	Fengguang Wu, Alexandre Courbot, linux-gpio@vger.kernel.org
In-Reply-To: <1382918677.23829.92.camel@dvhart-mobl4.amr.corp.intel.com>

>>> 	ret = devm_gpio_request_one(&pdev->dev, gpio, flags,
>>> 				    "minnow_phy_reset");
>>> 	if (ret) {
>>> 		dev_err(&pdev->dev,
>>> 			"ERR: Can't request PHY reset GPIO line '%d'\n", gpio);
>>>
>>> But deliberately does not about the probe because there is a chance
>>> things will work just fine. If they do not, the PHY detection code will
>>> print display errors about a failure to communicate over RGMII, and the
>>> device probe will fail from there.
>>>
>>> This still seems like the correct approach to me. Does anyone disagree?
>>
>> Considering this scenario it seems to be correct. But if
>> devm_gpio_request_one() may fail for "unfriendly" reasons too, then
>> it's incomplete.
>>
>>>
>>> (we still need to sort out the GPIOLIB and GPIO_SCH dependency though of
>>> course)
>>
>> Maybe if GPIOLIB has the static inline stubs returning -ENODEV we could
>> use a patch similar to the one attached here.
>>
>
> I think you may have inverted your logic on the return?
>
> +		dev_warn(&pdev->dev,
> +			 "ERR: Can't request PHY reset GPIO line '%d'\n", gpio);
> +		return ret == -ENODEV ? ret : 0;
>
> Did you mean:
>
> +		/*
> 		 * Things may still work if the GPIO driver wasn't
> 		 * compiled in
> 		 */

Actually I did mean what I sent, but I misunderstood the above case
where it may work without GPIO driver.
But anyway, "things *may* still work" IMO may create instability issue
by allowing an uncertain situation to go through. If PCH_GBE depends on
GPIOLIB in MinnowBoard case it should fail right away with a clear
message why.

If pch_gbe driver can detect what devm_gpio_request_one() returns from
the static inline stub then it can create a condition to gracefully
fail. So whoever is interested in MinnowBoard will spend less time
trying to identify this problem.

Br, David Cohen

> +		return ret == -ENODEV ? 0 : ret;
>
> The concern here of course would be someone added another GPIO
> controller over i2c over the expansion connector or something similar
> and did not enable the GPIO_SCH driver, then it could conceivably grab
> the wrong GPIO pin.... or would those never map to GPIO 13?
>


^ permalink raw reply

* [GIT net] Open vSwitch
From: Jesse Gross @ 2013-10-28 18:04 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

One patch for net/3.12 fixing an issue where devices could be in an
invalid state they are removed while still attached to OVS.

The following changes since commit 272b98c6455f00884f0350f775c5342358ebb73f:

  Linux 3.12-rc1 (2013-09-16 16:17:51 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git fixes

for you to fetch changes up to b07c26511e94ab856f3700c56d582c0da36d5b4d:

  openvswitch: fix vport-netdev unregister (2013-10-16 14:50:22 -0700)

----------------------------------------------------------------
Alexei Starovoitov (1):
      openvswitch: fix vport-netdev unregister

 net/openvswitch/dp_notify.c    |  7 +++++--
 net/openvswitch/vport-netdev.c | 16 +++++++++++++---
 net/openvswitch/vport-netdev.h |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

^ permalink raw reply

* [PATCH net] openvswitch: fix vport-netdev unregister
From: Jesse Gross @ 2013-10-28 18:04 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alexei Starovoitov
In-Reply-To: <1382983491-24432-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

From: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>

The combination of two commits:
commit 8e4e1713e4
("openvswitch: Simplify datapath locking.")
commit 2537b4dd0a
("openvswitch:: link upper device for port devices")

introduced a bug where upper_dev wasn't unlinked upon
netdev_unregister notification

The following steps:

  modprobe openvswitch
  ovs-dpctl add-dp test
  ip tuntap add dev tap1 mode tap
  ovs-dpctl add-if test tap1
  ip tuntap del dev tap1 mode tap

are causing multiple warnings:

[   62.747557] gre: GRE over IPv4 demultiplexor driver
[   62.749579] openvswitch: Open vSwitch switching datapath
[   62.755087] device test entered promiscuous mode
[   62.765911] device tap1 entered promiscuous mode
[   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[   62.769017] ------------[ cut here ]------------
[   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
[   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
[   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
[   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
[   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
[   62.769059] Call Trace:
[   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
[   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
[   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
[   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
[   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
[   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
[   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
[   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
[   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
[   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
[   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
[   62.769093] ---[ end trace 838756c62e156ffb ]---
[   62.769481] ------------[ cut here ]------------
[   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769486] sysfs: can not remove 'master', no directory
[   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
[   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
[   62.769525] Call Trace:
[   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
[   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769573] ---[ end trace 838756c62e156ffc ]---
[   62.769574] ------------[ cut here ]------------
[   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769577] sysfs: can not remove 'upper_test', no directory
[   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
[   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
[   62.769613] Call Trace:
[   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
[   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769657] ---[ end trace 838756c62e156ffd ]---
[   62.769724] device tap1 left promiscuous mode

This patch also affects moving devices between net namespaces.

OVS used to ignore netns move notifications which caused problems.
Like:
  ovs-dpctl add-if test tap1
  ip link set tap1 netns 3512
and then removing tap1 inside the namespace will cause hang on missing dev_put.

With this patch OVS will detach dev upon receiving netns move event.

Signed-off-by: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/dp_notify.c    |  7 +++++--
 net/openvswitch/vport-netdev.c | 16 +++++++++++++---
 net/openvswitch/vport-netdev.h |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index c323567..5c2dab2 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -65,8 +65,7 @@ void ovs_dp_notify_wq(struct work_struct *work)
 					continue;
 
 				netdev_vport = netdev_vport_priv(vport);
-				if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
-				    netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
+				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
@@ -88,6 +87,10 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
 		return NOTIFY_DONE;
 
 	if (event == NETDEV_UNREGISTER) {
+		/* upper_dev_unlink and decrement promisc immediately */
+		ovs_netdev_detach_dev(vport);
+
+		/* schedule vport destroy, dev_put and genl notification */
 		ovs_net = net_generic(dev_net(dev), ovs_net_id);
 		queue_work(system_wq, &ovs_net->dp_notify_work);
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 09d93c1..d21f77d 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
 	ovs_vport_free(vport_from_priv(netdev_vport));
 }
 
-static void netdev_destroy(struct vport *vport)
+void ovs_netdev_detach_dev(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
 
-	rtnl_lock();
+	ASSERT_RTNL();
 	netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
 	netdev_rx_handler_unregister(netdev_vport->dev);
-	netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
+	netdev_upper_dev_unlink(netdev_vport->dev,
+				netdev_master_upper_dev_get(netdev_vport->dev));
 	dev_set_promiscuity(netdev_vport->dev, -1);
+}
+
+static void netdev_destroy(struct vport *vport)
+{
+	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+
+	rtnl_lock();
+	if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
+		ovs_netdev_detach_dev(vport);
 	rtnl_unlock();
 
 	call_rcu(&netdev_vport->rcu, free_port_rcu);
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index dd298b5..8df01c11 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
 }
 
 const char *ovs_netdev_get_name(const struct vport *);
+void ovs_netdev_detach_dev(struct vport *);
 
 #endif /* vport_netdev.h */
-- 
1.8.3.2

^ permalink raw reply related

* Re: extending ndo_add_rx_vxlan_port
From: Or Gerlitz @ 2013-10-28 18:19 UTC (permalink / raw)
  To: Joseph Gasparakis
  Cc: Or Gerlitz, John Fastabend, Yan Burman, netdev, Stephen Hemminger
In-Reply-To: <alpine.LFD.2.03.1310281049210.31450@intel.com>

On Mon, Oct 28, 2013 at 7:59 PM, Joseph Gasparakis
> On Mon, 28 Oct 2013, Or Gerlitz wrote:
>> On Mon, Oct 28, 2013 at 1:44 AM, Joseph Gasparakis wrote:

>> > VXLAN implementation is not done like VLAN. VLANs are stacked on top of
>> > real interfaces and what you are saying makes sense. VXLAN however is
>> > using ip[6]_tunnel_xmit, and this is why we need to notify all the
>>
>> As the name of the ndo you added ndo_add_rx_vxlan_port suggests -- HW
>> needs that for RX offloads. So basically, what I am saying is: in a
>> similar manner that we already program the NIC "over which" the vxlan
>> instance is set to listen on the multicast address which is associated
>> with that vxlan segement, lets give them a hint that packets arriving
>> on this group are vxlan ones, so they can use it for programming
>> steering rules.
>
> Why don't you write up some code so everyone has something to look at?
> Then we can see what makes sense to do in terms of the existing or new ndos.

sure, code talks, indeed, still, looking on net-next, for the current
ndo there's no in tree consumer unless I miss anything, did I?

Or.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox