Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC 0/1] netfilter: xtables: xt_condition inclusion with namespace fix
From: Luciano Coelho @ 2010-07-22 14:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, kaber, jengelh, sameo

Hi,

This is a respin of the patch Jan sent to the list some time ago.  I've made
the changes proposed by Patrick in order to support multiple namespaces
correctly.

I still need to reapply my condition target and the u32 changes to the
condition on top of this, but I'd like to get some comments before I continue.

Please let me know how this looks.

Cheers,
Luca.

Luciano Coelho (1):
  netfilter: xtables: inclusion of xt_condition

 include/linux/netfilter/Kbuild         |    1 +
 include/linux/netfilter/xt_condition.h |   14 ++
 net/netfilter/Kconfig                  |    8 +
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_condition.c           |  294 ++++++++++++++++++++++++++++++++
 5 files changed, 318 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_condition.h
 create mode 100644 net/netfilter/xt_condition.c

^ permalink raw reply

* [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Luciano Coelho @ 2010-07-22 14:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, kaber, jengelh, sameo
In-Reply-To: <1279807758-6876-1-git-send-email-luciano.coelho@nokia.com>

xt_condition can be used by userspace to influence decisions in rules
by means of togglable variables without having to reload the entire
ruleset.

This is a respin of the module in Xtables-addons, with support for
multiple namespaces and other small improvements.

Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
 include/linux/netfilter/Kbuild         |    1 +
 include/linux/netfilter/xt_condition.h |   14 ++
 net/netfilter/Kconfig                  |    8 +
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_condition.c           |  294 ++++++++++++++++++++++++++++++++
 5 files changed, 318 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_condition.h
 create mode 100644 net/netfilter/xt_condition.c

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..d873f67 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -20,6 +20,7 @@ header-y += xt_TCPOPTSTRIP.h
 header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_comment.h
+header-y += xt_condition.h
 header-y += xt_connbytes.h
 header-y += xt_connlimit.h
 header-y += xt_connmark.h
diff --git a/include/linux/netfilter/xt_condition.h b/include/linux/netfilter/xt_condition.h
new file mode 100644
index 0000000..4faf3ca
--- /dev/null
+++ b/include/linux/netfilter/xt_condition.h
@@ -0,0 +1,14 @@
+#ifndef _XT_CONDITION_H
+#define _XT_CONDITION_H
+
+#include <linux/types.h>
+
+struct xt_condition_mtinfo {
+	char name[31];
+	__u8 invert;
+
+	/* Used internally by the kernel */
+	void *condvar __attribute__((aligned(8)));
+};
+
+#endif /* _XT_CONDITION_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..8c114b8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -605,6 +605,14 @@ config NETFILTER_XT_MATCH_COMMENT
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_CONDITION
+	tristate '"condition" match support'
+	depends on NETFILTER_ADVANCED
+	depends on PROC_FS
+	---help---
+	This option allows you to match firewall rules against condition
+	variables stored in the /proc/net/nf_condition directory.
+
 config NETFILTER_XT_MATCH_CONNBYTES
 	tristate  '"connbytes" per-connection counter match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..474dd06 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CONDITION) += xt_condition.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNTRACK) += xt_conntrack.o
diff --git a/net/netfilter/xt_condition.c b/net/netfilter/xt_condition.c
new file mode 100644
index 0000000..162aa60
--- /dev/null
+++ b/net/netfilter/xt_condition.c
@@ -0,0 +1,294 @@
+/*
+ *	"condition" match extension for Xtables
+ *
+ *	Description: This module allows firewall rules to match using
+ *	condition variables available through procfs.
+ *
+ *	Authors:
+ *	Stephane Ouellette <ouellettes@videotron.ca>, 2002-10-22
+ *	Massimiliano Hofer <max@nucleus.it>, 2006-05-15
+ *
+ *	This program is free software; you can redistribute it and/or modify it
+ *	under the terms of the GNU General Public License; either version 2
+ *	or 3 of the License, as published by the Free Software Foundation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/version.h>
+#include <linux/nsproxy.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_condition.h>
+#include <net/netns/generic.h>
+#include <asm/uaccess.h>
+
+MODULE_AUTHOR("Stephane Ouellette <ouellettes@videotron.ca>");
+MODULE_AUTHOR("Massimiliano Hofer <max@nucleus.it>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_DESCRIPTION("Allows rules to match against condition variables");
+MODULE_LICENSE("GPL");
+MODULE_PARM_DESC(condition_list_perms, "default permissions on /proc/net/nf_condition/* files");
+MODULE_PARM_DESC(condition_uid_perms, "default user owner of /proc/net/nf_condition/* files");
+MODULE_PARM_DESC(condition_gid_perms, "default group owner of /proc/net/nf_condition/* files");
+MODULE_ALIAS("ipt_condition");
+MODULE_ALIAS("ip6t_condition");
+
+struct condition_variable {
+	struct list_head list;
+	struct proc_dir_entry *status_proc;
+	unsigned int refcount;
+	bool enabled;
+};
+
+struct condition_net {
+	struct list_head list;
+	struct proc_dir_entry *proc_dir;
+	unsigned int list_perms;
+	unsigned int uid_perms;
+	unsigned int gid_perms;
+};
+
+static int condition_net_id;
+static inline struct condition_net *condition_pernet(struct net *net)
+{
+	return net_generic(net, condition_net_id);
+}
+
+/* proc_lock is a user context only semaphore used for write access */
+/*           to the conditions' list.                               */
+static DEFINE_MUTEX(proc_lock);
+
+static int condition_proc_read(char __user *buffer, char **start, off_t offset,
+			       int length, int *eof, void *data)
+{
+	const struct condition_variable *var = data;
+
+	buffer[0] = var->enabled ? '1' : '0';
+	buffer[1] = '\n';
+	if (length >= 2)
+		*eof = true;
+	return 2;
+}
+
+static int condition_proc_write(struct file *file, const char __user *buffer,
+				unsigned long length, void *data)
+{
+	struct condition_variable *var = data;
+	char newval;
+
+	if (length > 0) {
+		if (get_user(newval, buffer) != 0)
+			return -EFAULT;
+		/* Match only on the first character */
+		switch (newval) {
+		case '0':
+			var->enabled = false;
+			break;
+		case '1':
+			var->enabled = true;
+			break;
+		}
+	}
+	return length;
+}
+
+static bool
+condition_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_condition_mtinfo *info = par->matchinfo;
+	const struct condition_variable *var   = info->condvar;
+
+	return var->enabled ^ info->invert;
+}
+
+static int condition_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_condition_mtinfo *info = par->matchinfo;
+	struct condition_variable *var;
+	struct condition_net *cond_net =
+		condition_pernet(current->nsproxy->net_ns);
+
+	/* Forbid certain names */
+	if (*info->name == '\0' || *info->name == '.' ||
+	    info->name[sizeof(info->name)-1] != '\0' ||
+	    memchr(info->name, '/', sizeof(info->name)) != NULL) {
+		pr_info("name not allowed or too long: \"%.*s\"\n",
+			(unsigned int)sizeof(info->name), info->name);
+		return -EINVAL;
+	}
+
+	/*
+	 * Let's acquire the lock, check for the condition and add it
+	 * or increase the reference counter.
+	 */
+	mutex_lock(&proc_lock);
+	list_for_each_entry(var, &cond_net->list, list) {
+		if (strcmp(info->name, var->status_proc->name) == 0) {
+			++var->refcount;
+			mutex_unlock(&proc_lock);
+			info->condvar = var;
+			return 0;
+		}
+	}
+
+	/* At this point, we need to allocate a new condition variable. */
+	var = kmalloc(sizeof(struct condition_variable), GFP_KERNEL);
+	if (var == NULL) {
+		mutex_unlock(&proc_lock);
+		return -ENOMEM;
+	}
+
+	/* Create the condition variable's proc file entry. */
+	var->status_proc = create_proc_entry(info->name,
+					     cond_net->list_perms,
+					     cond_net->proc_dir);
+	if (var->status_proc == NULL) {
+		kfree(var);
+		mutex_unlock(&proc_lock);
+		return -ENOMEM;
+	}
+
+	var->refcount = 1;
+	var->enabled  = false;
+	var->status_proc->data       = var;
+	var->status_proc->read_proc  = condition_proc_read;
+	var->status_proc->write_proc = condition_proc_write;
+	var->status_proc->uid        = cond_net->uid_perms;
+	var->status_proc->gid        = cond_net->gid_perms;
+	list_add(&var->list, &cond_net->list);
+	mutex_unlock(&proc_lock);
+	info->condvar = var;
+	return 0;
+}
+
+static void condition_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	const struct xt_condition_mtinfo *info = par->matchinfo;
+	struct condition_variable *var = info->condvar;
+	struct condition_net *cond_net =
+		condition_pernet(current->nsproxy->net_ns);
+
+	mutex_lock(&proc_lock);
+	if (--var->refcount == 0) {
+		list_del(&var->list);
+		remove_proc_entry(var->status_proc->name,
+				  cond_net->proc_dir);
+		mutex_unlock(&proc_lock);
+		kfree(var);
+		return;
+	}
+	mutex_unlock(&proc_lock);
+}
+
+static struct xt_match condition_mt_reg __read_mostly = {
+	.name       = "condition",
+	.revision   = 1,
+	.family     = NFPROTO_UNSPEC,
+	.matchsize  = sizeof(struct xt_condition_mtinfo),
+	.match      = condition_mt,
+	.checkentry = condition_mt_check,
+	.destroy    = condition_mt_destroy,
+	.me         = THIS_MODULE,
+};
+
+static const char *const dir_name = "nf_condition";
+
+static int __net_init condnet_mt_init(struct net *net)
+{
+	struct condition_net *cond_net = condition_pernet(net);
+
+	INIT_LIST_HEAD(&cond_net->list);
+	cond_net->list_perms = S_IRUSR | S_IWUSR;
+	cond_net->uid_perms  = S_IRUSR | S_IWUSR;
+	cond_net->gid_perms  = S_IRUSR | S_IWUSR;
+
+	cond_net->proc_dir = proc_mkdir(dir_name, net->proc_net);
+
+	return (cond_net->proc_dir == NULL) ? -EACCES : 0;
+}
+
+static void __net_exit condnet_mt_exit(struct net *net)
+{
+	remove_proc_entry(dir_name, net->proc_net);
+}
+
+static struct pernet_operations condition_mt_netops = {
+	.init = condnet_mt_init,
+	.exit = condnet_mt_exit,
+	.id   = &condition_net_id,
+	.size = sizeof(struct condition_net),
+};
+
+static int __init condition_mt_init(void)
+{
+	int ret;
+
+	mutex_init(&proc_lock);
+	ret = xt_register_match(&condition_mt_reg);
+	if (ret < 0)
+		return ret;
+
+	ret = register_pernet_subsys(&condition_mt_netops);
+	if (ret < 0) {
+		xt_unregister_match(&condition_mt_reg);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit condition_mt_exit(void)
+{
+	unregister_pernet_subsys(&condition_mt_netops);
+	xt_unregister_match(&condition_mt_reg);
+}
+
+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
+{
+	unsigned long l;
+	int ret;
+	struct condition_net *cond_net =
+		condition_pernet(current->nsproxy->net_ns);
+
+	if (!val) return -EINVAL;
+	ret = strict_strtoul(val, 0, &l);
+	if (ret == -EINVAL || ((uint)l != l))
+		return -EINVAL;
+	*((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xt_condition_set_module_perms);
+
+int xt_condition_get_module_perms(char *buffer, struct kernel_param *kp)
+{
+	struct condition_net *cond_net =
+		condition_pernet(current->nsproxy->net_ns);
+
+	return sprintf(buffer, "%u",
+		       *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)));
+}
+EXPORT_SYMBOL_GPL(xt_condition_get_module_perms);
+
+module_param_call(list_perms,
+		  xt_condition_set_module_perms,
+		  xt_condition_get_module_perms,
+		  (void *) offsetof(struct condition_net, list_perms),
+		  0600);
+module_param_call(uid_perms,
+		  xt_condition_set_module_perms,
+		  xt_condition_get_module_perms,
+		  (void *) offsetof(struct condition_net, uid_perms),
+		  0600);
+module_param_call(gid_perms,
+		  xt_condition_set_module_perms,
+		  xt_condition_get_module_perms,
+		  (void *) offsetof(struct condition_net, gid_perms),
+		  0600);
+
+module_init(condition_mt_init);
+module_exit(condition_mt_exit);
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Johannes Berg @ 2010-07-22 14:10 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Eric W. Biederman, Greg KH, Andrew Morton, Greg KH,
	Rafael J. Wysocki, Maciej W. Rozycki, netdev
In-Reply-To: <AANLkTimM5Ea8mQ7aX4kDq3dgF3P-t2Wm3dERhckC69Ja@mail.gmail.com>

On Thu, 2010-07-22 at 15:38 +0200, Kay Sievers wrote:

> Please try to fix these drivers instead, or mark the broken for
> namespaces, if nobody can fix them right now.

We've tried. Nobody, including you, has been able to suggest how to fix
it. And it's not just broken with network namespaces enabled either, as
Eric explained. I really don't see why you keep asking us to fix it when
clearly we cannot -- even you don't know how and you certainly have more
insight into the device model than we do.

johannes

^ permalink raw reply

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw    HIDIOCGFEATURE and HIDIOCSFEATURE
From: Jiri Kosina @ 2010-07-22 14:14 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Alan Ott, David S Miller, Michael Poole, Bastien Nocera,
	Eric Dumazet, linux-bluetooth, linux-kernel, netdev
In-Reply-To: <1278696815.10421.137.camel@localhost.localdomain>

On Fri, 9 Jul 2010, Marcel Holtmann wrote:

> > >>> what is usb-hid.ko doing here? I would expect a bunch of code
> > >>> duplication with minor difference between USB and Bluetooth.
> > >>>        
> > >> usbhid doesn't have a lot of code for hidraw. Two functions are involved:
> > >>       usbhid_output_raw_report()
> > >>           - calls usb_control_msg() with Get_Report
> > >>       usbhid_get_raw_report()
> > >>           - calls usb_control_msg() with Set_Report
> > >>               OR
> > >>           - calls usb_interrupt_msg() on the Ouput pipe.
> > >>
> > >> This is of course easier than bluetooth because usb_control_msg() is
> > >> synchronous, even when requesting reports, mostly because of the nature
> > >> of USB, where the request and response are part of the same transfer.
> > >>
> > >> For Bluetooth, it's a bit more complicated since the kernel treats it
> > >> more like a networking interface (and indeed it is). My understanding is
> > >> that to make a synchronous transfer in bluetooth, one must:
> > >>       - send the request packet
> > >>       - block (wait_event_*())
> > >>       - when the response is received in the input handler, wake_up_*().
> > >>
> > >> There's not really any code duplication, mostly because initiating
> > >> synchronous USB transfers (input and output) is easy (because of the
> > >> usb_*_msg() functions), while making synchronous Bluetooth transfers
> > >> must be done manually. If there's a nice, convenient, synchronous
> > >> function in Bluetooth similar to usb_control_msg() that I've missed,
> > >> then let me know, as it would simplify this whole thing.
> > >>      
> > > there is not and I don't think we ever get one. My question here was
> > > more in the direction why HID core is doing these synchronously in the
> > > first place. Especially since USB can do everything async as well.
> >
> > I'm open to suggestions. The way I see it is from a user space 
> > perspective. With Get_Feature being on an ioctl(), I don't see any clean 
> > way to do it other than synchronously. Other operating systems (I can 
> > say for sure Windows, Mac OS X, and FreeBSD) handle Get/Set Feature the 
> > same way (synchronously) from user space.
> > 
> > You seem to be proposing an asynchronous interface. What would that look 
> > like from user space?
> 
> not necessarily from user space, but at least from HID core to HIDP and
> usb-hid transports. At least that is what I would expect, Jiri?

Sorry for this taking too long (vacations, conferences, you name it) for 
me to respond.

As all the _raw() callbacks are purely intended for userspace interaction 
anyway, it's perfectly fine (and in fact desirable) for the low-level 
transport drivers to perform these operations synchronously (and that's 
what USB implementation does as well).

Marcel, if your opposition to synchronous interface is strong, we'll have 
to think about other aproaches, but from my POV, the patch is fine as-is 
for Bluetooth.

Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Jan Engelhardt @ 2010-07-22 14:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <1279807385.2467.67.camel@edumazet-laptop>

On Thursday 2010-07-22 16:03, Eric Dumazet wrote:

>This match is a bit strange, being packet content agnostic...
>+/*
>+ * Yes, packet content is not interesting for us, we only take care
>+ * of cpu handling this packet
>+ */

That is not so strange after all, we have many packet agnostic matches: 
xt_time, xt_condition, xt_IDLETIMER, xt_iface.
So this little comment looks a bit redundant.

Or it seems that academia can't come up with enough new protocols in time that
we have to resort to do -m coffeemaker :)

>@@ -0,0 +1,8 @@
>+#ifndef _XT_CPU_H
>+#define _XT_CPU_H
>+
>+struct xt_cpu_info {
>+	unsigned int	cpu;
>+	int		invert;
>+};
>+#endif /*_XT_MAC_H*/

Please take a read in "Writing Netfilter Modules" e-book :-)
It will tell you that types other than fixed ones are a no-no.

>diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
>index e28420a..0fe7efd 100644
>--- a/net/netfilter/Makefile
>+++ b/net/netfilter/Makefile
>@@ -79,6 +79,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
>+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
> obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o

Try to keep it alphabetic (KConfig too).

>+ *
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 -j REDIRECT --to-port 8080
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 -j REDIRECT --to-port 8081
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 -j REDIRECT --to-port 8082
>+ * iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 -j REDIRECT --to-port 8083
>+ *
>+ */

Well the commands you already have presented in the commit log, and the 
most efficient place for these is actually the manpage.

>+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
>+{
>+	const struct xt_cpu_info *info = par->matchinfo;
>+	bool ret;
>+
>+	ret = info->cpu == smp_processor_id();
>+	ret ^= info->invert;
>+	return ret;
>+}

Looks simple enough that it could do it in a single line,

	return (info->cpu == smp_processor_id()) ^ !!info->invert;

^ permalink raw reply

* Re: [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Jan Engelhardt @ 2010-07-22 14:44 UTC (permalink / raw)
  To: Luciano Coelho
  Cc: Netfilter Developer Mailing List, netdev, Patrick McHardy, sameo,
	Alexey Dobriyan
In-Reply-To: <1279807758-6876-2-git-send-email-luciano.coelho@nokia.com>


On Thursday 2010-07-22 16:09, Luciano Coelho wrote:
>+static int condition_mt_check(const struct xt_mtchk_param *par)
>+{
>+	struct xt_condition_mtinfo *info = par->matchinfo;
>+	struct condition_variable *var;
>+	struct condition_net *cond_net =
>+		condition_pernet(current->nsproxy->net_ns);

Cc'ing Alexey who has done the netns support.

Alexey, you added par->net, but given Luciano just did it with 
current->nsproxy->net_ns, do we really need par->net?


>+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
>+{
>+	unsigned long l;
>+	int ret;
>+	struct condition_net *cond_net =
>+		condition_pernet(current->nsproxy->net_ns);
>+
>+	if (!val) return -EINVAL;

newline before return.

>+	ret = strict_strtoul(val, 0, &l);
>+	if (ret == -EINVAL || ((uint)l != l))
>+		return -EINVAL;

>+	*((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;

I don't think we need this level of granularity; let the options be 
global, similar to what xt_hashlimit does.
(I am not even sure if kp->arg can be non-multiples-of-4, in which case 
this would be an alignment violation even.)

>+
>+	return 0;
>+}
>+EXPORT_SYMBOL_GPL(xt_condition_set_module_perms);
>+
>+int xt_condition_get_module_perms(char *buffer, struct kernel_param *kp)
>+{
>+	struct condition_net *cond_net =
>+		condition_pernet(current->nsproxy->net_ns);
>+
>+	return sprintf(buffer, "%u",
>+		       *((u32 *) ((u8 *) cond_net + (size_t) kp->arg)));
>+}
>+EXPORT_SYMBOL_GPL(xt_condition_get_module_perms);
>+
>+module_param_call(list_perms,
>+		  xt_condition_set_module_perms,
>+		  xt_condition_get_module_perms,
>+		  (void *) offsetof(struct condition_net, list_perms),
>+		  0600);
>+module_param_call(uid_perms,
>+		  xt_condition_set_module_perms,
>+		  xt_condition_get_module_perms,
>+		  (void *) offsetof(struct condition_net, uid_perms),
>+		  0600);
>+module_param_call(gid_perms,
>+		  xt_condition_set_module_perms,
>+		  xt_condition_get_module_perms,
>+		  (void *) offsetof(struct condition_net, gid_perms),
>+		  0600);
>+
>+module_init(condition_mt_init);
>+module_exit(condition_mt_exit);
>-- 
>1.7.0.4
>


^ permalink raw reply

* Re: [PATCH net-next] sysfs: add attribute to indicate hw address assignment type
From: Stefan Assmann @ 2010-07-22 14:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, abadea, netdev, linux-kernel, gospo, gregory.v.rose,
	alexander.h.duyck, leedom, harald
In-Reply-To: <1279807643.2104.1.camel@achroite.uk.solarflarecom.com>

On 22.07.2010 16:07, Ben Hutchings wrote:
> On Thu, 2010-07-22 at 14:50 +0200, Stefan Assmann wrote:
>> On 21.07.2010 15:54, Ben Hutchings wrote:
>>> On Wed, 2010-07-21 at 10:10 +0200, Stefan Assmann wrote:
>>>> I put Alex' idea into code for further discussion, keeping the names
>>>> mentioned here until we agree on the scope of this attribute. When we
>>>> have settled I'll post a patch with proper patch description.
>>> [...]
>>>
>>> Just a little nitpick: I think it would be clearer to use a more
>>> specific term like 'address source' or 'address assignment type' rather
>>> than 'address type'.
>>
>> Here's a proposal for the final patch.
> 
> Looks good, but...
> 
> [...]
>>  /**
>> + * dev_hw_addr_random - Create random MAC and set device flag
>> + * @dev: pointer to net_device structure
>> + * @addr: Pointer to a six-byte array containing the Ethernet address
>> + *
>> + * Generate random MAC to be used by a device and set addr_assign_type
>> + * so the state can be read by sysfs and be used by udev.
>> + */
>> +static inline void dev_hw_addr_random(struct net_device *dev, u8 *hwaddr)
>> +{
>> +	dev->addr_assign_type |= NET_ADDR_RANDOM;
>> +	random_ether_addr(hwaddr);
>> +}
> [...]
> 
> ...why '|=' and not '='?

The intention is to use addr_assign_type as a bit field.

Okay it it might not make too much sense to 'steal' a random MAC
address but in case we add more types later it might get useful.

  Stefan
--
Stefan Assmann         | Red Hat GmbH
Software Engineer      | Otto-Hahn-Strasse 20, 85609 Dornach
                       | HR: Amtsgericht Muenchen HRB 153243
                       | GF: Brendan Lane, Charlie Peters,
sassmann at redhat.com |     Michael Cunningham, Charles Cachera

^ permalink raw reply

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
From: Neil Horman @ 2010-07-22 14:57 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers
In-Reply-To: <4C4803B3.1020808@jp.fujitsu.com>

On Thu, Jul 22, 2010 at 05:39:15PM +0900, Koki Sanagi wrote:
> (2010/07/21 19:56), Neil Horman wrote:
> > On Wed, Jul 21, 2010 at 04:02:57PM +0900, Koki Sanagi wrote:
> >> (2010/07/20 20:50), Neil Horman wrote:
> >>> On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
> >>>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> >>>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> >>>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> >>>> we can check how long it takes to free transmited packets. And using it, we can
> >>>> calculate how many packets driver had at that time. It is useful when a drop of
> >>>> transmited packet is a problem.
> >>>>
> >>>>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
> >>>>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> >>>>
> >>>>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> >>>>
> >>>>
> >>>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >>>> ---
> >>>>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
> >>>>  net/core/datagram.c        |    1 +
> >>>>  net/core/dev.c             |    2 ++
> >>>>  net/core/skbuff.c          |    1 +
> >>>>  4 files changed, 46 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> >>>> index 4b2be6d..84c9041 100644
> >>>> --- a/include/trace/events/skb.h
> >>>> +++ b/include/trace/events/skb.h
> >>>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
> >>>>  		__entry->skbaddr, __entry->protocol, __entry->location)
> >>>>  );
> >>>>  
> >>>> +DECLARE_EVENT_CLASS(free_skb,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb),
> >>>> +
> >>>> +	TP_STRUCT__entry(
> >>>> +		__field(	void *,	skbaddr	)
> >>>> +	),
> >>>> +
> >>>> +	TP_fast_assign(
> >>>> +		__entry->skbaddr = skb;
> >>>> +	),
> >>>> +
> >>>> +	TP_printk("skbaddr=%p", __entry->skbaddr)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, consume_skb,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>
> >>> Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
> >>> pass through kfree_skb anyway, getting picked up by the tracepoint there, the
> >>> while the latter won't (since it uses __kfree_skb instead), I think that could
> >>> be fixed up by add a call to trace_kfree_skb there directly, saving you two
> >>> tracepoints.
> >>>
> >>> Neil
> >>>
> >> I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
> >> completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
> >> add trace_kfree_skb before it. skb_free_datagram_locked is same.
> >>
> > It isn't, you're right, but that was the point I made above.  Those missed areas
> > could be easily handled by adding calls to trace_kfree_skb which already exists,
> > to the missed areas.  Then you don't need to create those new tracepoints.  The
> > way your doing this, if someone wants to trace all skb frees in debugfs, they
> > would have to enable three tracepoints, not just one.  Not that thats the point
> > of your patch, but its something to consider, and it simplifies your code.
> > Neil
> > 
> 
> O.K. I've re-made a patch to use trace_kfree_skb instead of
> trace_dev_kfree_skb_irq and trace_skb_free_datagram_locked.
> But I've got a problem.
> I should use not __builtin_return_address, but macro or function which returns
> current address. But I don't know any macro like that. Do you know any solution ?
> 
Since the trace call is the first thing in the function, why not just pass in
skb_free_datagram_locked as the pointer.  That should work out properly
Neil

> 

^ permalink raw reply

* Re: minstrel_tx_status mac80211 WARNINGs in vanilla 2.6.34.1
From: John W. Linville @ 2010-07-22 14:47 UTC (permalink / raw)
  To: Sven Geggus; +Cc: netdev, linux-wireless, nbd
In-Reply-To: <20100722123000.GA16657@geggus.net>

On Thu, Jul 22, 2010 at 02:30:01PM +0200, Sven Geggus wrote:
> Hello,
> 
> running vanilla 2.6.34.1 I get the following warnings in  kernel log:
> 
> WARNING: at net/mac80211/rc80211_minstrel.c:70 minstrel_tx_status+0x67/0xd1 [mac80211]()
> Hardware name: SCENIC E300/E600
> Modules linked in: i915 drm_kms_helper drm video backlight output lp loop
> snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_seq_dummy zd1211rw snd_seq_oss usbhid mac80211 option cfg80211
> snd_seq_midi usbserial snd_rawmidi snd_seq_midi_event snd_seq snd_timer
> snd_seq_device snd parport_pc ehci_hcd uhci_hcd soundcore intel_agp parport
> usbcore nls_base snd_page_alloc agpgart rng_core floppy sg
> Pid: 0, comm: swapper Tainted: G        W  2.6.34.1 #3
> Call Trace:
>  [<c102af25>] warn_slowpath_common+0x60/0x90
>  [<c102af62>] warn_slowpath_null+0xd/0x10
>  [<f83cbd48>] minstrel_tx_status+0x67/0xd1 [mac80211]
>  [<f83b6eb1>] ieee80211_tx_status+0x1f6/0x5ac [mac80211]
>  [<c1261533>] ? skb_dequeue+0x45/0x4c
>  [<f83b6896>] ieee80211_tasklet_handler+0x61/0xd6 [mac80211]
>  [<c102ed7d>] tasklet_action+0x62/0x9f
>  [<c102f331>] __do_softirq+0x77/0xe5
>  [<c102f3c5>] do_softirq+0x26/0x2b
>  [<c102f52f>] irq_exit+0x29/0x66
>  [<c1003e90>] do_IRQ+0x85/0x9b
>  [<c1002d29>] common_interrupt+0x29/0x30
>  [<c10083ac>] ? default_idle+0x2d/0x42
>  [<c1001a9b>] cpu_idle+0x44/0x71
>  [<c12e00de>] rest_init+0x96/0x98
>  [<c1498862>] start_kernel+0x2a5/0x2aa
>  [<c14980b7>] i386_start_kernel+0xb7/0xbf
> ---[ end trace f22ceacef336878f ]---
> 
> Wireless driver is zd1211rw.
> 
> Did not test with older kernel because this device has not been in user on
> this machine before.
> 
> WLAN does however seem to work anyway.

Well, I just took a quick look -- so, I'm not 100% sure...

But, it looks to me like zd1211rw is reporting tx status on a rate
that minstrel didn't expect it to use.  It seems like the hardware
is pre-wired to do retries on sequentially lower rates, which seems
a bit incompatible with minstrel's worldview.

Felix, can we accomodate this?  The "WLAN does however seem to work
anyway" seems to suggest things work, so can we at least not yell
about it?

Thanks,

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Luciano Coelho @ 2010-07-22 15:16 UTC (permalink / raw)
  To: ext Jan Engelhardt
  Cc: Netfilter Developer Mailing List, netdev@vger.kernel.org,
	Patrick McHardy, sameo@linux.intel.com, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1007221621270.1619@obet.zrqbmnf.qr>

On Thu, 2010-07-22 at 16:44 +0200, ext Jan Engelhardt wrote:
> On Thursday 2010-07-22 16:09, Luciano Coelho wrote:
> >+static int condition_mt_check(const struct xt_mtchk_param *par)
> >+{
> >+	struct xt_condition_mtinfo *info = par->matchinfo;
> >+	struct condition_variable *var;
> >+	struct condition_net *cond_net =
> >+		condition_pernet(current->nsproxy->net_ns);
> 
> Cc'ing Alexey who has done the netns support.
> 
> Alexey, you added par->net, but given Luciano just did it with 
> current->nsproxy->net_ns, do we really need par->net?
> 
> 
> >+int xt_condition_set_module_perms(const char *val, struct kernel_param *kp)
> >+{
> >+	unsigned long l;
> >+	int ret;
> >+	struct condition_net *cond_net =
> >+		condition_pernet(current->nsproxy->net_ns);
> >+
> >+	if (!val) return -EINVAL;
> 
> newline before return.

Sure! I copied this from params.c.  I'll fix it.


> >+	ret = strict_strtoul(val, 0, &l);
> >+	if (ret == -EINVAL || ((uint)l != l))
> >+		return -EINVAL;
> 
> >+	*((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
> 
> I don't think we need this level of granularity; let the options be 
> global, similar to what xt_hashlimit does.

I did this according to Patrick's comment:
> > proc_net_condition is a global variable, so this won't work for
> > namespaces. What the code does is reinitialize it when instantiating
> > a new namespace, so it will always point to the last instantiated
> > namespace.
> > 
> > The same problem exists for the condition_list, each namespace
> > should only be able to access its own conditions.
> 
> This also applies to the permission variables. Basically, we shouldn't
> be having any globals except perhaps the mutex. You probably need a
> module_param_call function to set them for the correct namespace (you
> can access that through current->nsproxy->net_ns).

I found it a bit strange to be able to change the module params in a
per-netns basis, but it is actually possible if you're changing the
parameters via sysfs.  I tried it and it even seems to work. ;)

I can't see any module parameters in the xt_hashlimit.c file.  Am I
looking in the wrong place?

I would be fine with making the module params global (as they were
before), if that's fine with Patrick too.


> (I am not even sure if kp->arg can be non-multiples-of-4, in which case 
> this would be an alignment violation even.)

I'm passing size_t in kp->arg.  It looks quite ugly, because usually
kp->arg is a pointer to some data.  But at least this way, using
offsetof(), I could avoid lots of repeated code for the options...


-- 
Cheers,
Luca.


^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Eric Dumazet @ 2010-07-22 15:18 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <alpine.LSU.2.01.1007221611510.1619@obet.zrqbmnf.qr>

Le jeudi 22 juillet 2010 à 16:19 +0200, Jan Engelhardt a écrit :
> On Thursday 2010-07-22 16:03, Eric Dumazet wrote:
> 
> >This match is a bit strange, being packet content agnostic...
> >+/*
> >+ * Yes, packet content is not interesting for us, we only take care
> >+ * of cpu handling this packet
> >+ */
> 
> That is not so strange after all, we have many packet agnostic matches: 
> xt_time, xt_condition, xt_IDLETIMER, xt_iface.
> So this little comment looks a bit redundant.
> 
> Or it seems that academia can't come up with enough new protocols in time that
> we have to resort to do -m coffeemaker :)
> 
> >@@ -0,0 +1,8 @@
> >+#ifndef _XT_CPU_H
> >+#define _XT_CPU_H
> >+
> >+struct xt_cpu_info {
> >+	unsigned int	cpu;
> >+	int		invert;
> >+};
> >+#endif /*_XT_MAC_H*/
> 
> Please take a read in "Writing Netfilter Modules" e-book :-)
> It will tell you that types other than fixed ones are a no-no.

Ok, let's do that, but I doubt sizeof(int) can be different than 4 on a
Linux 2.6 host right now.

I prefer not doing the !!info->invert, and do the check only once.

Thanks

[PATCH nf-next-2.6] netfilter: add xt_cpu match

In some situations a CPU match permits a better spreading of
connections, or select targets only for a given cpu.

With Remote Packet Steering or multiqueue NIC and appropriate IRQ
affinities, we can distribute trafic on available cpus, per session.
(all RX packets for a given flow is handled by a given cpu)

Some legacy applications being not SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four ways machine might run four copies of
server application, using a separate listening port for each instance,
but still presenting an unique external port :

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
        -j REDIRECT --to-port 8080

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
        -j REDIRECT --to-port 8081

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
        -j REDIRECT --to-port 8082

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
        -j REDIRECT --to-port 8083


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netfilter/Kbuild   |    3 -
 include/linux/netfilter/xt_cpu.h |   11 +++++
 net/netfilter/Kconfig            |    9 ++++
 net/netfilter/Makefile           |    1 
 net/netfilter/xt_cpu.c           |   63 +++++++++++++++++++++++++++++
 5 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index bb103f4..1041a1d 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -19,12 +19,13 @@ header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
 header-y += xt_TEE.h
 header-y += xt_TPROXY.h
+header-y += xt_cluster.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
 header-y += xt_connlimit.h
 header-y += xt_connmark.h
 header-y += xt_conntrack.h
-header-y += xt_cluster.h
+header-y += xt_cpu.h
 header-y += xt_dccp.h
 header-y += xt_dscp.h
 header-y += xt_esp.h
diff --git a/include/linux/netfilter/xt_cpu.h b/include/linux/netfilter/xt_cpu.h
index e69de29..93c7f11 100644
--- a/include/linux/netfilter/xt_cpu.h
+++ b/include/linux/netfilter/xt_cpu.h
@@ -0,0 +1,11 @@
+#ifndef _XT_CPU_H
+#define _XT_CPU_H
+
+#include <linux/types.h>
+
+struct xt_cpu_info {
+	__u32	cpu;
+	__u32	invert;
+};
+
+#endif /*_XT_CPU_H*/
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aa2f106..523e8d0 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -647,6 +647,15 @@ config NETFILTER_XT_MATCH_CONNTRACK
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_CPU
+	tristate '"cpu" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  CPU matching allows you to match packets based on the CPU
+	  currently handling the packet.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_DCCP
 	tristate '"dccp" protocol match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index e28420a..6da84c3 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNTRACK) += xt_conntrack.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_CPU) += xt_cpu.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_DCCP) += xt_dccp.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_DSCP) += xt_dscp.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_ESP) += xt_esp.o
diff --git a/net/netfilter/xt_cpu.c b/net/netfilter/xt_cpu.c
index e69de29..b39db8a 100644
--- a/net/netfilter/xt_cpu.c
+++ b/net/netfilter/xt_cpu.c
@@ -0,0 +1,63 @@
+/* Kernel module to match running CPU */
+
+/*
+ * Might be used to distribute connections on several daemons, if
+ * RPS (Remote Packet Steering) is enabled or NIC is multiqueue capable,
+ * each RX queue IRQ affined to one CPU (1:1 mapping)
+ *
+ */
+
+/* (C) 2010 Eric Dumazet
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_cpu.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Dumazet <eric.dumazet@gmail.com>");
+MODULE_DESCRIPTION("Xtables: CPU match");
+
+static int cpu_mt_check(const struct xt_mtchk_param *par)
+{
+	const struct xt_cpu_info *info = par->matchinfo;
+
+	if (info->invert & ~1)
+		return -EINVAL;
+	return 0;
+}
+
+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_cpu_info *info = par->matchinfo;
+
+	return (info->cpu == smp_processor_id()) ^ info->invert;
+}
+
+static struct xt_match cpu_mt_reg __read_mostly = {
+	.name       = "cpu",
+	.revision   = 0,
+	.family     = NFPROTO_UNSPEC,
+	.checkentry = cpu_mt_check,
+	.match      = cpu_mt,
+	.matchsize  = sizeof(struct xt_cpu_info),
+	.me         = THIS_MODULE,
+};
+
+static int __init cpu_mt_init(void)
+{
+	return xt_register_match(&cpu_mt_reg);
+}
+
+static void __exit cpu_mt_exit(void)
+{
+	xt_unregister_match(&cpu_mt_reg);
+}
+
+module_init(cpu_mt_init);
+module_exit(cpu_mt_exit);


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw    HIDIOCGFEATURE and HIDIOCSFEATURE
From: Marcel Holtmann @ 2010-07-22 15:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Ott, David S Miller, Michael Poole, Bastien Nocera,
	Eric Dumazet, linux-bluetooth, linux-kernel, netdev
In-Reply-To: <alpine.LNX.2.00.1007221612450.4741@pobox.suse.cz>

Hi Jiri,

> > > >>> what is usb-hid.ko doing here? I would expect a bunch of code
> > > >>> duplication with minor difference between USB and Bluetooth.
> > > >>>        
> > > >> usbhid doesn't have a lot of code for hidraw. Two functions are involved:
> > > >>       usbhid_output_raw_report()
> > > >>           - calls usb_control_msg() with Get_Report
> > > >>       usbhid_get_raw_report()
> > > >>           - calls usb_control_msg() with Set_Report
> > > >>               OR
> > > >>           - calls usb_interrupt_msg() on the Ouput pipe.
> > > >>
> > > >> This is of course easier than bluetooth because usb_control_msg() is
> > > >> synchronous, even when requesting reports, mostly because of the nature
> > > >> of USB, where the request and response are part of the same transfer.
> > > >>
> > > >> For Bluetooth, it's a bit more complicated since the kernel treats it
> > > >> more like a networking interface (and indeed it is). My understanding is
> > > >> that to make a synchronous transfer in bluetooth, one must:
> > > >>       - send the request packet
> > > >>       - block (wait_event_*())
> > > >>       - when the response is received in the input handler, wake_up_*().
> > > >>
> > > >> There's not really any code duplication, mostly because initiating
> > > >> synchronous USB transfers (input and output) is easy (because of the
> > > >> usb_*_msg() functions), while making synchronous Bluetooth transfers
> > > >> must be done manually. If there's a nice, convenient, synchronous
> > > >> function in Bluetooth similar to usb_control_msg() that I've missed,
> > > >> then let me know, as it would simplify this whole thing.
> > > >>      
> > > > there is not and I don't think we ever get one. My question here was
> > > > more in the direction why HID core is doing these synchronously in the
> > > > first place. Especially since USB can do everything async as well.
> > >
> > > I'm open to suggestions. The way I see it is from a user space 
> > > perspective. With Get_Feature being on an ioctl(), I don't see any clean 
> > > way to do it other than synchronously. Other operating systems (I can 
> > > say for sure Windows, Mac OS X, and FreeBSD) handle Get/Set Feature the 
> > > same way (synchronously) from user space.
> > > 
> > > You seem to be proposing an asynchronous interface. What would that look 
> > > like from user space?
> > 
> > not necessarily from user space, but at least from HID core to HIDP and
> > usb-hid transports. At least that is what I would expect, Jiri?
> 
> Sorry for this taking too long (vacations, conferences, you name it) for 
> me to respond.
> 
> As all the _raw() callbacks are purely intended for userspace interaction 
> anyway, it's perfectly fine (and in fact desirable) for the low-level 
> transport drivers to perform these operations synchronously (and that's 
> what USB implementation does as well).
> 
> Marcel, if your opposition to synchronous interface is strong, we'll have 
> to think about other aproaches, but from my POV, the patch is fine as-is 
> for Bluetooth.

that the ioctl() API is synchronous is fine to me, however pushing that
down to the transport drivers seems wrong to me. I think the HID core
should be able to handle a fully asynchronous transport driver. I know
that with the USB subsystem you are little bit spoiled here, but for
Bluetooth it is not the case. And in the end even using the asynchronous
USB URB calls would be nice as well.

So why not make the core actually wait for responses from the transport
driver. I would make the transport drivers a lot simpler in the long
run. And I know that most likely besides Bluetooth and USB you won't see
another, but you never know.

Regards

Marcel



^ permalink raw reply

* Re: [PATCH] Driver-core: Fix bluetooth network device rename regression
From: Kay Sievers @ 2010-07-22 15:30 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Eric W. Biederman, Greg KH, Andrew Morton, Greg KH,
	Rafael J. Wysocki, Maciej W. Rozycki, netdev
In-Reply-To: <1279807845.12439.20.camel@jlt3.sipsolutions.net>

On Thu, Jul 22, 2010 at 16:10, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2010-07-22 at 15:38 +0200, Kay Sievers wrote:
>>
>> Please try to fix these drivers instead, or mark the broken for
>> namespaces, if nobody can fix them right now.
>
> We've tried. Nobody, including you, has been able to suggest how to fix
> it.

I did, and several times. Here are the options again:

- Split the driver in two modules, so the auto-cleanup of the netdev
can be done by the second module when the device is force-unloaded
without taking any references to the code while in use (netdev specif
behavior).

- Move the device cleanup code somehow in the core by adding proper
functions to bus devices, similar to the completely mis-used
device_create() function for class devices. This would also be a
proper fix for the weird driver core use.

- Do not allow to force-unload the module while the netdev is in use.
You would need some "destruct" command for the device then, which
removes the netdev, and to be able to unload the module.

> And it's not just broken with network namespaces enabled either, as

So what's the problem without the sysfs ns then? I didn't hear any the
last couple of years.

> Eric explained. I really don't see why you keep asking us to fix it when
> clearly we cannot -- even you don't know how and you certainly have more
> insight into the device model than we do.

Sure, and I ask again to fix the drivers, instead of sneaking-in dirty
hacks into the core, just by calling an expected behavior a
"regression". This is not a core bug, and should not be worked around
that way in the core.

If there is a new requirement for the core (like possibly point 2
above), we can surely look into making this happen.

We can not add lists of individual subsystems to generic core
functions to work around broken drivers. I hope you will understand
that.

Thanks,
Kay

^ permalink raw reply

* Re: [RFC 1/1] netfilter: xtables: inclusion of xt_condition
From: Jan Engelhardt @ 2010-07-22 15:36 UTC (permalink / raw)
  To: Luciano Coelho
  Cc: Netfilter Developer Mailing List, netdev@vger.kernel.org,
	Patrick McHardy, sameo@linux.intel.com, Alexey Dobriyan
In-Reply-To: <1279811808.1630.8.camel@powerslave>


On Thursday 2010-07-22 17:16, Luciano Coelho wrote:
>> >+	ret = strict_strtoul(val, 0, &l);
>> >+	if (ret == -EINVAL || ((uint)l != l))
>> >+		return -EINVAL;
>> 
>> >+	*((u32 *) ((u8 *) cond_net + (size_t) kp->arg)) = l;
>> 
>> I don't think we need this level of granularity; let the options be 
>> global, similar to what xt_hashlimit does.
>
>I did this according to Patrick's comment:
>> > proc_net_condition is a global variable, so this won't work for
>> > namespaces. What the code does is reinitialize it when instantiating
>> > a new namespace, so it will always point to the last instantiated
>> > namespace.
>> > 
>> > The same problem exists for the condition_list, each namespace
>> > should only be able to access its own conditions.
>> 
>> This also applies to the permission variables. Basically, we shouldn't
>> be having any globals except perhaps the mutex. You probably need a
>> module_param_call function to set them for the correct namespace (you
>> can access that through current->nsproxy->net_ns).
>
>I found it a bit strange to be able to change the module params in a
>per-netns basis, but it is actually possible if you're changing the
>parameters via sysfs.  I tried it and it even seems to work. ;)
>
>I can't see any module parameters in the xt_hashlimit.c file.  Am I
>looking in the wrong place?

Oops, xt_recent.c.

>I would be fine with making the module params global (as they were
>before), if that's fine with Patrick too.

"When was the last time you needed to change the default ownership
when you _also_ have the possibility to chown each procfs file
individually?"

>> (I am not even sure if kp->arg can be non-multiples-of-4, in which case 
>> this would be an alignment violation even.)
>
>I'm passing size_t in kp->arg.  It looks quite ugly, because usually
>kp->arg is a pointer to some data.  But at least this way, using
>offsetof(), I could avoid lots of repeated code for the options...

if kp->arg is 1, ((u8*)cond_net + kp->arg) yields a pointer that's
usually not aligned for u32. (And C pedants would probably argue
that is should be char* not u8*, even if the one is a typedef
of another.)

^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Jan Engelhardt @ 2010-07-22 15:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <1279811939.2467.79.camel@edumazet-laptop>

On Thursday 2010-07-22 17:18, Eric Dumazet wrote:

>Le jeudi 22 juillet 2010 à 16:19 +0200, Jan Engelhardt a écrit :
>> On Thursday 2010-07-22 16:03, Eric Dumazet wrote:
>> 
>> >This match is a bit strange, being packet content agnostic...
>> >+/*
>> >+ * Yes, packet content is not interesting for us, we only take care
>> >+ * of cpu handling this packet
>> >+ */
>> 
>> That is not so strange after all, we have many packet agnostic matches: 
>> xt_time, xt_condition, xt_IDLETIMER, xt_iface.
>> So this little comment looks a bit redundant.
>> 
>> Or it seems that academia can't come up with enough new protocols in time that
>> we have to resort to do -m coffeemaker :)
>> 
>> >@@ -0,0 +1,8 @@
>> >+#ifndef _XT_CPU_H
>> >+#define _XT_CPU_H
>> >+
>> >+struct xt_cpu_info {
>> >+	unsigned int	cpu;
>> >+	int		invert;
>> >+};
>> >+#endif /*_XT_MAC_H*/
>> 
>> Please take a read in "Writing Netfilter Modules" e-book :-)
>> It will tell you that types other than fixed ones are a no-no.
>
>Ok, let's do that, but I doubt sizeof(int) can be different than 4 on a
>Linux 2.6 host right now.

Never say never. "long" already bit people in the past, and now we
have that CONFIG_COMPAT stuff.

If invert is the only flag, perhaps it makes sense to use __u8 
for it. 

>I prefer not doing the !!info->invert, and do the check only once.

>+static int cpu_mt_check(const struct xt_mtchk_param *par)
>+{
>+	const struct xt_cpu_info *info = par->matchinfo;
>+
>+	if (info->invert & ~1)
>+		return -EINVAL;
>+	return 0;
>+}
>+
>+static bool cpu_mt(const struct sk_buff *skb, struct xt_action_param *par)
>+{
>+	const struct xt_cpu_info *info = par->matchinfo;
>+
>+	return (info->cpu == smp_processor_id()) ^ info->invert;
>+}

That works nicely indeed. Do you anticipate any future flags?


^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-22 15:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C06A580.9060300@kernel.org>

On Wed, Jun 02, 2010 at 08:40:00PM +0200, Tejun Heo wrote:
> Replace vhost_workqueue with per-vhost kthread.  Other than callback
> argument change from struct work_struct * to struct vhost_work *,
> there's no visible change to vhost_poll_*() interface.
> 
> This conversion is to make each vhost use a dedicated kthread so that
> resource control via cgroup can be applied.
> 
> Partially based on Sridhar Samudrala's patch.
> 
> * Updated to use sub structure vhost_work instead of directly using
>   vhost_poll at Michael's suggestion.
> 
> * Added flusher wake_up() optimization at Michael's suggestion.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>

All the tricky barrier pairing made me uncomfortable.  So I came up with
this on top (untested): if we do all operations under the spinlock, we
can get by without barriers and atomics.  And since we need the lock for
list operations anyway, this should have no paerformance impact.

What do you think?


diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0c6b533..7730a30 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -73,7 +73,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 	INIT_LIST_HEAD(&work->node);
 	work->fn = fn;
 	init_waitqueue_head(&work->done);
-	atomic_set(&work->flushing, 0);
+	work->flushing = 0;
 	work->queue_seq = work->done_seq = 0;
 }
 
@@ -99,13 +99,23 @@ void vhost_poll_stop(struct vhost_poll *poll)
 void vhost_poll_flush(struct vhost_poll *poll)
 {
 	struct vhost_work *work = &poll->work;
-	int seq = work->queue_seq;
+	unsigned seq, left;
+	int flushing;
 
-	atomic_inc(&work->flushing);
-	smp_mb__after_atomic_inc();	/* mb flush-b0 paired with worker-b1 */
-	wait_event(work->done, seq - work->done_seq <= 0);
-	atomic_dec(&work->flushing);
-	smp_mb__after_atomic_dec();	/* rmb flush-b1 paired with worker-b0 */
+	spin_lock_irq(&dev->work_lock);
+	seq = work->queue_seq;
+	work->flushing++;
+	spin_unlock_irq(&dev->work_lock);
+	wait_event(work->done, {
+		   spin_lock_irq(&dev->work_lock);
+		   left = work->done_seq - seq;
+		   spin_unlock_irq(&dev->work_lock);
+		   left < UINT_MAX / 2;
+	});
+	spin_lock_irq(&dev->work_lock);
+	flushing = --work->flushing;
+	spin_unlock_irq(&dev->work_lock);
+	BUG_ON(flushing < 0);
 }
 
 void vhost_poll_queue(struct vhost_poll *poll)
@@ -151,37 +161,37 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_dev *dev = data;
-	struct vhost_work *work;
+	struct vhost_work *work = NULL;
 
-repeat:
-	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
-	if (kthread_should_stop()) {
-		__set_current_state(TASK_RUNNING);
-		return 0;
-	}
+		if (kthread_should_stop()) {
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
 
-	work = NULL;
-	spin_lock_irq(&dev->work_lock);
-	if (!list_empty(&dev->work_list)) {
-		work = list_first_entry(&dev->work_list,
-					struct vhost_work, node);
-		list_del_init(&work->node);
-	}
-	spin_unlock_irq(&dev->work_lock);
+		spin_lock_irq(&dev->work_lock);
+		if (work) {
+			work->done_seq = work->queue_seq;
+			if (work->flushing)
+				wake_up_all(&work->done);
+		}
+		if (!list_empty(&dev->work_list)) {
+			work = list_first_entry(&dev->work_list,
+						struct vhost_work, node);
+			list_del_init(&work->node);
+		} else
+			work = NULL;
+		spin_unlock_irq(&dev->work_lock);
+
+		if (work) {
+			__set_current_state(TASK_RUNNING);
+			work->fn(work);
+		} else
+			schedule();
 
-	if (work) {
-		__set_current_state(TASK_RUNNING);
-		work->fn(work);
-		smp_wmb();	/* wmb worker-b0 paired with flush-b1 */
-		work->done_seq = work->queue_seq;
-		smp_mb();	/* mb worker-b1 paired with flush-b0 */
-		if (atomic_read(&work->flushing))
-			wake_up_all(&work->done);
-	} else
-		schedule();
-
-	goto repeat;
+	}
 }
 
 long vhost_dev_init(struct vhost_dev *dev,
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 0e63091..3693327 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -27,9 +27,9 @@ struct vhost_work {
 	struct list_head	  node;
 	vhost_work_fn_t		  fn;
 	wait_queue_head_t	  done;
-	atomic_t		  flushing;
-	int			  queue_seq;
-	int			  done_seq;
+	int			  flushing;
+	unsigned		  queue_seq;
+	unsigned		  done_seq;
 };
 
 /* Poll a file (eventfd or socket) */

^ permalink raw reply related

* Re: macvtap: Limit packet queue length
From: Shirley Ma @ 2010-07-22 15:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Arnd Bergmann, Mark Wagner
In-Reply-To: <20100722064157.GA25913@gondor.apana.org.au>

On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
>  {
>         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
>         if (!q)
> -               return -ENOLINK;
> +               goto drop;
> +
> +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> dev->tx_queue_len)
> +               goto drop;
> 

Do we need to orphan skb here, just like tun?

>         skb_queue_tail(&q->sk.sk_receive_queue, skb);
>         wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN |
> POLLRDNORM | POLLRDBAND);
> -       return 0;
> +       return NET_RX_SUCCESS;
> +
> +drop:

Do we need to increase dropped++ counter here to let user know there are
packets dropped?

> +       kfree_skb(skb);
> +       return NET_RX_DROP;
>  }
> 
> 

^ permalink raw reply

* Re: macvtap: Limit packet queue length
From: Shirley Ma @ 2010-07-22 16:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Herbert Xu, David S. Miller, netdev, Mark Wagner, Chris Wright
In-Reply-To: <201007221130.53612.arnd@arndb.de>

On Thu, 2010-07-22 at 11:30 +0200, Arnd Bergmann wrote:
> In the TX direction, we really don't queue, since we simply forward
> to the lowerdev tx queue, so exposing the tunable to user space
> as the tx queue length is a little bit awkward, as well as
> inconsistent
> between macvtap and macvlan.

Maybe we can use lowerdev backlog queue size here from receiving path
here?

thanks
Shirley

^ permalink raw reply

* Re: macvtap: Limit packet queue length
From: Herbert Xu @ 2010-07-22 16:07 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David S. Miller, netdev, Arnd Bergmann, Mark Wagner
In-Reply-To: <1279814398.3211.1.camel@localhost.localdomain>

On Thu, Jul 22, 2010 at 08:59:58AM -0700, Shirley Ma wrote:
> On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
> >  {
> >         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
> >         if (!q)
> > -               return -ENOLINK;
> > +               goto drop;
> > +
> > +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> > dev->tx_queue_len)
> > +               goto drop;
> > 
> 
> Do we need to orphan skb here, just like tun?

We could, but that is orthogonal to the problem at hand so feel
free to do that in another patch.

> >         skb_queue_tail(&q->sk.sk_receive_queue, skb);
> >         wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN |
> > POLLRDNORM | POLLRDBAND);
> > -       return 0;
> > +       return NET_RX_SUCCESS;
> > +
> > +drop:
> 
> Do we need to increase dropped++ counter here to let user know there are
> packets dropped?

The caller is supposed to handle this.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: macvtap: Limit packet queue length
From: Herbert Xu @ 2010-07-22 16:08 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Arnd Bergmann, David S. Miller, netdev, Mark Wagner, Chris Wright
In-Reply-To: <1279814726.3211.5.camel@localhost.localdomain>

On Thu, Jul 22, 2010 at 09:05:26AM -0700, Shirley Ma wrote:
> On Thu, 2010-07-22 at 11:30 +0200, Arnd Bergmann wrote:
> > In the TX direction, we really don't queue, since we simply forward
> > to the lowerdev tx queue, so exposing the tunable to user space
> > as the tx queue length is a little bit awkward, as well as
> > inconsistent
> > between macvtap and macvlan.
> 
> Maybe we can use lowerdev backlog queue size here from receiving path
> here?

No, you may wish to set different queue lengths for different
macvtap devices over the same lowerdev so you definitely don't
want to use any lowerdev parameter for this.

I honestly don't see any problems with using tx_queue_len since
it isn't used by anything else for macvtap.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: add xt_cpu match
From: Eric Dumazet @ 2010-07-22 16:24 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, Netfilter Development Mailinglist, netdev
In-Reply-To: <alpine.LSU.2.01.1007221736150.12308@obet.zrqbmnf.qr>

Le jeudi 22 juillet 2010 à 17:39 +0200, Jan Engelhardt a écrit :

> Never say never. "long" already bit people in the past, and now we
> have that CONFIG_COMPAT stuff.
> 

I know pretty well the "long" problem, I received one of the first alpha
machine ever built in the world (DEC 3000 AXP, with a fast 133 MHz
cpu ;) ), before I began to use Linux :)


> If invert is the only flag, perhaps it makes sense to use __u8 
> for it. 
> 

Quite frankly it brings more problems than plain u32

- Possible security problems (padding bytes). Not applicable to
iptables.

- Some arches have slow byte/short accesses (21064 for example :) )

"int" is the natural type, fast on all arches.

- Given alignment requirements of iptables rules, using less than 32bits
here saves no ram.

But I dont care that much.

I even see compiler doesnt want to use a XOR instruction :

00000018 <cpu_mt>:
  18:	55                   	push   %ebp
  19:	8b 42 04             	mov    0x4(%edx),%eax
  1c:	64 8b 15 00 00 00 00 	mov    %fs:0x0,%edx
  23:	89 e5                	mov    %esp,%ebp
  25:	5d                   	pop    %ebp
  26:	39 10                	cmp    %edx,(%eax)
  28:	0f 94 c2             	sete   %dl
  2b:	0f b6 d2             	movzbl %dl,%edx
  2e:	3b 50 04             	cmp    0x4(%eax),%edx
  31:	0f 95 c0             	setne  %al
  34:	c3                   	ret    




--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] CAN: Add Flexcan CAN controller driver
From: Marc Kleine-Budde @ 2010-07-22 16:40 UTC (permalink / raw)
  To: wg-5Yr1BZd7O62+XT7JhA+gdA
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Hello,

sorry for the confusion about the last minute problem with the driver,
it was false alarm. Someone had troubles a driver derived from fsl's
original one.

David, please feel free to merge the driver now. :)

Cheers, Marc


P.S.: You can pull this from:

The following changes since commit 4cfa580e7eebb8694b875d2caff3b989ada2efac:

  r6040: Fix args to phy_mii_ioctl(). (2010-07-21 21:10:49 -0700)

are available in the git repository at:
  git://git.pengutronix.de/git/mkl/linux-2.6.git for-net-next-2.6

Marc Kleine-Budde (1):
      CAN: Add Flexcan CAN controller driver

 drivers/net/can/Kconfig              |    9 +
 drivers/net/can/Makefile             |    1 +
 drivers/net/can/flexcan.c            | 1030 ++++++++++++++++++++++++++++++++++
 include/linux/can/platform/flexcan.h |   20 +
 4 files changed, 1060 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/can/flexcan.c
 create mode 100644 include/linux/can/platform/flexcan.h

---------8<--------8<--------8<--------8<--------8<--------8<--------8<-------

This core is found on some Freescale SoCs and also some Coldfire
SoCs. Support for Coldfire is missing though at the moment as
they have an older revision of the core which does not have RX FIFO
support.

Signed-off-by: Sascha Hauer <s.hauer-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
---
Changes to prev version:

* improved activation of the driver in Kconfig

 drivers/net/can/Kconfig              |    9 +
 drivers/net/can/Makefile             |    1 +
 drivers/net/can/flexcan.c            | 1030 ++++++++++++++++++++++++++++++++++
 include/linux/can/platform/flexcan.h |   20 +
 4 files changed, 1060 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/can/flexcan.c
 create mode 100644 include/linux/can/platform/flexcan.h

diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index 2c5227c..9d9e453 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -73,6 +73,15 @@ config CAN_JANZ_ICAN3
 	  This driver can also be built as a module. If so, the module will be
 	  called janz-ican3.ko.
 
+config HAVE_CAN_FLEXCAN
+	bool
+
+config CAN_FLEXCAN
+	tristate "Support for Freescale FLEXCAN based chips"
+	depends on CAN_DEV && HAVE_CAN_FLEXCAN
+	---help---
+	  Say Y here if you want to support for Freescale FlexCAN.
+
 source "drivers/net/can/mscan/Kconfig"
 
 source "drivers/net/can/sja1000/Kconfig"
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
index 9047cd0..0057537 100644
--- a/drivers/net/can/Makefile
+++ b/drivers/net/can/Makefile
@@ -16,5 +16,6 @@ obj-$(CONFIG_CAN_TI_HECC)	+= ti_hecc.o
 obj-$(CONFIG_CAN_MCP251X)	+= mcp251x.o
 obj-$(CONFIG_CAN_BFIN)		+= bfin_can.o
 obj-$(CONFIG_CAN_JANZ_ICAN3)	+= janz-ican3.o
+obj-$(CONFIG_CAN_FLEXCAN)	+= flexcan.o
 
 ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
new file mode 100644
index 0000000..ef443a0
--- /dev/null
+++ b/drivers/net/can/flexcan.c
@@ -0,0 +1,1030 @@
+/*
+ * flexcan.c - FLEXCAN CAN controller driver
+ *
+ * Copyright (c) 2005-2006 Varma Electronics Oy
+ * Copyright (c) 2009 Sascha Hauer, Pengutronix
+ * Copyright (c) 2010 Marc Kleine-Budde, Pengutronix
+ *
+ * Based on code originally by Andrey Volkov <avolkov-ppI4tVfbJvJWk0Htik3J/w@public.gmane.org>
+ *
+ * LICENCE:
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/can.h>
+#include <linux/can/dev.h>
+#include <linux/can/error.h>
+#include <linux/can/platform/flexcan.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/if_arp.h>
+#include <linux/if_ether.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+
+#include <mach/clock.h>
+
+#define DRV_NAME			"flexcan"
+
+/* 8 for RX fifo and 2 error handling */
+#define FLEXCAN_NAPI_WEIGHT		(8 + 2)
+
+/* FLEXCAN module configuration register (CANMCR) bits */
+#define FLEXCAN_MCR_MDIS		BIT(31)
+#define FLEXCAN_MCR_FRZ			BIT(30)
+#define FLEXCAN_MCR_FEN			BIT(29)
+#define FLEXCAN_MCR_HALT		BIT(28)
+#define FLEXCAN_MCR_NOT_RDY		BIT(27)
+#define FLEXCAN_MCR_WAK_MSK		BIT(26)
+#define FLEXCAN_MCR_SOFTRST		BIT(25)
+#define FLEXCAN_MCR_FRZ_ACK		BIT(24)
+#define FLEXCAN_MCR_SUPV		BIT(23)
+#define FLEXCAN_MCR_SLF_WAK		BIT(22)
+#define FLEXCAN_MCR_WRN_EN		BIT(21)
+#define FLEXCAN_MCR_LPM_ACK		BIT(20)
+#define FLEXCAN_MCR_WAK_SRC		BIT(19)
+#define FLEXCAN_MCR_DOZE		BIT(18)
+#define FLEXCAN_MCR_SRX_DIS		BIT(17)
+#define FLEXCAN_MCR_BCC			BIT(16)
+#define FLEXCAN_MCR_LPRIO_EN		BIT(13)
+#define FLEXCAN_MCR_AEN			BIT(12)
+#define FLEXCAN_MCR_MAXMB(x)		((x) & 0xf)
+#define FLEXCAN_MCR_IDAM_A		(0 << 8)
+#define FLEXCAN_MCR_IDAM_B		(1 << 8)
+#define FLEXCAN_MCR_IDAM_C		(2 << 8)
+#define FLEXCAN_MCR_IDAM_D		(3 << 8)
+
+/* FLEXCAN control register (CANCTRL) bits */
+#define FLEXCAN_CTRL_PRESDIV(x)		(((x) & 0xff) << 24)
+#define FLEXCAN_CTRL_RJW(x)		(((x) & 0x03) << 22)
+#define FLEXCAN_CTRL_PSEG1(x)		(((x) & 0x07) << 19)
+#define FLEXCAN_CTRL_PSEG2(x)		(((x) & 0x07) << 16)
+#define FLEXCAN_CTRL_BOFF_MSK		BIT(15)
+#define FLEXCAN_CTRL_ERR_MSK		BIT(14)
+#define FLEXCAN_CTRL_CLK_SRC		BIT(13)
+#define FLEXCAN_CTRL_LPB		BIT(12)
+#define FLEXCAN_CTRL_TWRN_MSK		BIT(11)
+#define FLEXCAN_CTRL_RWRN_MSK		BIT(10)
+#define FLEXCAN_CTRL_SMP		BIT(7)
+#define FLEXCAN_CTRL_BOFF_REC		BIT(6)
+#define FLEXCAN_CTRL_TSYN		BIT(5)
+#define FLEXCAN_CTRL_LBUF		BIT(4)
+#define FLEXCAN_CTRL_LOM		BIT(3)
+#define FLEXCAN_CTRL_PROPSEG(x)		((x) & 0x07)
+#define FLEXCAN_CTRL_ERR_BUS		(FLEXCAN_CTRL_ERR_MSK)
+#define FLEXCAN_CTRL_ERR_STATE \
+	(FLEXCAN_CTRL_TWRN_MSK | FLEXCAN_CTRL_RWRN_MSK | \
+	 FLEXCAN_CTRL_BOFF_MSK)
+#define FLEXCAN_CTRL_ERR_ALL \
+	(FLEXCAN_CTRL_ERR_BUS | FLEXCAN_CTRL_ERR_STATE)
+
+/* FLEXCAN error and status register (ESR) bits */
+#define FLEXCAN_ESR_TWRN_INT		BIT(17)
+#define FLEXCAN_ESR_RWRN_INT		BIT(16)
+#define FLEXCAN_ESR_BIT1_ERR		BIT(15)
+#define FLEXCAN_ESR_BIT0_ERR		BIT(14)
+#define FLEXCAN_ESR_ACK_ERR		BIT(13)
+#define FLEXCAN_ESR_CRC_ERR		BIT(12)
+#define FLEXCAN_ESR_FRM_ERR		BIT(11)
+#define FLEXCAN_ESR_STF_ERR		BIT(10)
+#define FLEXCAN_ESR_TX_WRN		BIT(9)
+#define FLEXCAN_ESR_RX_WRN		BIT(8)
+#define FLEXCAN_ESR_IDLE		BIT(7)
+#define FLEXCAN_ESR_TXRX		BIT(6)
+#define FLEXCAN_EST_FLT_CONF_SHIFT	(4)
+#define FLEXCAN_ESR_FLT_CONF_MASK	(0x3 << FLEXCAN_EST_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_FLT_CONF_ACTIVE	(0x0 << FLEXCAN_EST_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_FLT_CONF_PASSIVE	(0x1 << FLEXCAN_EST_FLT_CONF_SHIFT)
+#define FLEXCAN_ESR_BOFF_INT		BIT(2)
+#define FLEXCAN_ESR_ERR_INT		BIT(1)
+#define FLEXCAN_ESR_WAK_INT		BIT(0)
+#define FLEXCAN_ESR_ERR_BUS \
+	(FLEXCAN_ESR_BIT1_ERR | FLEXCAN_ESR_BIT0_ERR | \
+	 FLEXCAN_ESR_ACK_ERR | FLEXCAN_ESR_CRC_ERR | \
+	 FLEXCAN_ESR_FRM_ERR | FLEXCAN_ESR_STF_ERR)
+#define FLEXCAN_ESR_ERR_STATE \
+	(FLEXCAN_ESR_TWRN_INT | FLEXCAN_ESR_RWRN_INT | FLEXCAN_ESR_BOFF_INT)
+#define FLEXCAN_ESR_ERR_ALL \
+	(FLEXCAN_ESR_ERR_BUS | FLEXCAN_ESR_ERR_STATE)
+
+/* FLEXCAN interrupt flag register (IFLAG) bits */
+#define FLEXCAN_TX_BUF_ID		8
+#define FLEXCAN_IFLAG_BUF(x)		BIT(x)
+#define FLEXCAN_IFLAG_RX_FIFO_OVERFLOW	BIT(7)
+#define FLEXCAN_IFLAG_RX_FIFO_WARN	BIT(6)
+#define FLEXCAN_IFLAG_RX_FIFO_AVAILABLE	BIT(5)
+#define FLEXCAN_IFLAG_DEFAULT \
+	(FLEXCAN_IFLAG_RX_FIFO_OVERFLOW | FLEXCAN_IFLAG_RX_FIFO_AVAILABLE | \
+	 FLEXCAN_IFLAG_BUF(FLEXCAN_TX_BUF_ID))
+
+/* FLEXCAN message buffers */
+#define FLEXCAN_MB_CNT_CODE(x)		(((x) & 0xf) << 24)
+#define FLEXCAN_MB_CNT_SRR		BIT(22)
+#define FLEXCAN_MB_CNT_IDE		BIT(21)
+#define FLEXCAN_MB_CNT_RTR		BIT(20)
+#define FLEXCAN_MB_CNT_LENGTH(x)	(((x) & 0xf) << 16)
+#define FLEXCAN_MB_CNT_TIMESTAMP(x)	((x) & 0xffff)
+
+#define FLEXCAN_MB_CODE_MASK		(0xf0ffffff)
+
+/* Structure of the message buffer */
+struct flexcan_mb {
+	u32 can_ctrl;
+	u32 can_id;
+	u32 data[2];
+};
+
+/* Structure of the hardware registers */
+struct flexcan_regs {
+	u32 mcr;		/* 0x00 */
+	u32 ctrl;		/* 0x04 */
+	u32 timer;		/* 0x08 */
+	u32 _reserved1;		/* 0x0c */
+	u32 rxgmask;		/* 0x10 */
+	u32 rx14mask;		/* 0x14 */
+	u32 rx15mask;		/* 0x18 */
+	u32 ecr;		/* 0x1c */
+	u32 esr;		/* 0x20 */
+	u32 imask2;		/* 0x24 */
+	u32 imask1;		/* 0x28 */
+	u32 iflag2;		/* 0x2c */
+	u32 iflag1;		/* 0x30 */
+	u32 _reserved2[19];
+	struct flexcan_mb cantxfg[64];
+};
+
+struct flexcan_priv {
+	struct can_priv can;
+	struct net_device *dev;
+	struct napi_struct napi;
+
+	void __iomem *base;
+	u32 reg_esr;
+	u32 reg_ctrl_default;
+
+	struct clk *clk;
+	struct flexcan_platform_data *pdata;
+};
+
+static struct can_bittiming_const flexcan_bittiming_const = {
+	.name = DRV_NAME,
+	.tseg1_min = 4,
+	.tseg1_max = 16,
+	.tseg2_min = 2,
+	.tseg2_max = 8,
+	.sjw_max = 4,
+	.brp_min = 1,
+	.brp_max = 256,
+	.brp_inc = 1,
+};
+
+/*
+ * Swtich transceiver on or off
+ */
+static void flexcan_transceiver_switch(const struct flexcan_priv *priv, int on)
+{
+	if (priv->pdata && priv->pdata->transceiver_switch)
+		priv->pdata->transceiver_switch(on);
+}
+
+static inline int flexcan_has_and_handle_berr(const struct flexcan_priv *priv,
+					      u32 reg_esr)
+{
+	return (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING) &&
+		(reg_esr & FLEXCAN_ESR_ERR_BUS);
+}
+
+static inline void flexcan_chip_enable(struct flexcan_priv *priv)
+{
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg;
+
+	reg = readl(&regs->mcr);
+	reg &= ~FLEXCAN_MCR_MDIS;
+	writel(reg, &regs->mcr);
+
+	udelay(10);
+}
+
+static inline void flexcan_chip_disable(struct flexcan_priv *priv)
+{
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg;
+
+	reg = readl(&regs->mcr);
+	reg |= FLEXCAN_MCR_MDIS;
+	writel(reg, &regs->mcr);
+}
+
+static int flexcan_get_berr_counter(const struct net_device *dev,
+				    struct can_berr_counter *bec)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg = readl(&regs->ecr);
+
+	bec->txerr = (reg >> 0) & 0xff;
+	bec->rxerr = (reg >> 8) & 0xff;
+
+	return 0;
+}
+
+static int flexcan_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_regs __iomem *regs = priv->base;
+	struct can_frame *cf = (struct can_frame *)skb->data;
+	u32 can_id;
+	u32 ctrl = FLEXCAN_MB_CNT_CODE(0xc) | (cf->can_dlc << 16);
+
+	if (can_dropped_invalid_skb(dev, skb))
+		return NETDEV_TX_OK;
+
+	netif_stop_queue(dev);
+
+	if (cf->can_id & CAN_EFF_FLAG) {
+		can_id = cf->can_id & CAN_EFF_MASK;
+		ctrl |= FLEXCAN_MB_CNT_IDE | FLEXCAN_MB_CNT_SRR;
+	} else {
+		can_id = (cf->can_id & CAN_SFF_MASK) << 18;
+	}
+
+	if (cf->can_id & CAN_RTR_FLAG)
+		ctrl |= FLEXCAN_MB_CNT_RTR;
+
+	if (cf->can_dlc > 0) {
+		u32 data = be32_to_cpup((__be32 *)&cf->data[0]);
+		writel(data, &regs->cantxfg[FLEXCAN_TX_BUF_ID].data[0]);
+	}
+	if (cf->can_dlc > 3) {
+		u32 data = be32_to_cpup((__be32 *)&cf->data[4]);
+		writel(data, &regs->cantxfg[FLEXCAN_TX_BUF_ID].data[1]);
+	}
+
+	writel(can_id, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_id);
+	writel(ctrl, &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);
+
+	kfree_skb(skb);
+
+	/* tx_packets is incremented in flexcan_irq */
+	stats->tx_bytes += cf->can_dlc;
+
+	return NETDEV_TX_OK;
+}
+
+static void do_bus_err(struct net_device *dev,
+		       struct can_frame *cf, u32 reg_esr)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	int rx_errors = 0, tx_errors = 0;
+
+	cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
+
+	if (reg_esr & FLEXCAN_ESR_BIT1_ERR) {
+		dev_dbg(dev->dev.parent, "BIT1_ERR irq\n");
+		cf->data[2] |= CAN_ERR_PROT_BIT1;
+		tx_errors = 1;
+	}
+	if (reg_esr & FLEXCAN_ESR_BIT0_ERR) {
+		dev_dbg(dev->dev.parent, "BIT0_ERR irq\n");
+		cf->data[2] |= CAN_ERR_PROT_BIT0;
+		tx_errors = 1;
+	}
+	if (reg_esr & FLEXCAN_ESR_ACK_ERR) {
+		dev_dbg(dev->dev.parent, "ACK_ERR irq\n");
+		cf->can_id |= CAN_ERR_ACK;
+		cf->data[3] |= CAN_ERR_PROT_LOC_ACK;
+		tx_errors = 1;
+	}
+	if (reg_esr & FLEXCAN_ESR_CRC_ERR) {
+		dev_dbg(dev->dev.parent, "CRC_ERR irq\n");
+		cf->data[2] |= CAN_ERR_PROT_BIT;
+		cf->data[3] |= CAN_ERR_PROT_LOC_CRC_SEQ;
+		rx_errors = 1;
+	}
+	if (reg_esr & FLEXCAN_ESR_FRM_ERR) {
+		dev_dbg(dev->dev.parent, "FRM_ERR irq\n");
+		cf->data[2] |= CAN_ERR_PROT_FORM;
+		rx_errors = 1;
+	}
+	if (reg_esr & FLEXCAN_ESR_STF_ERR) {
+		dev_dbg(dev->dev.parent, "STF_ERR irq\n");
+		cf->data[2] |= CAN_ERR_PROT_STUFF;
+		rx_errors = 1;
+	}
+
+	priv->can.can_stats.bus_error++;
+	if (rx_errors)
+		dev->stats.rx_errors++;
+	if (tx_errors)
+		dev->stats.tx_errors++;
+}
+
+static int flexcan_poll_bus_err(struct net_device *dev, u32 reg_esr)
+{
+	struct sk_buff *skb;
+	struct can_frame *cf;
+
+	skb = alloc_can_err_skb(dev, &cf);
+	if (unlikely(!skb))
+		return 0;
+
+	do_bus_err(dev, cf, reg_esr);
+	netif_receive_skb(skb);
+
+	dev->stats.rx_packets++;
+	dev->stats.rx_bytes += cf->can_dlc;
+
+	return 1;
+}
+
+static void do_state(struct net_device *dev,
+		     struct can_frame *cf, enum can_state new_state)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct can_berr_counter bec;
+
+	flexcan_get_berr_counter(dev, &bec);
+
+	switch (priv->can.state) {
+	case CAN_STATE_ERROR_ACTIVE:
+		/*
+		 * from: ERROR_ACTIVE
+		 * to  : ERROR_WARNING, ERROR_PASSIVE, BUS_OFF
+		 * =>  : there was a warning int
+		 */
+		if (new_state >= CAN_STATE_ERROR_WARNING &&
+		    new_state <= CAN_STATE_BUS_OFF) {
+			dev_dbg(dev->dev.parent, "Error Warning IRQ\n");
+			priv->can.can_stats.error_warning++;
+
+			cf->can_id |= CAN_ERR_CRTL;
+			cf->data[1] = (bec.txerr > bec.rxerr) ?
+				CAN_ERR_CRTL_TX_WARNING :
+				CAN_ERR_CRTL_RX_WARNING;
+		}
+	case CAN_STATE_ERROR_WARNING:	/* fallthrough */
+		/*
+		 * from: ERROR_ACTIVE, ERROR_WARNING
+		 * to  : ERROR_PASSIVE, BUS_OFF
+		 * =>  : error passive int
+		 */
+		if (new_state >= CAN_STATE_ERROR_PASSIVE &&
+		    new_state <= CAN_STATE_BUS_OFF) {
+			dev_dbg(dev->dev.parent, "Error Passive IRQ\n");
+			priv->can.can_stats.error_passive++;
+
+			cf->can_id |= CAN_ERR_CRTL;
+			cf->data[1] = (bec.txerr > bec.rxerr) ?
+				CAN_ERR_CRTL_TX_PASSIVE :
+				CAN_ERR_CRTL_RX_PASSIVE;
+		}
+		break;
+	case CAN_STATE_BUS_OFF:
+		dev_err(dev->dev.parent,
+			"BUG! hardware recovered automatically from BUS_OFF\n");
+		break;
+	default:
+		break;
+	}
+
+	/* process state changes depending on the new state */
+	switch (new_state) {
+	case CAN_STATE_ERROR_ACTIVE:
+		dev_dbg(dev->dev.parent, "Error Active\n");
+		cf->can_id |= CAN_ERR_PROT;
+		cf->data[2] = CAN_ERR_PROT_ACTIVE;
+		break;
+	case CAN_STATE_BUS_OFF:
+		cf->can_id |= CAN_ERR_BUSOFF;
+		can_bus_off(dev);
+		break;
+	default:
+		break;
+	}
+}
+
+static int flexcan_poll_state(struct net_device *dev, u32 reg_esr)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	struct can_frame *cf;
+	enum can_state new_state;
+	int flt;
+
+	flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
+	if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
+		if (likely(!(reg_esr & (FLEXCAN_ESR_TX_WRN |
+					FLEXCAN_ESR_RX_WRN))))
+			new_state = CAN_STATE_ERROR_ACTIVE;
+		else
+			new_state = CAN_STATE_ERROR_WARNING;
+	} else if (unlikely(flt == FLEXCAN_ESR_FLT_CONF_PASSIVE))
+		new_state = CAN_STATE_ERROR_PASSIVE;
+	else
+		new_state = CAN_STATE_BUS_OFF;
+
+	/* state hasn't changed */
+	if (likely(new_state == priv->can.state))
+		return 0;
+
+	skb = alloc_can_err_skb(dev, &cf);
+	if (unlikely(!skb))
+		return 0;
+
+	do_state(dev, cf, new_state);
+	priv->can.state = new_state;
+	netif_receive_skb(skb);
+
+	dev->stats.rx_packets++;
+	dev->stats.rx_bytes += cf->can_dlc;
+
+	return 1;
+}
+
+static void flexcan_read_fifo(const struct net_device *dev,
+			      struct can_frame *cf)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	struct flexcan_mb __iomem *mb = &regs->cantxfg[0];
+	u32 reg_ctrl, reg_id;
+
+	reg_ctrl = readl(&mb->can_ctrl);
+	reg_id = readl(&mb->can_id);
+	if (reg_ctrl & FLEXCAN_MB_CNT_IDE)
+		cf->can_id = ((reg_id >> 0) & CAN_EFF_MASK) | CAN_EFF_FLAG;
+	else
+		cf->can_id = (reg_id >> 18) & CAN_SFF_MASK;
+
+	if (reg_ctrl & FLEXCAN_MB_CNT_RTR)
+		cf->can_id |= CAN_RTR_FLAG;
+	cf->can_dlc = get_can_dlc((reg_ctrl >> 16) & 0xf);
+
+	*(__be32 *)(cf->data + 0) = cpu_to_be32(readl(&mb->data[0]));
+	*(__be32 *)(cf->data + 4) = cpu_to_be32(readl(&mb->data[1]));
+
+	/* mark as read */
+	writel(FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, &regs->iflag1);
+	readl(&regs->timer);
+}
+
+static int flexcan_read_frame(struct net_device *dev)
+{
+	struct net_device_stats *stats = &dev->stats;
+	struct can_frame *cf;
+	struct sk_buff *skb;
+
+	skb = alloc_can_skb(dev, &cf);
+	if (unlikely(!skb)) {
+		stats->rx_dropped++;
+		return 0;
+	}
+
+	flexcan_read_fifo(dev, cf);
+	netif_receive_skb(skb);
+
+	stats->rx_packets++;
+	stats->rx_bytes += cf->can_dlc;
+
+	return 1;
+}
+
+static int flexcan_poll(struct napi_struct *napi, int quota)
+{
+	struct net_device *dev = napi->dev;
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg_iflag1, reg_esr;
+	int work_done = 0;
+
+	/*
+	 * The error bits are cleared on read,
+	 * use saved value from irq handler.
+	 */
+	reg_esr = readl(&regs->esr) | priv->reg_esr;
+
+	/* handle state changes */
+	work_done += flexcan_poll_state(dev, reg_esr);
+
+	/* handle RX-FIFO */
+	reg_iflag1 = readl(&regs->iflag1);
+	while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE &&
+	       work_done < quota) {
+		work_done += flexcan_read_frame(dev);
+		reg_iflag1 = readl(&regs->iflag1);
+	}
+
+	/* report bus errors */
+	if (flexcan_has_and_handle_berr(priv, reg_esr) && work_done < quota)
+		work_done += flexcan_poll_bus_err(dev, reg_esr);
+
+	if (work_done < quota) {
+		napi_complete(napi);
+		/* enable IRQs */
+		writel(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
+		writel(priv->reg_ctrl_default, &regs->ctrl);
+	}
+
+	return work_done;
+}
+
+static irqreturn_t flexcan_irq(int irq, void *dev_id)
+{
+	struct net_device *dev = dev_id;
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg_iflag1, reg_esr;
+
+	reg_iflag1 = readl(&regs->iflag1);
+	reg_esr = readl(&regs->esr);
+	writel(FLEXCAN_ESR_ERR_INT, &regs->esr);	/* ACK err IRQ */
+
+	/*
+	 * schedule NAPI in case of:
+	 * - rx IRQ
+	 * - state change IRQ
+	 * - bus error IRQ and bus error reporting is activated
+	 */
+	if ((reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) ||
+	    (reg_esr & FLEXCAN_ESR_ERR_STATE) ||
+	    flexcan_has_and_handle_berr(priv, reg_esr)) {
+		/*
+		 * The error bits are cleared on read,
+		 * save them for later use.
+		 */
+		priv->reg_esr = reg_esr & FLEXCAN_ESR_ERR_BUS;
+		writel(FLEXCAN_IFLAG_DEFAULT & ~FLEXCAN_IFLAG_RX_FIFO_AVAILABLE,
+		       &regs->imask1);
+		writel(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_ALL,
+		       &regs->ctrl);
+		napi_schedule(&priv->napi);
+	}
+
+	/* FIFO overflow */
+	if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_OVERFLOW) {
+		writel(FLEXCAN_IFLAG_RX_FIFO_OVERFLOW, &regs->iflag1);
+		dev->stats.rx_over_errors++;
+		dev->stats.rx_errors++;
+	}
+
+	/* transmission complete interrupt */
+	if (reg_iflag1 & (1 << FLEXCAN_TX_BUF_ID)) {
+		/* tx_bytes is incremented in flexcan_start_xmit */
+		stats->tx_packets++;
+		writel((1 << FLEXCAN_TX_BUF_ID), &regs->iflag1);
+		netif_wake_queue(dev);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void flexcan_set_bittiming(struct net_device *dev)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	const struct can_bittiming *bt = &priv->can.bittiming;
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg;
+
+	reg = readl(&regs->ctrl);
+	reg &= ~(FLEXCAN_CTRL_PRESDIV(0xff) |
+		 FLEXCAN_CTRL_RJW(0x3) |
+		 FLEXCAN_CTRL_PSEG1(0x7) |
+		 FLEXCAN_CTRL_PSEG2(0x7) |
+		 FLEXCAN_CTRL_PROPSEG(0x7) |
+		 FLEXCAN_CTRL_LPB |
+		 FLEXCAN_CTRL_SMP |
+		 FLEXCAN_CTRL_LOM);
+
+	reg |= FLEXCAN_CTRL_PRESDIV(bt->brp - 1) |
+		FLEXCAN_CTRL_PSEG1(bt->phase_seg1 - 1) |
+		FLEXCAN_CTRL_PSEG2(bt->phase_seg2 - 1) |
+		FLEXCAN_CTRL_RJW(bt->sjw - 1) |
+		FLEXCAN_CTRL_PROPSEG(bt->prop_seg - 1);
+
+	if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK)
+		reg |= FLEXCAN_CTRL_LPB;
+	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
+		reg |= FLEXCAN_CTRL_LOM;
+	if (priv->can.ctrlmode & CAN_CTRLMODE_3_SAMPLES)
+		reg |= FLEXCAN_CTRL_SMP;
+
+	dev_info(dev->dev.parent, "writing ctrl=0x%08x\n", reg);
+	writel(reg, &regs->ctrl);
+
+	/* print chip status */
+	dev_dbg(dev->dev.parent, "%s: mcr=0x%08x ctrl=0x%08x\n", __func__,
+		readl(&regs->mcr), readl(&regs->ctrl));
+}
+
+/*
+ * flexcan_chip_start
+ *
+ * this functions is entered with clocks enabled
+ *
+ */
+static int flexcan_chip_start(struct net_device *dev)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	unsigned int i;
+	int err;
+	u32 reg_mcr, reg_ctrl;
+
+	/* enable module */
+	flexcan_chip_enable(priv);
+
+	/* soft reset */
+	writel(FLEXCAN_MCR_SOFTRST, &regs->mcr);
+	udelay(10);
+
+	reg_mcr = readl(&regs->mcr);
+	if (reg_mcr & FLEXCAN_MCR_SOFTRST) {
+		dev_err(dev->dev.parent,
+			"Failed to softreset can module (mcr=0x%08x)\n",
+			reg_mcr);
+		err = -ENODEV;
+		goto out;
+	}
+
+	flexcan_set_bittiming(dev);
+
+	/*
+	 * MCR
+	 *
+	 * enable freeze
+	 * enable fifo
+	 * halt now
+	 * only supervisor access
+	 * enable warning int
+	 * choose format C
+	 *
+	 */
+	reg_mcr = readl(&regs->mcr);
+	reg_mcr |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_FEN | FLEXCAN_MCR_HALT |
+		FLEXCAN_MCR_SUPV | FLEXCAN_MCR_WRN_EN |
+		FLEXCAN_MCR_IDAM_C;
+	dev_dbg(dev->dev.parent, "%s: writing mcr=0x%08x", __func__, reg_mcr);
+	writel(reg_mcr, &regs->mcr);
+
+	/*
+	 * CTRL
+	 *
+	 * disable timer sync feature
+	 *
+	 * disable auto busoff recovery
+	 * transmit lowest buffer first
+	 *
+	 * enable tx and rx warning interrupt
+	 * enable bus off interrupt
+	 * (== FLEXCAN_CTRL_ERR_STATE)
+	 *
+	 * _note_: we enable the "error interrupt"
+	 * (FLEXCAN_CTRL_ERR_MSK), too. Otherwise we don't get any
+	 * warning or bus passive interrupts.
+	 */
+	reg_ctrl = readl(&regs->ctrl);
+	reg_ctrl &= ~FLEXCAN_CTRL_TSYN;
+	reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF |
+		FLEXCAN_CTRL_ERR_STATE | FLEXCAN_CTRL_ERR_MSK;
+
+	/* save for later use */
+	priv->reg_ctrl_default = reg_ctrl;
+	dev_dbg(dev->dev.parent, "%s: writing ctrl=0x%08x", __func__, reg_ctrl);
+	writel(reg_ctrl, &regs->ctrl);
+
+	for (i = 0; i < ARRAY_SIZE(regs->cantxfg); i++) {
+		writel(0, &regs->cantxfg[i].can_ctrl);
+		writel(0, &regs->cantxfg[i].can_id);
+		writel(0, &regs->cantxfg[i].data[0]);
+		writel(0, &regs->cantxfg[i].data[1]);
+
+		/* put MB into rx queue */
+		writel(FLEXCAN_MB_CNT_CODE(0x4), &regs->cantxfg[i].can_ctrl);
+	}
+
+	/* acceptance mask/acceptance code (accept everything) */
+	writel(0x0, &regs->rxgmask);
+	writel(0x0, &regs->rx14mask);
+	writel(0x0, &regs->rx15mask);
+
+	flexcan_transceiver_switch(priv, 1);
+
+	/* synchronize with the can bus */
+	reg_mcr = readl(&regs->mcr);
+	reg_mcr &= ~FLEXCAN_MCR_HALT;
+	writel(reg_mcr, &regs->mcr);
+
+	priv->can.state = CAN_STATE_ERROR_ACTIVE;
+
+	/* enable FIFO interrupts */
+	writel(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
+
+	/* print chip status */
+	dev_dbg(dev->dev.parent, "%s: reading mcr=0x%08x ctrl=0x%08x\n",
+		__func__, readl(&regs->mcr), readl(&regs->ctrl));
+
+	return 0;
+
+ out:
+	flexcan_chip_disable(priv);
+	return err;
+}
+
+/*
+ * flexcan_chip_stop
+ *
+ * this functions is entered with clocks enabled
+ *
+ */
+static void flexcan_chip_stop(struct net_device *dev)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg;
+
+	/* Disable all interrupts */
+	writel(0, &regs->imask1);
+
+	/* Disable + halt module */
+	reg = readl(&regs->mcr);
+	reg |= FLEXCAN_MCR_MDIS | FLEXCAN_MCR_HALT;
+	writel(reg, &regs->mcr);
+
+	flexcan_transceiver_switch(priv, 0);
+	priv->can.state = CAN_STATE_STOPPED;
+
+	return;
+}
+
+static int flexcan_open(struct net_device *dev)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	int err;
+
+	clk_enable(priv->clk);
+
+	err = open_candev(dev);
+	if (err)
+		goto out;
+
+	err = request_irq(dev->irq, flexcan_irq, IRQF_SHARED, dev->name, dev);
+	if (err)
+		goto out_close;
+
+	/* start chip and queuing */
+	err = flexcan_chip_start(dev);
+	if (err)
+		goto out_close;
+	napi_enable(&priv->napi);
+	netif_start_queue(dev);
+
+	return 0;
+
+ out_close:
+	close_candev(dev);
+ out:
+	clk_disable(priv->clk);
+
+	return err;
+}
+
+static int flexcan_close(struct net_device *dev)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+
+	netif_stop_queue(dev);
+	napi_disable(&priv->napi);
+	flexcan_chip_stop(dev);
+
+	free_irq(dev->irq, dev);
+	clk_disable(priv->clk);
+
+	close_candev(dev);
+
+	return 0;
+}
+
+static int flexcan_set_mode(struct net_device *dev, enum can_mode mode)
+{
+	int err;
+
+	switch (mode) {
+	case CAN_MODE_START:
+		err = flexcan_chip_start(dev);
+		if (err)
+			return err;
+
+		netif_wake_queue(dev);
+		break;
+
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct net_device_ops flexcan_netdev_ops = {
+	.ndo_open	= flexcan_open,
+	.ndo_stop	= flexcan_close,
+	.ndo_start_xmit	= flexcan_start_xmit,
+};
+
+static int __devinit register_flexcandev(struct net_device *dev)
+{
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->base;
+	u32 reg, err;
+
+	clk_enable(priv->clk);
+
+	/* select "bus clock", chip must be disabled */
+	flexcan_chip_disable(priv);
+	reg = readl(&regs->ctrl);
+	reg |= FLEXCAN_CTRL_CLK_SRC;
+	writel(reg, &regs->ctrl);
+
+	flexcan_chip_enable(priv);
+
+	/* set freeze, halt and activate FIFO, restrict register access */
+	reg = readl(&regs->mcr);
+	reg |= FLEXCAN_MCR_FRZ | FLEXCAN_MCR_HALT |
+		FLEXCAN_MCR_FEN | FLEXCAN_MCR_SUPV;
+	writel(reg, &regs->mcr);
+
+	/*
+	 * Currently we only support newer versions of this core
+	 * featuring a RX FIFO. Older cores found on some Coldfire
+	 * derivates are not yet supported.
+	 */
+	reg = readl(&regs->mcr);
+	if (!(reg & FLEXCAN_MCR_FEN)) {
+		dev_err(dev->dev.parent,
+			"Could not enable RX FIFO, unsupported core\n");
+		err = -ENODEV;
+		goto out;
+	}
+
+	err = register_candev(dev);
+
+ out:
+	/* disable core and turn off clocks */
+	flexcan_chip_disable(priv);
+	clk_disable(priv->clk);
+
+	return err;
+}
+
+static void __devexit unregister_flexcandev(struct net_device *dev)
+{
+	unregister_candev(dev);
+}
+
+static int __devinit flexcan_probe(struct platform_device *pdev)
+{
+	struct net_device *dev;
+	struct flexcan_priv *priv;
+	struct resource *mem;
+	struct clk *clk;
+	void __iomem *base;
+	resource_size_t mem_size;
+	int err, irq;
+
+	clk = clk_get(&pdev->dev, NULL);
+	if (IS_ERR(clk)) {
+		dev_err(&pdev->dev, "no clock defined\n");
+		err = PTR_ERR(clk);
+		goto failed_clock;
+	}
+
+	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	irq = platform_get_irq(pdev, 0);
+	if (!mem || irq <= 0) {
+		err = -ENODEV;
+		goto failed_get;
+	}
+
+	mem_size = resource_size(mem);
+	if (!request_mem_region(mem->start, mem_size, pdev->name)) {
+		err = -EBUSY;
+		goto failed_req;
+	}
+
+	base = ioremap(mem->start, mem_size);
+	if (!base) {
+		err = -ENOMEM;
+		goto failed_map;
+	}
+
+	dev = alloc_candev(sizeof(struct flexcan_priv), 0);
+	if (!dev) {
+		err = -ENOMEM;
+		goto failed_alloc;
+	}
+
+	dev->netdev_ops = &flexcan_netdev_ops;
+	dev->irq = irq;
+	dev->flags |= IFF_ECHO; /* we support local echo in hardware */
+
+	priv = netdev_priv(dev);
+	priv->can.clock.freq = clk_get_rate(clk);
+	priv->can.bittiming_const = &flexcan_bittiming_const;
+	priv->can.do_set_mode = flexcan_set_mode;
+	priv->can.do_get_berr_counter = flexcan_get_berr_counter;
+	priv->can.ctrlmode_supported = CAN_CTRLMODE_LOOPBACK |
+		CAN_CTRLMODE_LISTENONLY	| CAN_CTRLMODE_3_SAMPLES |
+		CAN_CTRLMODE_BERR_REPORTING;
+	priv->base = base;
+	priv->dev = dev;
+	priv->clk = clk;
+	priv->pdata = pdev->dev.platform_data;
+
+	netif_napi_add(dev, &priv->napi, flexcan_poll, FLEXCAN_NAPI_WEIGHT);
+
+	dev_set_drvdata(&pdev->dev, dev);
+	SET_NETDEV_DEV(dev, &pdev->dev);
+
+	err = register_flexcandev(dev);
+	if (err) {
+		dev_err(&pdev->dev, "registering netdev failed\n");
+		goto failed_register;
+	}
+
+	dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%d)\n",
+		 priv->base, dev->irq);
+
+	return 0;
+
+ failed_register:
+	free_candev(dev);
+ failed_alloc:
+	iounmap(base);
+ failed_map:
+	release_mem_region(mem->start, mem_size);
+ failed_req:
+	clk_put(clk);
+ failed_get:
+ failed_clock:
+	return err;
+}
+
+static int __devexit flexcan_remove(struct platform_device *pdev)
+{
+	struct net_device *dev = platform_get_drvdata(pdev);
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct resource *mem;
+
+	unregister_flexcandev(dev);
+	platform_set_drvdata(pdev, NULL);
+	free_candev(dev);
+	iounmap(priv->base);
+
+	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	release_mem_region(mem->start, resource_size(mem));
+
+	clk_put(priv->clk);
+
+	return 0;
+}
+
+static struct platform_driver flexcan_driver = {
+	.driver.name = DRV_NAME,
+	.probe = flexcan_probe,
+	.remove = __devexit_p(flexcan_remove),
+};
+
+static int __init flexcan_init(void)
+{
+	pr_info("%s netdevice driver\n", DRV_NAME);
+	return platform_driver_register(&flexcan_driver);
+}
+
+static void __exit flexcan_exit(void)
+{
+	platform_driver_unregister(&flexcan_driver);
+	pr_info("%s: driver removed\n", DRV_NAME);
+}
+
+module_init(flexcan_init);
+module_exit(flexcan_exit);
+
+MODULE_AUTHOR("Sascha Hauer <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>, "
+	      "Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("CAN port driver for flexcan based chip");
diff --git a/include/linux/can/platform/flexcan.h b/include/linux/can/platform/flexcan.h
new file mode 100644
index 0000000..72b713a
--- /dev/null
+++ b/include/linux/can/platform/flexcan.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2010 Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
+ *
+ * This file is released under the GPLv2
+ *
+ */
+
+#ifndef __CAN_PLATFORM_FLEXCAN_H
+#define __CAN_PLATFORM_FLEXCAN_H
+
+/**
+ * struct flexcan_platform_data - flex CAN controller platform data
+ * @transceiver_enable:         - called to power on/off the transceiver
+ *
+ */
+struct flexcan_platform_data {
+	void (*transceiver_switch)(int enable);
+};
+
+#endif /* __CAN_PLATFORM_FLEXCAN_H */
-- 
1.7.1

^ permalink raw reply related

* Re: Another oops + repost
From: Eric W. Biederman @ 2010-07-22 16:43 UTC (permalink / raw)
  To: Martín Ferrari
  Cc: netdev, Ben Hutchings, 577640, Alexey Dobriyan, Mathieu Lacage,
	Daniel Lezcano
In-Reply-To: <AANLkTin_KDKEejs1ld-qv5jOyuALysjZ8NLC2-786htJ@mail.gmail.com>

Martín Ferrari <martin.ferrari@gmail.com> writes:

> Hi again,
>
> First of all, I would like to know if anybody was able to fix this
> problem  that got kinda lost in the thread:

I can't reproduce this on 2.6.35-rc1+

Can you please test a 2.6.35-rc version?  If you can still reproduce
it there can you send me your .config?  Otherwise I expect my last
round of changes to sysfs fixed whatever was the underlying problem.

Thanks,
Eric

^ permalink raw reply

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw     HIDIOCGFEATURE  and HIDIOCSFEATURE
From: Alan Ott @ 2010-07-22 16:58 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Jiri Kosina, David S Miller, Michael Poole, Bastien Nocera,
	Eric Dumazet, linux-bluetooth, linux-kernel, netdev
In-Reply-To: <1279812115.2621.6.camel@localhost.localdomain>

On 07/22/2010 11:21 AM, Marcel Holtmann wrote:
>>>>>>> what is usb-hid.ko doing here? I would expect a bunch of code
>>>>>>> duplication with minor difference between USB and Bluetooth.
>>>>>>>
>>>>>>>                
>>>>>> usbhid doesn't have a lot of code for hidraw. Two functions are involved:
>>>>>>        usbhid_output_raw_report()
>>>>>>            - calls usb_control_msg() with Get_Report
>>>>>>        usbhid_get_raw_report()
>>>>>>            - calls usb_control_msg() with Set_Report
>>>>>>                OR
>>>>>>            - calls usb_interrupt_msg() on the Ouput pipe.
>>>>>>
>>>>>> This is of course easier than bluetooth because usb_control_msg() is
>>>>>> synchronous, even when requesting reports, mostly because of the nature
>>>>>> of USB, where the request and response are part of the same transfer.
>>>>>>
>>>>>> For Bluetooth, it's a bit more complicated since the kernel treats it
>>>>>> more like a networking interface (and indeed it is). My understanding is
>>>>>> that to make a synchronous transfer in bluetooth, one must:
>>>>>>        - send the request packet
>>>>>>        - block (wait_event_*())
>>>>>>        - when the response is received in the input handler, wake_up_*().
>>>>>>
>>>>>> There's not really any code duplication, mostly because initiating
>>>>>> synchronous USB transfers (input and output) is easy (because of the
>>>>>> usb_*_msg() functions), while making synchronous Bluetooth transfers
>>>>>> must be done manually. If there's a nice, convenient, synchronous
>>>>>> function in Bluetooth similar to usb_control_msg() that I've missed,
>>>>>> then let me know, as it would simplify this whole thing.
>>>>>>
>>>>>>              
>>>>> there is not and I don't think we ever get one. My question here was
>>>>> more in the direction why HID core is doing these synchronously in the
>>>>> first place. Especially since USB can do everything async as well.
>>>>>            
>>>> I'm open to suggestions. The way I see it is from a user space
>>>> perspective. With Get_Feature being on an ioctl(), I don't see any clean
>>>> way to do it other than synchronously. Other operating systems (I can
>>>> say for sure Windows, Mac OS X, and FreeBSD) handle Get/Set Feature the
>>>> same way (synchronously) from user space.
>>>>
>>>> You seem to be proposing an asynchronous interface. What would that look
>>>> like from user space?
>>>>          
>>> not necessarily from user space, but at least from HID core to HIDP and
>>> usb-hid transports. At least that is what I would expect, Jiri?
>>>        
>> Sorry for this taking too long (vacations, conferences, you name it) for
>> me to respond.
>>
>> As all the _raw() callbacks are purely intended for userspace interaction
>> anyway, it's perfectly fine (and in fact desirable) for the low-level
>> transport drivers to perform these operations synchronously (and that's
>> what USB implementation does as well).
>>
>> Marcel, if your opposition to synchronous interface is strong, we'll have
>> to think about other aproaches, but from my POV, the patch is fine as-is
>> for Bluetooth.
>>      
> that the ioctl() API is synchronous is fine to me, however pushing that
> down to the transport drivers seems wrong to me. I think the HID core
> should be able to handle a fully asynchronous transport driver. I know
> that with the USB subsystem you are little bit spoiled here, but for
> Bluetooth it is not the case. And in the end even using the asynchronous
> USB URB calls would be nice as well.
>    

How are the URB calls better than using the synchronous calls? (see below)

> So why not make the core actually wait for responses from the transport
> driver.

Because this makes the USB side a lot more complicated, and it would 
introduce transport specific code into the core. Further, it would 
involve the transport driver calling hidraw with _every_ single packet 
it receives. Further, it would have to call hidraw with HANDSHAKE 
protocol error packets as well.

>   I would make the transport drivers a lot simpler in the long
> run.

It would make the USB transport driver and drivers/hid/hidraw much more 
complicated right now, at the expense of making the BT transport driver 
marginally simpler (and I'm not even convinced whether it would really 
be simpler). (see below for more)

>   And I know that most likely besides Bluetooth and USB you won't see
> another, but you never know.
>
>    

I just don't understand the objection. In each transport type, the 
waiting will have to be done in a different way. USB and BT are 
different enough that this is the case already, without having to 
imagine future buses which use HID. In BT, you have to check each packet 
which comes back from the BT network to see whether it is the response 
packet that hidraw is waiting for. Further, you have to check for 
HANDSHAKE packets indicating protocol error. Where better for this to be 
done than in hidp? Further, how can this possibly happen in 
drivers/hid/hidraw, as it doesn't know about the details of bluetooth to 
make this determination, and why should it? In my last email ( 
http://lkml.org/lkml/2010/7/9/231 ) to which I got no response, I laid 
out how doing the blocking in drivers/hid/hidraw would only make all the 
parts except bluetooth more complicated (including the core, and the USB 
side), and would also introduce bluetooth-specific things into the core.

Further, you're saying that using the asynchronous USB URB calls would 
be a benefit. How is it a benefit to replace a single function call 
which does exactly what I want, with a set of asynchronous calls and 
then adding my own blocking to make it do the same thing? This sounds to 
me like it would be 1: more text, 2: duplication of code, 3: error 
prone. I can't understand how this is of benefit. Please explain to me 
what I'm missing.

In theory, what you're saying makes sense. Making common code and logic 
actually common is always good. In practice though, in this case, I 
submit that there really isn't any commonality, and the only way for 
there to be commonality is to do the USB side the hard way. Further, 
drivers/hid/hidraw can't wait for a bluetooth packet without having code 
that's bluetooth-specific. It seems just that simple to me.

I'll give it some more thought, and take another look at the code to see 
if there's something obvious that I'm missing. If you know what I'm 
missing in my understanding of the problem, please tell me :)

Alan.

^ permalink raw reply

* Re: Re: Fwd: LVS on local node
From: Franchoze Eric @ 2010-07-22 16:59 UTC (permalink / raw)
  To: Simon Horman; +Cc: wensong, lvs-devel, netdev, netfilter-devel
In-Reply-To: <20100722132502.GB28362@verge.net.au>



22.07.10, 17:25, "Simon Horman" <horms@verge.net.au>:

> On Thu, Jul 22, 2010 at 07:51:20AM +0400, Franchoze Eric wrote:
>  > Hello,
>  > 
>  > I'm trying to do load balancing of incoming traffic to my applications. This applications are not very  smp friendly, and I want try to run some instances according to number of cpus on single machine. And balance load of incoming traffic/connections to this applications.
>  > Looks like is should be similar to http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html
>  > 
>  >  linux kernel 2.6.32 with  or without hide interface patches.  Tried different configurations but could not see packets on application layer.
>  > 
>  > 192.168.1.165 - eth0 - interface for external connections
>  > 195.0.0.1 - dummy0 - virtual interface, real application is binded to that address.
>  > 
>  > Configuration is:
>  > -A -t 192.168.1.165:1234 -s wlc
>  > -a -t 192.168.1.165:1234 -r 195.0.0.1:1234 -g -w
>  > 
>  > #ipvsadm -L -n
>  > IP Virtual Server version 1.2.1 (size=4096)
>  > Prot LocalAddress:Port Scheduler Flags
>  >   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>  > TCP  192.168.1.165:1234 wlc
>  >   -> 195.0.0.1:1234               Local   1      0          0        
>  > #
>  > 
>  > Log:
>  > [ 2106.897409] IPVS: lookup/out TCP 192.168.1.165:44847->192.168.1.165:1234 not hit
>  > [ 2106.897412] IPVS: lookup service: fwm 0 TCP 192.168.1.165:1234 hit
>  > [ 2106.897414] IPVS: ip_vs_wlc_schedule(): Scheduling...
>  > [ 2106.897416] IPVS: WLC: server 195.0.0.1:1234 activeconns 0 refcnt 2 weight 1 overhead 1
>  > [ 2106.897418] IPVS: Enter: ip_vs_conn_new, net/netfilter/ipvs/ip_vs_conn.c line 693
>  > [ 2106.897421] IPVS: Bind-dest TCP c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 fwd:L s:0 conn->flags:181 conn->refcnt:1 dest->refcnt:3
>  > [ 2106.897425] IPVS: Schedule fwd:L c:192.168.1.165:44847 v:192.168.1.165:1234 d:195.0.0.1:1234 conn->flags:1C1 conn->refcnt:2
>  > [ 2106.897429] IPVS: TCP input  [S...] 195.0.0.1:1234->192.168.1.165:44847 state: NONE->SYN_RECV conn->refcnt:2
>  > [ 2106.897431] IPVS: Enter: ip_vs_null_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 212
>  > [ 2106.897439] IPVS: lookup/in TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
>  > [ 2106.897441] IPVS: lookup/out TCP 192.168.1.165:1234->192.168.1.165:44847 not hit
>  > [ 2107.277535] IPVS: packet type=1 proto=17 daddr=255.255.255.255 ignored
>  > [ 2108.542691] IPVS: packet type=1 proto=17 daddr=192.168.1.255 ignored
>  > 
>  > As the result, server application does receive anything on accept(). I tried to make dummy0 a hidden device and play with arp settings. But without result.
>  > 
>  > I will be happy to hear any idea how to do connection in this environment.
>  
>  Hi,
>  
>  while others have suggested not using LVS for this task for various reasons.
>  I would just like to comment that this should work and this smells
>  like a bug to me. I will try and confirm that. But it won't be today.
>  
>  
>  

With the latest kernel I see that: LVS accepts connections, selects right destination (if round robin is selected destination changes accoring it), then it detects that it is local node and do:
net/netfilter/ipvs/ip_vs_xmit.c:
   ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 struct ip_vs_protocol *pp)

Which does nothing with skb. (here I do not understand what happens with that packet then)
I think if VLS could change destination for packets which go from local node to local node then connection can be established. Is it reasonable?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox