Netdev List

* [PATCH 3/4] pcmcia: dev_node removal bugfix
From: Dominik Brodowski @ 2010-06-11 14:44 UTC (permalink / raw)
  To: linux-pcmcia; +Cc: Dominik Brodowski, netdev, linux-wireless
In-Reply-To: <20100611144359.GA9572@comet.dominikbrodowski.net>

Patch c7c2fa07 removed one line too many from smc91c92_cs.c; restore the
link->priv assignment in smc91c92_probe().

Reported-by: Komuro <komurojun-mbn@nifty.com>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/net/pcmcia/smc91c92_cs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/pcmcia/smc91c92_cs.c b/drivers/net/pcmcia/smc91c92_cs.c
index 7b6fe89..64e6a84 100644
--- a/drivers/net/pcmcia/smc91c92_cs.c
+++ b/drivers/net/pcmcia/smc91c92_cs.c
@@ -322,6 +322,7 @@ static int smc91c92_probe(struct pcmcia_device *link)
 	return -ENOMEM;
     smc = netdev_priv(dev);
     smc->p_dev = link;
+    link->priv = dev;
 
     spin_lock_init(&smc->lock);
     link->io.NumPorts1 = 16;
-- 
1.7.0.4



* [PATCH v4] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-11 14:01 UTC (permalink / raw)
  To: netfilter; +Cc: netdev, Jan Engelhardt, Patrick McHardy, Timo Teras

This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.

Timers are identified by labels and are created when a rule is set with a new
label.  The rules also take a timeout value (in seconds) as an option.  If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules gets a hit.

One entry for each timer is created in sysfs.  This attribute contains the
time remaining (in seconds) until the timer expires.  The attributes are located under
the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification to
userspace, which can then decide what to do (e.g. disconnect to save power).

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
v2: Fixed according to Jan's comments
v3: Changed to a device class in the virtual bus in sysfs
    Removed unnecessary attribute group
    Fixed missing deallocation in some error cases
v4: Fixed according to Jan's and Patrick's comments to v3
    Changed to mutex locking instead of spin locks
    Save the timer in the target info struct to avoid extra reads
    Other small clean-ups

 include/linux/netfilter/xt_IDLETIMER.h |   45 +++++
 net/netfilter/Kconfig                  |   12 ++
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_IDLETIMER.c           |  322 ++++++++++++++++++++++++++++++++
 4 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_IDLETIMER.h
 create mode 100644 net/netfilter/xt_IDLETIMER.c

diff --git a/include/linux/netfilter/xt_IDLETIMER.h b/include/linux/netfilter/xt_IDLETIMER.h
new file mode 100644
index 0000000..9e95b98
--- /dev/null
+++ b/include/linux/netfilter/xt_IDLETIMER.h
@@ -0,0 +1,45 @@
+/*
+ * linux/include/linux/netfilter/xt_IDLETIMER.h
+ *
+ * Header file for Xtables timer target module.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and forward-ported to 2.6.34
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef _XT_IDLETIMER_H
+#define _XT_IDLETIMER_H
+
+#include <linux/types.h>
+
+#define MAX_IDLETIMER_LABEL_SIZE 32
+
+struct idletimer_tg_info {
+	__u32 timeout;
+
+	char label[MAX_IDLETIMER_LABEL_SIZE];
+
+	/* for kernel module internal use only */
+	struct idletimer_tg *timer __attribute((aligned(8)));
+};
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8593a77..413ed24 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -424,6 +424,18 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_IDLETIMER
+	tristate  "IDLETIMER target support"
+	depends on NETFILTER_ADVANCED
+	help
+
+	  This option adds the `IDLETIMER' target.  Each matching packet
+	  resets the timer associated with label specified when the rule is
+	  added.  When the timer expires, it triggers a sysfs notification.
+	  The remaining time for expiration can be read via sysfs.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_LED
 	tristate '"LED" target support'
 	depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 14e3a8f..e28420a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
new file mode 100644
index 0000000..540020a
--- /dev/null
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -0,0 +1,322 @@
+/*
+ * linux/net/netfilter/xt_IDLETIMER.c
+ *
+ * Netfilter module to trigger a timer when packet matches.
+ * After the timer expires, a sysfs notification will be sent.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and reworked for upstream inclusion
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_IDLETIMER.h>
+#include <linux/kobject.h>
+#include <linux/workqueue.h>
+#include <linux/sysfs.h>
+
+struct idletimer_tg_attr {
+	struct attribute attr;
+	ssize_t	(*show)(struct kobject *kobj,
+			struct attribute *attr, char *buf);
+};
+
+struct idletimer_tg {
+	struct list_head entry;
+	struct timer_list timer;
+	struct work_struct work;
+
+	struct kobject *kobj;
+	struct idletimer_tg_attr attr;
+
+	unsigned int refcnt;
+};
+
+static LIST_HEAD(idletimer_tg_list);
+static DEFINE_MUTEX(list_mutex);
+
+static struct kobject *idletimer_tg_kobj;
+
+static
+struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
+{
+	struct idletimer_tg *entry;
+
+	BUG_ON(!label);
+
+	list_for_each_entry(entry, &idletimer_tg_list, entry) {
+		if (!strcmp(label, entry->attr.attr.name))
+			return entry;
+	}
+
+	return NULL;
+}
+
+static ssize_t idletimer_tg_show(struct kobject *kobj, struct attribute *attr,
+				 char *buf)
+{
+	struct idletimer_tg *timer;
+	unsigned long expires = 0;
+
+	mutex_lock(&list_mutex);
+
+	timer =	__idletimer_tg_find_by_label(attr->name);
+	if (timer)
+		expires = timer->timer.expires;
+
+	mutex_unlock(&list_mutex);
+
+	if (time_after(expires, jiffies))
+		return sprintf(buf, "%u\n",
+			       jiffies_to_msecs(expires - jiffies) / 1000);
+
+	return sprintf(buf, "0\n");
+}
+
+static void idletimer_tg_work(struct work_struct *work)
+{
+	struct idletimer_tg *timer = container_of(work, struct idletimer_tg,
+						  work);
+
+	sysfs_notify(idletimer_tg_kobj, NULL, timer->attr.attr.name);
+}
+
+static void idletimer_tg_expired(unsigned long data)
+{
+	struct idletimer_tg *timer = (struct idletimer_tg *) data;
+
+	pr_debug("timer %s expired\n", timer->attr.attr.name);
+
+	schedule_work(&timer->work);
+}
+
+static int idletimer_tg_create(struct idletimer_tg_info *info)
+{
+	int ret;
+
+	info->timer = kmalloc(sizeof(*info->timer), GFP_ATOMIC);
+	if (!info->timer) {
+		pr_debug("couldn't alloc timer\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	info->timer->attr.attr.name = kstrdup(info->label, GFP_ATOMIC);
+	if (!info->timer->attr.attr.name) {
+		pr_debug("couldn't alloc attribute name\n");
+		ret = -ENOMEM;
+		goto out_free_timer;
+	}
+	info->timer->attr.attr.mode = S_IRUGO;
+	info->timer->attr.show = idletimer_tg_show;
+
+	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
+	if (ret < 0) {
+		pr_debug("couldn't add file to sysfs\n");
+		goto out_free_attr;
+	}
+
+	list_add(&info->timer->entry, &idletimer_tg_list);
+
+	setup_timer(&info->timer->timer, idletimer_tg_expired,
+		    (unsigned long) info->timer);
+	info->timer->refcnt = 1;
+
+	mod_timer(&info->timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+	INIT_WORK(&info->timer->work, idletimer_tg_work);
+
+	return 0;
+
+out_free_attr:
+	kfree(info->timer->attr.attr.name);
+out_free_timer:
+	kfree(info->timer);
+out:
+	return ret;
+}
+
+/*
+ * The actual xt_tables plugin.
+ */
+static unsigned int idletimer_tg_target(struct sk_buff *skb,
+					 const struct xt_action_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+
+	pr_debug("resetting timer %s, timeout period %u\n",
+		 info->label, info->timeout);
+
+	mutex_lock(&list_mutex);
+
+	BUG_ON(!info->timer);
+
+	mod_timer(&info->timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+	mutex_unlock(&list_mutex);
+
+	return XT_CONTINUE;
+}
+
+static int idletimer_tg_checkentry(struct xt_tgchk_param *par)
+{
+	struct idletimer_tg_info *info = par->targinfo;
+	int ret;
+
+	pr_debug("checkentry targinfo %s\n", info->label);
+
+	if (info->timeout == 0) {
+		pr_debug("timeout value is zero\n");
+		return -EINVAL;
+	}
+
+	if (info->label[0] == '\0' ||
+	    strnlen(info->label,
+		    MAX_IDLETIMER_LABEL_SIZE) == MAX_IDLETIMER_LABEL_SIZE) {
+		pr_debug("label is empty or not nul-terminated\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&list_mutex);
+
+	info->timer = __idletimer_tg_find_by_label(info->label);
+	if (info->timer) {
+		info->timer->refcnt++;
+		mod_timer(&info->timer->timer,
+			  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+		pr_debug("increased refcnt of timer %s to %u\n",
+			 info->label, info->timer->refcnt);
+	} else {
+		ret = idletimer_tg_create(info);
+		if (ret < 0) {
+			pr_debug("failed to create timer\n");
+			mutex_unlock(&list_mutex);
+			return ret;
+		}
+	}
+
+	mutex_unlock(&list_mutex);
+	return 0;
+}
+
+static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+
+	pr_debug("destroy targinfo %s\n", info->label);
+
+	mutex_lock(&list_mutex);
+	if (!info->timer) {
+		mutex_unlock(&list_mutex);
+		return;
+	}
+
+	if (--info->timer->refcnt == 0) {
+		pr_debug("deleting timer %s\n", info->label);
+
+		list_del(&info->timer->entry);
+		del_timer_sync(&info->timer->timer);
+		sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr);
+		kfree(info->timer->attr.attr.name);
+		kfree(info->timer);
+	} else {
+		pr_debug("decreased refcnt of timer %s to %u\n",
+			 info->label, info->timer->refcnt);
+	}
+
+	mutex_unlock(&list_mutex);
+}
+
+static struct xt_target idletimer_tg __read_mostly = {
+	.name		= "IDLETIMER",
+	.family		= NFPROTO_UNSPEC,
+	.target		= idletimer_tg_target,
+	.targetsize     = sizeof(struct idletimer_tg_info),
+	.checkentry	= idletimer_tg_checkentry,
+	.destroy        = idletimer_tg_destroy,
+	.me		= THIS_MODULE,
+};
+
+static struct class *idletimer_tg_class;
+
+static struct device *idletimer_tg_device;
+
+static int __init idletimer_tg_init(void)
+{
+	int err;
+
+	idletimer_tg_class = class_create(THIS_MODULE, "xt_idletimer");
+	err = PTR_ERR(idletimer_tg_class);
+	if (IS_ERR(idletimer_tg_class)) {
+		pr_debug("couldn't register device class\n");
+		goto out;
+	}
+
+	idletimer_tg_device = device_create(idletimer_tg_class, NULL,
+					    MKDEV(0, 0), NULL, "timers");
+	err = PTR_ERR(idletimer_tg_device);
+	if (IS_ERR(idletimer_tg_device)) {
+		pr_debug("couldn't register system device\n");
+		goto out_class;
+	}
+
+	idletimer_tg_kobj = &idletimer_tg_device->kobj;
+
+	err =  xt_register_target(&idletimer_tg);
+	if (err < 0) {
+		pr_debug("couldn't register xt target\n");
+		goto out_dev;
+	}
+
+	return 0;
+out_dev:
+	device_destroy(idletimer_tg_class, MKDEV(0, 0));
+out_class:
+	class_destroy(idletimer_tg_class);
+out:
+	return err;
+}
+
+static void __exit idletimer_tg_exit(void)
+{
+	xt_unregister_target(&idletimer_tg);
+
+	device_destroy(idletimer_tg_class, MKDEV(0, 0));
+	class_destroy(idletimer_tg_class);
+}
+
+module_init(idletimer_tg_init);
+module_exit(idletimer_tg_exit);
+
+MODULE_AUTHOR("Timo Teras <ext-timo.teras@nokia.com>");
+MODULE_AUTHOR("Luciano Coelho <luciano.coelho@nokia.com>");
+MODULE_DESCRIPTION("Xtables: idle time monitor");
+MODULE_LICENSE("GPL v2");
-- 
1.6.3.3
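
As an illustration of how userspace might consume the attribute described in
the commit message above, here is a minimal sketch.  It assumes the attribute
lives at /sys/class/xt_idletimer/timers/<label> as stated in the description,
and that the sysfs notification wakes poll() waiters with POLLPRI|POLLERR (the
usual sysfs behaviour).  The helper names read_remaining_seconds() and
wait_for_expiry() are made up for this example and are not part of the patch.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read the remaining time (in seconds) from an
 * already-open attribute file.  Returns -1 on error. */
long read_remaining_seconds(int fd)
{
	char buf[32];
	ssize_t n;

	/* sysfs attributes have to be re-read from offset 0 */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return strtol(buf, NULL, 10);
}

/* Hypothetical helper: block until the kernel signals the attribute
 * (assumed to happen when the timer expires), then return the value
 * read afterwards.  Returns -1 on error. */
long wait_for_expiry(const char *path)
{
	struct pollfd pfd;
	long left;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLPRI | POLLERR;

	/* an initial read is needed before poll() will report a change */
	left = read_remaining_seconds(pfd.fd);
	if (left >= 0 && poll(&pfd, 1, -1) > 0)
		left = read_remaining_seconds(pfd.fd);

	close(pfd.fd);
	return left;
}
```

A caller would pass the full attribute path for its label to wait_for_expiry()
and act (for example, tear down the link) once it returns 0.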



* Re: no reassembly for outgoing packets on RAW socket
From: Jiri Olsa @ 2010-06-11 13:10 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, netdev, Netfilter Developer Mailing List
In-Reply-To: <alpine.LSU.2.01.1006111152230.30433@obet.zrqbmnf.qr>

On Fri, Jun 11, 2010 at 11:53:32AM +0200, Jan Engelhardt wrote:
> 
> On Friday 2010-06-11 10:16, Jiri Olsa wrote:
> >
> >I prepared the patch implementing IP_NODEFRAG option for IPv4 socket.
> >
> >Also I just got an idea, that there could be no reassembly if there are
> >no rules for connection tracing set.. not sure how can I check that best
> >so far.. any idea?
> >
> >@@ -572,6 +572,14 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> > 		}
> > 		inet->hdrincl = val ? 1 : 0;
> > 		break;
> >+	case IP_NODEFRAG:
> >+		if (sk->sk_type != SOCK_RAW) {
> >+			err = -ENOPROTOOPT;
> >+			break;
> >+		}
> >+		inet->nodefrag = val ? 1 : 0;
> >+		printk("IP_NODEFRAG %p -> %d\n", inet, inet->nodefrag);
> >+		break;
> 
> You want to get rid of this printk otherwise it spews the logs.

oops, I forgot to remove this one... thanks

new patch is attached

wbr,
jirka


---
diff --git a/include/linux/in.h b/include/linux/in.h
index 583c76f..41d88a4 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -85,6 +85,7 @@ struct in_addr {
 #define IP_RECVORIGDSTADDR   IP_ORIGDSTADDR
 
 #define IP_MINTTL       21
+#define IP_NODEFRAG     22
 
 /* IP_MTU_DISCOVER values */
 #define IP_PMTUDISC_DONT		0	/* Never send DF frames */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 1653de5..1989cfd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -137,7 +137,8 @@ struct inet_sock {
 				hdrincl:1,
 				mc_loop:1,
 				transparent:1,
-				mc_all:1;
+				mc_all:1,
+				nodefrag:1;
 	int			mc_index;
 	__be32			mc_addr;
 	struct ip_mc_socklist	*mc_list;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 551ce56..84d2c8e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -355,6 +355,8 @@ lookup_protocol:
 	inet = inet_sk(sk);
 	inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
 
+	inet->nodefrag = 0;
+
 	if (SOCK_RAW == sock->type) {
 		inet->inet_num = protocol;
 		if (IPPROTO_RAW == protocol)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ce23178..d8196e1 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -449,7 +449,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 			     (1<<IP_MTU_DISCOVER) | (1<<IP_RECVERR) |
 			     (1<<IP_ROUTER_ALERT) | (1<<IP_FREEBIND) |
 			     (1<<IP_PASSSEC) | (1<<IP_TRANSPARENT) |
-			     (1<<IP_MINTTL))) ||
+			     (1<<IP_MINTTL) | (1<<IP_NODEFRAG))) ||
 	    optname == IP_MULTICAST_TTL ||
 	    optname == IP_MULTICAST_ALL ||
 	    optname == IP_MULTICAST_LOOP ||
@@ -572,6 +572,13 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		}
 		inet->hdrincl = val ? 1 : 0;
 		break;
+	case IP_NODEFRAG:
+		if (sk->sk_type != SOCK_RAW) {
+			err = -ENOPROTOOPT;
+			break;
+		}
+		inet->nodefrag = val ? 1 : 0;
+		break;
 	case IP_MTU_DISCOVER:
 		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_PROBE)
 			goto e_inval;
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index cb763ae..eab8de3 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -66,6 +66,11 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
 					  const struct net_device *out,
 					  int (*okfn)(struct sk_buff *))
 {
+	struct inet_sock *inet = inet_sk(skb->sk);
+
+	if (inet && inet->nodefrag)
+		return NF_ACCEPT;
+
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
 	/* Previously seen (loopback)?  Ignore.  Do this before
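
For context, a userspace caller of the option proposed in this patch might
look like the sketch below.  The value 22 for IP_NODEFRAG is taken from the
hunk against include/linux/in.h above and is only defined locally in case the
installed headers do not carry it; set_nodefrag() is a hypothetical helper
name.  Note that creating a raw socket normally requires CAP_NET_RAW.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Value proposed by the patch above; defined here only if the
 * installed headers do not know the option. */
#ifndef IP_NODEFRAG
#define IP_NODEFRAG 22
#endif

/* Hypothetical helper: switch the proposed option on or off.
 * Returns 0 on success, -1 with errno set on failure. */
int set_nodefrag(int fd, int on)
{
	int val = on ? 1 : 0;

	return setsockopt(fd, IPPROTO_IP, IP_NODEFRAG, &val, sizeof(val));
}
```

Typical use would be set_nodefrag(socket(AF_INET, SOCK_RAW, IPPROTO_RAW), 1).
With the patch as posted, the same call on a non-raw socket is expected to
fail with ENOPROTOOPT.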


* [PATCH 7/7] ethoc: use devres resource management
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

The point of using the devres resource management routines is that they
simplify the driver by taking care of releasing resources on failure and
release.  A recent commit added a bunch of error handling that is unnecessary
in this context.

This patch removes the redundant error handling and uses
dmam_alloc_coherent in place of dma_alloc_coherent, so that the devres
framework is used consistently throughout the driver.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   28 +---------------------------
 1 files changed, 1 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 1681f08..37ce8ac 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -964,7 +964,7 @@ static int ethoc_probe(struct platform_device *pdev)
 		}
 	} else {
 		/* Allocate buffer memory */
-		priv->membase = dma_alloc_coherent(NULL,
+		priv->membase = dmam_alloc_coherent(&pdev->dev,
 			buffer_size, (void *)&netdev->mem_start,
 			GFP_KERNEL);
 		if (!priv->membase) {
@@ -1074,21 +1074,6 @@ free_mdio:
 	kfree(priv->mdio->irq);
 	mdiobus_free(priv->mdio);
 free:
-	if (priv) {
-		if (priv->dma_alloc)
-			dma_free_coherent(NULL, priv->dma_alloc, priv->membase,
-					  netdev->mem_start);
-		else if (priv->membase)
-			devm_iounmap(&pdev->dev, priv->membase);
-		if (priv->iobase)
-			devm_iounmap(&pdev->dev, priv->iobase);
-	}
-	if (mem)
-		devm_release_mem_region(&pdev->dev, mem->start,
-					mem->end - mem->start + 1);
-	if (mmio)
-		devm_release_mem_region(&pdev->dev, mmio->start,
-					mmio->end - mmio->start + 1);
 	free_netdev(netdev);
 out:
 	return ret;
@@ -1115,17 +1100,6 @@ static int ethoc_remove(struct platform_device *pdev)
 			kfree(priv->mdio->irq);
 			mdiobus_free(priv->mdio);
 		}
-		if (priv->dma_alloc)
-			dma_free_coherent(NULL, priv->dma_alloc, priv->membase,
-				netdev->mem_start);
-		else {
-			devm_iounmap(&pdev->dev, priv->membase);
-			devm_release_mem_region(&pdev->dev, netdev->mem_start,
-				netdev->mem_end - netdev->mem_start + 1);
-		}
-		devm_iounmap(&pdev->dev, priv->iobase);
-		devm_release_mem_region(&pdev->dev, netdev->base_addr,
-			priv->io_region_size);
 		unregister_netdev(netdev);
 		free_netdev(netdev);
 	}
-- 
1.7.0.4



* [PATCH 6/7] ethoc: Clear command buffer after write
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

This matches what ethoc_mdio_read does and makes the functions
symmetric.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index e5c2f5b..1681f08 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -613,8 +613,11 @@ static int ethoc_mdio_write(struct mii_bus *bus, int phy, int reg, u16 val)
 
 	while (time_before(jiffies, timeout)) {
 		u32 stat = ethoc_read(priv, MIISTATUS);
-		if (!(stat & MIISTATUS_BUSY))
+		if (!(stat & MIISTATUS_BUSY)) {
+			/* reset MII command register */
+			ethoc_write(priv, MIICOMMAND, 0);
 			return 0;
+		}
 
 		schedule();
 	}
-- 
1.7.0.4



* [PATCH 2/7] ethoc: Write bus addresses to registers
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

The ethoc driver should be writing bus addresses to the ethoc registers, not
virtual addresses.  This patch adds an array to store the virtual addresses
in and references that array when manipulating the contents of the buffer
descriptors.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 68093cf..5904ad2 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -180,6 +180,7 @@ MODULE_PARM_DESC(buffer_size, "DMA buffer allocation size");
  * @dty_tx:	last buffer actually sent
  * @num_rx:	number of receive buffers
  * @cur_rx:	current receive buffer
+ * @vma:        pointer to array of virtual memory addresses for buffers
  * @netdev:	pointer to network device structure
  * @napi:	NAPI structure
  * @stats:	network device statistics
@@ -203,6 +204,8 @@ struct ethoc {
 	unsigned int num_rx;
 	unsigned int cur_rx;
 
+	void** vma;
+
 	struct net_device *netdev;
 	struct napi_struct napi;
 	struct net_device_stats stats;
@@ -285,18 +288,20 @@ static inline void ethoc_disable_rx_and_tx(struct ethoc *dev)
 	ethoc_write(dev, MODER, mode);
 }
 
-static int ethoc_init_ring(struct ethoc *dev)
+static int ethoc_init_ring(struct ethoc *dev, void* mem_start)
 {
 	struct ethoc_bd bd;
 	int i;
+	void* vma;
 
 	dev->cur_tx = 0;
 	dev->dty_tx = 0;
 	dev->cur_rx = 0;
 
 	/* setup transmission buffers */
-	bd.addr = virt_to_phys(dev->membase);
+	bd.addr = mem_start;
 	bd.stat = TX_BD_IRQ | TX_BD_CRC;
+	vma = dev->membase;
 
 	for (i = 0; i < dev->num_tx; i++) {
 		if (i == dev->num_tx - 1)
@@ -304,6 +309,9 @@ static int ethoc_init_ring(struct ethoc *dev)
 
 		ethoc_write_bd(dev, i, &bd);
 		bd.addr += ETHOC_BUFSIZ;
+
+		dev->vma[i] = vma;
+		vma += ETHOC_BUFSIZ;
 	}
 
 	bd.stat = RX_BD_EMPTY | RX_BD_IRQ;
@@ -314,6 +322,9 @@ static int ethoc_init_ring(struct ethoc *dev)
 
 		ethoc_write_bd(dev, dev->num_tx + i, &bd);
 		bd.addr += ETHOC_BUFSIZ;
+
+		dev->vma[dev->num_tx + i] = vma;
+		vma += ETHOC_BUFSIZ;
 	}
 
 	return 0;
@@ -415,7 +426,7 @@ static int ethoc_rx(struct net_device *dev, int limit)
 			skb = netdev_alloc_skb_ip_align(dev, size);
 
 			if (likely(skb)) {
-				void *src = phys_to_virt(bd.addr);
+				void *src = priv->vma[entry];
 				memcpy_fromio(skb_put(skb, size), src, size);
 				skb->protocol = eth_type_trans(skb, dev);
 				priv->stats.rx_packets++;
@@ -667,7 +678,7 @@ static int ethoc_open(struct net_device *dev)
 
 	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
 
-	ethoc_init_ring(priv);
+	ethoc_init_ring(priv, (void*)dev->mem_start);
 	ethoc_reset(priv);
 
 	if (netif_queue_stopped(dev)) {
@@ -831,7 +842,7 @@ static netdev_tx_t ethoc_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	else
 		bd.stat &= ~TX_BD_PAD;
 
-	dest = phys_to_virt(bd.addr);
+	dest = priv->vma[entry];
 	memcpy_toio(dest, skb->data, skb->len);
 
 	bd.stat &= ~(TX_BD_STATS | TX_BD_LEN_MASK);
@@ -978,6 +989,12 @@ static int ethoc_probe(struct platform_device *pdev)
 	priv->num_tx = max(2, num_bd / 4);
 	priv->num_rx = num_bd - priv->num_tx;
 
+	priv->vma = devm_kzalloc(&pdev->dev, num_bd*sizeof(void*), GFP_KERNEL);
+	if (!priv->vma) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
 	/* Allow the platform setup code to pass in a MAC address. */
 	if (pdev->dev.platform_data) {
 		struct ethoc_platform_data *pdata =
-- 
1.7.0.4
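
The idea in this patch (bus addresses go to the device, virtual addresses stay
in a CPU-side table indexed by descriptor) can be modelled in plain userspace
C.  This is only a sketch: struct model_bd and model_init_ring() are
hypothetical stand-ins, not driver code, and the ETHOC_BUFSIZ value of 1536 is
an assumption about the driver's buffer size.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHOC_BUFSIZ 1536	/* assumed per-buffer size */

/* Hypothetical stand-in for a hardware buffer descriptor. */
struct model_bd {
	uint32_t addr;	/* bus address the device would use */
};

/* Walk the ring once, handing out consecutive buffers: each descriptor
 * gets the bus address, and the CPU-side table gets the matching
 * virtual address for the same buffer. */
void model_init_ring(struct model_bd *bd, void **vma, unsigned int num_bd,
		     uint32_t bus_start, void *virt_start)
{
	unsigned char *virt = virt_start;
	unsigned int i;

	for (i = 0; i < num_bd; i++) {
		bd[i].addr = bus_start + i * ETHOC_BUFSIZ;
		vma[i] = virt + (size_t)i * ETHOC_BUFSIZ;
	}
}
```

The RX and TX paths then index the table by descriptor number instead of
converting the descriptor's address back with phys_to_virt().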



* [PATCH 3/7] ethoc: write number of TX buffers in init_ring
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

This moves the write of the TX_BD_NUM to init_ring together with the
rest of the code setting up the transmission buffers.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 5904ad2..afeb993 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -298,6 +298,8 @@ static int ethoc_init_ring(struct ethoc *dev, void* mem_start)
 	dev->dty_tx = 0;
 	dev->cur_rx = 0;
 
+	ethoc_write(dev, TX_BD_NUM, dev->num_tx);
+
 	/* setup transmission buffers */
 	bd.addr = mem_start;
 	bd.stat = TX_BD_IRQ | TX_BD_CRC;
@@ -676,8 +678,6 @@ static int ethoc_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
-
 	ethoc_init_ring(priv, (void*)dev->mem_start);
 	ethoc_reset(priv);
 
-- 
1.7.0.4



* [PATCH 1/7] ethoc: calculate number of buffers in ethoc_probe
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn

This moves the calculation of the number of transmission buffers to
ethoc_probe where it more logically fits with the rest of the memory
allocation code.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 6ed2df1..68093cf 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -658,8 +658,6 @@ static int ethoc_mdio_probe(struct net_device *dev)
 static int ethoc_open(struct net_device *dev)
 {
 	struct ethoc *priv = netdev_priv(dev);
-	unsigned int min_tx = 2;
-	unsigned int num_bd;
 	int ret;
 
 	ret = request_irq(dev->irq, ethoc_interrupt, IRQF_SHARED,
@@ -667,11 +665,6 @@ static int ethoc_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	/* calculate the number of TX/RX buffers, maximum 128 supported */
-	num_bd = min_t(unsigned int,
-		128, (dev->mem_end - dev->mem_start + 1) / ETHOC_BUFSIZ);
-	priv->num_tx = max(min_tx, num_bd / 4);
-	priv->num_rx = num_bd - priv->num_tx;
 	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
 
 	ethoc_init_ring(priv);
@@ -884,6 +877,7 @@ static int ethoc_probe(struct platform_device *pdev)
 	struct resource *mem = NULL;
 	struct ethoc *priv = NULL;
 	unsigned int phy;
+	int num_bd;
 	int ret = 0;
 
 	/* allocate networking device */
@@ -978,6 +972,12 @@ static int ethoc_probe(struct platform_device *pdev)
 		priv->dma_alloc = buffer_size;
 	}
 
+	/* calculate the number of TX/RX buffers, maximum 128 supported */
+	num_bd = min_t(unsigned int,
+		128, (netdev->mem_end - netdev->mem_start + 1) / ETHOC_BUFSIZ);
+	priv->num_tx = max(2, num_bd / 4);
+	priv->num_rx = num_bd - priv->num_tx;
+
 	/* Allow the platform setup code to pass in a MAC address. */
 	if (pdev->dev.platform_data) {
 		struct ethoc_platform_data *pdata =
-- 
1.7.0.4
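
To make the TX/RX split computed in ethoc_probe concrete, here is a small
userspace model of the same arithmetic.  It is a sketch under assumptions:
ETHOC_BUFSIZ is taken to be 1536, and struct ring_split and
ethoc_split_buffers() are hypothetical names used only here.  Like the hunk
above, it does not guard against regions smaller than two buffers.

```c
/* Assumption: the driver's per-buffer size; check ethoc.c in your tree. */
#define ETHOC_BUFSIZ 1536
#define ETHOC_MAX_BD 128	/* "maximum 128 supported" per the comment */

struct ring_split {
	unsigned int num_tx;
	unsigned int num_rx;
};

/* Hypothetical model of the calculation: mem_start and mem_end are the
 * inclusive bounds of the buffer memory. */
struct ring_split ethoc_split_buffers(unsigned long mem_start,
				      unsigned long mem_end)
{
	struct ring_split s;
	unsigned long num_bd = (mem_end - mem_start + 1) / ETHOC_BUFSIZ;

	if (num_bd > ETHOC_MAX_BD)
		num_bd = ETHOC_MAX_BD;

	/* a quarter of the descriptors for TX, but never fewer than two */
	s.num_tx = num_bd / 4 > 2 ? num_bd / 4 : 2;
	s.num_rx = num_bd - s.num_tx;
	return s;
}
```

For a region that holds exactly 128 buffers this gives 32 TX and 96 RX
descriptors; a region that holds only four buffers gives 2 and 2.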



* [PATCH 4/7] ethoc: Clean up PHY probing
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

- No need to iterate over all possible addresses on bus
- Use helper function phy_find_first
- Use phy_connect_direct as we already have the relevant structure

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   24 ++++++++----------------
 1 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index afeb993..1ee9947 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -635,21 +635,13 @@ static int ethoc_mdio_probe(struct net_device *dev)
 {
 	struct ethoc *priv = netdev_priv(dev);
 	struct phy_device *phy;
+	int err;
 	int i;
 
-	for (i = 0; i < PHY_MAX_ADDR; i++) {
-		phy = priv->mdio->phy_map[i];
-		if (phy) {
-			if (priv->phy_id != -1) {
-				/* attach to specified PHY */
-				if (priv->phy_id == phy->addr)
-					break;
-			} else {
-				/* autoselect PHY if none was specified */
-				if (phy->addr != 0)
-					break;
-			}
-		}
+	if (priv->phy_id != -1) {
+		phy = priv->mdio->phy_map[priv->phy_id];
+	} else {
+		phy = phy_find_first(priv->mdio);
 	}
 
 	if (!phy) {
@@ -657,11 +649,11 @@ static int ethoc_mdio_probe(struct net_device *dev)
 		return -ENXIO;
 	}
 
-	phy = phy_connect(dev, dev_name(&phy->dev), ethoc_mdio_poll, 0,
+	err = phy_connect_direct(dev, phy, ethoc_mdio_poll, 0,
 			PHY_INTERFACE_MODE_GMII);
-	if (IS_ERR(phy)) {
+	if (err) {
 		dev_err(&dev->dev, "could not attach to PHY\n");
-		return PTR_ERR(phy);
+		return err;
 	}
 
 	priv->phy = phy;
-- 
1.7.0.4



* [PATCH 5/7] Remove unused variable
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>


Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 1ee9947..e5c2f5b 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -636,7 +636,6 @@ static int ethoc_mdio_probe(struct net_device *dev)
 	struct ethoc *priv = netdev_priv(dev);
 	struct phy_device *phy;
 	int err;
-	int i;
 
 	if (priv->phy_id != -1) {
 		phy = priv->mdio->phy_map[priv->phy_id];
-- 
1.7.0.4



* [PATCH] gianfar: Fix setup of RX time stamping
From: Manfred Rudigier @ 2010-06-11 11:49 UTC (permalink / raw)
  To: 'David Miller'
  Cc: 'Anton Vorontsov', Richard Cochran,
	'netdev@vger.kernel.org',
	'linuxppc-dev@ozlabs.org'

Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
properly on some boards (Anton had problems with the MPC8313ERDB and
MPC8568EMDS).

With this patch the bit will only be set if requested from user space
with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
disabled during normal operation. Users who are not interested in time
stamps will not experience problems with buggy CPU revisions or
performance drops any more.

The setting of TMR_CTRL[TE] is still up to the user. This is considered
safe because users wanting HW timestamps must initialize the eTSEC clock
first anyway, e.g. with the recently submitted PTP clock driver.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
---
 drivers/net/gianfar.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 46c69cd..227b628 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -381,10 +381,14 @@ static void gfar_init_mac(struct net_device *ndev)
 	/* Insert receive time stamps into padding alignment bytes */
 	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER) {
 		rctrl &= ~RCTRL_PAL_MASK;
-		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE | RCTRL_PADDING(8);
+		rctrl |= RCTRL_PADDING(8);
 		priv->padding = 8;
 	}
 
+	/* Enable HW time stamping if requested from user space */
+	if (priv->hwts_rx_en)
+		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
+
 	/* keep vlan related bits if it's enabled */
 	if (priv->vlgrp) {
 		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
@@ -747,7 +751,8 @@ static int gfar_of_init(struct of_device *ofdev, struct net_device **pdev)
 			FSL_GIANFAR_DEV_HAS_CSUM |
 			FSL_GIANFAR_DEV_HAS_VLAN |
 			FSL_GIANFAR_DEV_HAS_MAGIC_PACKET |
-			FSL_GIANFAR_DEV_HAS_EXTENDED_HASH;
+			FSL_GIANFAR_DEV_HAS_EXTENDED_HASH |
+			FSL_GIANFAR_DEV_HAS_TIMER;
 
 	ctype = of_get_property(np, "phy-connection-type", NULL);
 
@@ -805,12 +810,20 @@ static int gfar_hwtstamp_ioctl(struct net_device *netdev,
 
 	switch (config.rx_filter) {
 	case HWTSTAMP_FILTER_NONE:
-		priv->hwts_rx_en = 0;
+		if (priv->hwts_rx_en) {
+			stop_gfar(netdev);
+			priv->hwts_rx_en = 0;
+			startup_gfar(netdev);
+		}
 		break;
 	default:
 		if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER))
 			return -ERANGE;
-		priv->hwts_rx_en = 1;
+		if (!priv->hwts_rx_en) {
+			stop_gfar(netdev);
+			priv->hwts_rx_en = 1;
+			startup_gfar(netdev);
+		}
 		config.rx_filter = HWTSTAMP_FILTER_ALL;
 		break;
 	}
-- 
1.6.3.3

^ permalink raw reply related

* Weak host model vs. interface down
From: Joakim Tjernlund @ 2010-06-11 12:24 UTC (permalink / raw)
  To: netdev

Linux uses the weak host model, which makes IP addresses belong to the
system rather than to the interface. However, consider this:

System A, eth0 connected to the network
# > ifconfig eth0 192.168.1.16
# > ifconfig eth1 192.168.1.17 down

System B
# > ping 192.168.1.17
PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.618 ms

Isn't it a bit much to respond on 192.168.1.17 when its interface is down?
I even tried setting rp_filter=1 for all interfaces and that didn't help
either (not that I should need to).

   Jocke

^ permalink raw reply

* Re: [PATCH] gianfar: Fix setup of RX time stamping
From: Anton Vorontsov @ 2010-06-11 12:20 UTC (permalink / raw)
  To: Manfred Rudigier
  Cc: 'David Miller', Richard Cochran,
	'netdev@vger.kernel.org',
	'linuxppc-dev@ozlabs.org'
In-Reply-To: <95DC1AA8EC908B48939B72CF375AA5E3F03D8C46@alice.at.omicron.at>

On Fri, Jun 11, 2010 at 01:49:05PM +0200, Manfred Rudigier wrote:
> Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
> the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
> properly on some boards (Anton had problems with the MPC8313ERDB and
> MPC8568EMDS).
> 
> With this patch the bit will only be set if requested from user space
> with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
> disabled during normal operation. Users who are not interested in time
> stamps will not experience problems with buggy CPU revisions or
> performance drops any more.
> 
> The setting of TMR_CTRL[TE] is still up to the user. This is considered
> safe because users wanting HW timestamps must initialize the eTSEC clock
> first anyway, e.g. with the recently submitted PTP clock driver.
> 
> Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
> ---

Looks OK. I tested that it doesn't break anything, but I didn't
test the timestamping functionality. So

Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply

* Re: Packet capture and Bonding asymmetries
From: Paul LeoNerd Evans @ 2010-06-11 12:18 UTC (permalink / raw)
  To: Jay Vosburgh, netdev
In-Reply-To: <17501.1276123951@death.nxdomain.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2165 bytes --]

On Wed, Jun 09, 2010 at 03:52:31PM -0700, Jay Vosburgh wrote:
> 	For your own private testing, you could add a call to
> __netif_nit_deliver in netif_receive_skb prior to this part:
> 
>         master = ACCESS_ONCE(orig_dev->master);
>         if (master) {
>                 if (skb_bond_should_drop(skb, master))
>                         null_or_orig = orig_dev; /* deliver only exact match */
>                 else
>                         skb->dev = master;
>         }
> 
> 	This will give you multiple captures of the same packet, as is
> seen for transmit (i.e., one on the slave, one on the bond).  For
> non-bonding devices, tcpdump will see each packet twice on the same
> device, so it's not really suitable for general use.

As per my last post, I've just tested the following patch and found it
to work just fine:

# pktdump -f "icmp" -n
[13:04:30] RX(eth0): ICMP| 192.168.56.1->192.168.56.6 echo-request seq=1
[13:04:30] RX(bond0): ICMP| 192.168.56.1->192.168.56.6 echo-request seq=1
[13:04:30] TX(bond0): ICMP| 192.168.56.6->192.168.56.1 echo-reply seq=1
[13:04:30] TX(eth0): ICMP| 192.168.56.6->192.168.56.1 echo-reply seq=1

I'll resubmit the patch properly against the latest kernel version; this
one is against 2.6.31.12 and doesn't apply cleanly to upstream:

-----

--- linux-2.6.31.12-router/net/core/dev.c       2010-01-18 18:30:45.000000000 +0000
+++ linux-2.6.31.12-router_leobonding/net/core/dev.c    2010-06-11 12:39:43.000000000 +0100
@@ -2265,6 +2265,7 @@
        null_or_orig = NULL;
        orig_dev = skb->dev;
        if (orig_dev->master) {
+               netif_nit_deliver(skb);
                if (skb_bond_should_drop(skb))
                        null_or_orig = orig_dev; /* deliver only exact match */
                else

-----

This patch quite deliberately includes packets arriving from non-active
bonding slaves, because the intention of tcpdump, pktdump, et al. is to
see "close to the wire": a view of what's happening down that physical
Ethernet cable.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* [PATCH] Phytec PCM027: register CAN resources
From: Marc Kleine-Budde @ 2010-06-11 11:46 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Juergen Beisert, Marc Kleine-Budde,
	Wolfgang Grandegger

From: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

applies to current net-next-2.6
(which includes Wolfgang's patch "can: sja1000 platform data fixes")

cheers, Marc


From 66af2a1778a468610e25403336cc27650fddef2a Mon Sep 17 00:00:00 2001
From: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Date: Fri, 11 Jun 2010 13:27:03 +0200
Subject: [PATCH] Phytec PCM027: register CAN resources

This patch adds resources for the SJA1000 platform device.

Signed-off-by: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 arch/arm/mach-pxa/include/mach/pcm027.h |    2 +-
 arch/arm/mach-pxa/pcm027.c              |   40 ++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-pxa/include/mach/pcm027.h b/arch/arm/mach-pxa/include/mach/pcm027.h
index 0408326..b65fcc4 100644
--- a/arch/arm/mach-pxa/include/mach/pcm027.h
+++ b/arch/arm/mach-pxa/include/mach/pcm027.h
@@ -49,7 +49,7 @@
 /* CAN controller SJA1000 (unsupported yet) */
 #define PCM027_CAN_IRQ_GPIO	114
 #define PCM027_CAN_IRQ		IRQ_GPIO(PCM027_CAN_IRQ_GPIO)
-#define PCM027_CAN_IRQ_EDGE	IRQ_TYPE_EDGE_FALLING
+#define PCM027_CAN_IRQ_EDGE	IORESOURCE_IRQ_LOWEDGE
 #define PCM027_CAN_PHYS		0x22000000
 #define PCM027_CAN_SIZE		0x100
 
diff --git a/arch/arm/mach-pxa/pcm027.c b/arch/arm/mach-pxa/pcm027.c
index 2190af0..2ef7686 100644
--- a/arch/arm/mach-pxa/pcm027.c
+++ b/arch/arm/mach-pxa/pcm027.c
@@ -26,6 +26,7 @@
 #include <linux/spi/spi.h>
 #include <linux/spi/max7301.h>
 #include <linux/leds.h>
+#include <linux/can/platform/sja1000.h>
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
@@ -204,13 +205,50 @@ static struct platform_device pcm027_led_dev = {
 #endif /* CONFIG_LEDS_GPIO */
 
 /*
+ * SJA1000 CAN controller
+ */
+#if defined(CONFIG_CAN_SJA1000_PLATFORM) || defined(CONFIG_CAN_SJA1000_PLATFORM_MODULE)
+static struct resource pcm027_sja1000_resources[] = {
+	[0] = {
+		.start	= PCM027_CAN_PHYS,
+		.end	= PCM027_CAN_PHYS + PCM027_CAN_SIZE - 1,
+		.flags	= IORESOURCE_MEM,
+	},
+	[1] = {
+		.start	= PCM027_CAN_IRQ,
+		.end	= PCM027_CAN_IRQ,
+		.flags	= IORESOURCE_IRQ | PCM027_CAN_IRQ_EDGE,
+	},
+};
+
+static struct sja1000_platform_data pcm027_sja1000_platform_data = {
+	.osc_freq	= 16000000,
+	.ocr		= OCR_TX1_PULLDOWN | OCR_TX0_PUSHPULL,
+	.cdr		= CDR_CBP,
+};
+
+static struct platform_device pcm027_sja1000_device = {
+	.name		= "sja1000_platform",
+	.dev = {
+		.platform_data = &pcm027_sja1000_platform_data,
+	},
+	.num_resources	= ARRAY_SIZE(pcm027_sja1000_resources),
+	.resource	= pcm027_sja1000_resources,
+};
+#endif /* CONFIG_CAN_SJA1000_PLATFORM || CONFIG_CAN_SJA1000_PLATFORM_MODULE */
+
+
+/*
  * declare the available device resources on this board
  */
 static struct platform_device *devices[] __initdata = {
 	&smc91x_device,
 	&pcm027_flash,
 #ifdef CONFIG_LEDS_GPIO
-	&pcm027_led_dev
+	&pcm027_led_dev,
+#endif
+#if defined(CONFIG_CAN_SJA1000_PLATFORM) || defined(CONFIG_CAN_SJA1000_PLATFORM_MODULE)
+	&pcm027_sja1000_device,
 #endif
 };
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] allow to configure tcp_retries1 and tcp_retries2 per TCP socket
From: Salvador Fandino @ 2010-06-11 10:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev, David S. Miller, linux-kernel, vger.kernel.org
In-Reply-To: <87bpbi4ycc.fsf@basil.nowhere.org>

On 06/10/2010 07:00 PM, Andi Kleen wrote:
> Salvador Fandino<salvador@qindel.com>  writes:
>
>
>    
>> The included patch adds support for setting the tcp_retries1 and
>> tcp_retries2 options in a per socket fashion as it is done for the
>> keepalive options TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL.
>>
>> The issue I am trying to solve is that when a socket has data queued for
>> delivering, the keepalive logic is not triggered. Instead, the
>> tcp_retries1/2 parameters are used to determine how many delivering
>> attempts should be performed before giving up.
>>      
> And why exactly do you need new tunables to solve this?
>    

How else could it be solved?

One option is to make the retransmission logic also honor the keepalive
settings: switch to sending packets every keepintvl seconds once the
elapsed time exceeds keepidle, and abort after keepcnt attempts.

Or, make retransmits_timed_out() also consider 
(keepidle+keepcnt*keepintvl) as a ceiling.
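
To make the proposed ceiling concrete, here is a small stand-alone
sketch of the arithmetic. It is not kernel code and is not taken from
the patch; the function names are hypothetical, and all values are in
seconds.

```c
/*
 * Hypothetical helper: upper bound, in seconds, on how long a connection
 * may go unanswered under the keepalive settings:
 *   keepidle + keepcnt * keepintvl
 */
unsigned long keepalive_ceiling_secs(unsigned int keepidle,
				     unsigned int keepcnt,
				     unsigned int keepintvl)
{
	return (unsigned long)keepidle +
	       (unsigned long)keepcnt * (unsigned long)keepintvl;
}

/*
 * Hypothetical check: nonzero if retransmissions that have been going on
 * for elapsed_secs have reached the ceiling and should be abandoned.
 */
int retransmit_exceeds_ceiling(unsigned long elapsed_secs,
			       unsigned int keepidle,
			       unsigned int keepcnt,
			       unsigned int keepintvl)
{
	return elapsed_secs >=
	       keepalive_ceiling_secs(keepidle, keepcnt, keepintvl);
}
```

With the default sysctl values (tcp_keepalive_time = 7200,
tcp_keepalive_probes = 9, tcp_keepalive_intvl = 75) the ceiling works
out to 7200 + 9 * 75 = 7875 seconds.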

But frankly, I don't like either of them. IMO, leaving aside
backward-compatibility issues, it would make more sense to do it the
other way around and change the keepalive logic to follow the same
sending pattern used for data retransmissions, using keepidle, retries1
and retries2 as its parameters.

Well, another option would be to use keepcnt as retries2 when defined in 
tcp_sock. IMO it would make sense, but could be confusing for the user.

>> The patch is very straight forward and just replicates similar
>> functionality. There is one thing I am not completely sure and is if the
>> new per-socket fields should go into inet_connection_sock instead of
>> into tcp_sock.
>>      
> tcp_sock is already quite big (>2k on 64bit)
>
> IMHO any new fields in there need very good justification.
>    

If this is a problem, there is some room for optimization in the
inet_connection_sock and tcp_sock structures. For instance,
keepalive_time and keepalive_intvl are limited to MAX_KEEPALIVE_TIME *
HZ, that is 32767 * 1000 ==> 25 bits, so they would fit in a u32.

The retries1 and retries2 fields also fit in a u32. Actually, a
per-socket retries1 field is not strictly required, because the check
against retries2 is always performed, so the impact of this patch on the
structure size could be limited to 4 bytes.
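
The "25 bits" figure above can be checked with a few lines of
stand-alone C. This is only an illustration: bits_needed() is a
hypothetical helper, and the two constants are copied from the numbers
quoted in this message (HZ = 1000 is the assumption made there).

```c
#include <stdint.h>

/* Values quoted in the message; SKETCH_HZ = 1000 is the assumed tick rate. */
#define SKETCH_MAX_KEEPALIVE_TIME 32767
#define SKETCH_HZ 1000

/* Number of bits needed to represent v (0 needs 0 bits). */
unsigned int bits_needed(uint64_t v)
{
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

bits_needed((uint64_t)SKETCH_MAX_KEEPALIVE_TIME * SKETCH_HZ) evaluates
32767000, which lies between 2^24 and 2^25, hence 25 bits.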

- Salva

^ permalink raw reply

* Source address selection vs. policy routing?
From: Andrew Lutomirski @ 2010-06-11 10:41 UTC (permalink / raw)
  To: netdev

The docs seem rather vague
(http://linux-ip.net/html/routing-saddr-selection.html) and I'm about
to set up a system with two different internet connections, so I'll
ask here:

If I use policy routing (ip rule) based on the src selector and a
local application calls connect without first calling bind, how is the
source address selected?  (Just reading the docs naively gives a
circular answer: the routing table is chosen based on source address
and then the source address is influenced by the src field on the
chosen route.)

Similarly, how does masquerade choose its source address?

I took a look at the code, but all the fast paths, slow paths, and
flow structures are a bit confusing to the uninitiated.

Thanks,
Andy

^ permalink raw reply

* [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: sonic zhang @ 2010-06-11 10:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, uclinux-dist-devel

From 4779e43a5a8446f695f8d6f5a006cfb45dc093d8 Mon Sep 17 00:00:00 2001
From: Sonic Zhang <sonic.zhang@analog.com>
Date: Fri, 11 Jun 2010 17:44:31 +0800
Subject: [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer

SKBs hold onto resources that can't be held indefinitely, such as TCP
socket references and netfilter conntrack state.  So if a packet is left
in TX ring for a long time, there might be a TCP socket that cannot be
closed and freed up.

The current Blackfin EMAC driver only reclaims and frees used tx skbs
during later transfers. The problem is that the next transfer may not
come soon. This patch starts a timer after each transfer to reclaim and
free the skbs. There is almost no performance drop with this patch.

TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
If EMAC TX transfer control is turned on, endless TX interrupts are triggered
no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
TX transfer control can't be turned off in the middle. The only way is to disable
TX interrupt completely.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
---
 drivers/net/bfin_mac.c |  123 +++++++++++++++++++++++++++++++-----------------
 drivers/net/bfin_mac.h |    5 ++
 2 files changed, 85 insertions(+), 43 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 368f333..012613f 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -922,61 +922,73 @@ static void bfin_mac_hwtstamp_init(struct net_device *netdev)
 # define bfin_tx_hwtstamp(dev, skb)
 #endif
 
-static void adjust_tx_list(void)
+static inline void _tx_reclaim_skb(void)
+{
+	do {
+		tx_list_head->desc_a.config &= ~DMAEN;
+		tx_list_head->status.status_word = 0;
+		if (tx_list_head->skb) {
+			dev_kfree_skb(tx_list_head->skb);
+			tx_list_head->skb = NULL;
+		}
+		tx_list_head = tx_list_head->next;
+
+	} while (tx_list_head->status.status_word != 0);
+}
+
+static void tx_reclaim_skb(struct bfin_mac_local *lp)
 {
 	int timeout_cnt = MAX_TIMEOUT_CNT;
 
-	if (tx_list_head->status.status_word != 0 &&
-	    current_tx_ptr != tx_list_head) {
-		goto adjust_head;	/* released something, just return; */
-	}
+	if (tx_list_head->status.status_word != 0)
+		_tx_reclaim_skb();
 
-	/*
-	 * if nothing released, check wait condition
-	 * current's next can not be the head,
-	 * otherwise the dma will not stop as we want
-	 */
-	if (current_tx_ptr->next->next == tx_list_head) {
+	if (current_tx_ptr->next == tx_list_head) {
 		while (tx_list_head->status.status_word == 0) {
+			/* slow down polling to avoid too many queue stop. */
 			udelay(10);
-			if (tx_list_head->status.status_word != 0 ||
-			    !(bfin_read_DMA2_IRQ_STATUS() & DMA_RUN)) {
-				goto adjust_head;
-			}
-			if (timeout_cnt-- < 0) {
-				printk(KERN_ERR DRV_NAME
-				": wait for adjust tx list head timeout\n");
+			/* reclaim skb if DMA is not running. */
+			if (!(bfin_read_DMA2_IRQ_STATUS() & DMA_RUN))
+				break;
+			if (timeout_cnt-- < 0)
 				break;
-			}
-		}
-		if (tx_list_head->status.status_word != 0) {
-			goto adjust_head;
 		}
+
+		if (timeout_cnt >= 0)
+			_tx_reclaim_skb();
+		else
+			netif_stop_queue(lp->ndev);
 	}
 
-	return;
+	if (current_tx_ptr->next != tx_list_head &&
+		netif_queue_stopped(lp->ndev))
+		netif_wake_queue(lp->ndev);
+
+	if (tx_list_head != current_tx_ptr) {
+		/* shorten the timer interval if tx queue is stopped */
+		if (netif_queue_stopped(lp->ndev))
+			lp->tx_reclaim_timer.expires =
+				jiffies + (TX_RECLAIM_JIFFIES >> 4);
+		else
+			lp->tx_reclaim_timer.expires =
+				jiffies + TX_RECLAIM_JIFFIES;
+
+		mod_timer(&lp->tx_reclaim_timer,
+			lp->tx_reclaim_timer.expires);
+	}
 
-adjust_head:
-	do {
-		tx_list_head->desc_a.config &= ~DMAEN;
-		tx_list_head->status.status_word = 0;
-		if (tx_list_head->skb) {
-			dev_kfree_skb(tx_list_head->skb);
-			tx_list_head->skb = NULL;
-		} else {
-			printk(KERN_ERR DRV_NAME
-			       ": no sk_buff in a transmitted frame!\n");
-		}
-		tx_list_head = tx_list_head->next;
-	} while (tx_list_head->status.status_word != 0 &&
-		 current_tx_ptr != tx_list_head);
 	return;
+}
 
+static void tx_reclaim_skb_timeout(unsigned long lp)
+{
+	tx_reclaim_skb((struct bfin_mac_local *)lp);
 }
 
 static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
 	u16 *data;
 	u32 data_align = (unsigned long)(skb->data) & 0x3;
 	union skb_shared_tx *shtx = skb_tx(skb);
@@ -1009,8 +1021,6 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 			skb->len);
 		current_tx_ptr->desc_a.start_addr =
 			(u32)current_tx_ptr->packet;
-		if (current_tx_ptr->status.status_word != 0)
-			current_tx_ptr->status.status_word = 0;
 		blackfin_dcache_flush_range(
 			(u32)current_tx_ptr->packet,
 			(u32)(current_tx_ptr->packet + skb->len + 2));
@@ -1022,6 +1032,9 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	 */
 	SSYNC();
 
+	/* always clear status buffer before start tx dma */
+	current_tx_ptr->status.status_word = 0;
+
 	/* enable this packet's dma */
 	current_tx_ptr->desc_a.config |= DMAEN;
 
@@ -1037,13 +1050,14 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	bfin_write_EMAC_OPMODE(bfin_read_EMAC_OPMODE() | TE);
 
 out:
-	adjust_tx_list();
-
 	bfin_tx_hwtstamp(dev, skb);
 
 	current_tx_ptr = current_tx_ptr->next;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += (skb->len);
+
+	tx_reclaim_skb(lp);
+
 	return NETDEV_TX_OK;
 }
 
@@ -1167,8 +1181,11 @@ real_rx:
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void bfin_mac_poll(struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
 	disable_irq(IRQ_MAC_RX);
 	bfin_mac_interrupt(IRQ_MAC_RX, dev);
+	tx_reclaim_skb(lp);
 	enable_irq(IRQ_MAC_RX);
 }
 #endif				/* CONFIG_NET_POLL_CONTROLLER */
@@ -1232,12 +1249,27 @@ static int bfin_mac_enable(void)
 /* Our watchdog timed out. Called by the networking layer */
 static void bfin_mac_timeout(struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
 	pr_debug("%s: %s\n", dev->name, __func__);
 
 	bfin_mac_disable();
 
-	/* reset tx queue */
-	tx_list_tail = tx_list_head->next;
+	del_timer(&lp->tx_reclaim_timer);
+
+	/* reset tx queue and free skb */
+	while (tx_list_head != current_tx_ptr) {
+		tx_list_head->desc_a.config &= ~DMAEN;
+		tx_list_head->status.status_word = 0;
+		if (tx_list_head->skb) {
+			dev_kfree_skb(tx_list_head->skb);
+			tx_list_head->skb = NULL;
+		}
+		tx_list_head = tx_list_head->next;
+	}
+
+	if (netif_queue_stopped(lp->ndev))
+		netif_wake_queue(lp->ndev);
 
 	bfin_mac_enable();
 
@@ -1430,6 +1462,7 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	platform_set_drvdata(pdev, ndev);
 	lp = netdev_priv(ndev);
+	lp->ndev = ndev;
 
 	/* Grab the MAC address in the MAC */
 	*(__le32 *) (&(ndev->dev_addr[0])) = cpu_to_le32(bfin_read_EMAC_ADDRLO());
@@ -1485,6 +1518,10 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	ndev->netdev_ops = &bfin_mac_netdev_ops;
 	ndev->ethtool_ops = &bfin_mac_ethtool_ops;
 
+	init_timer(&lp->tx_reclaim_timer);
+	lp->tx_reclaim_timer.data = (unsigned long)lp;
+	lp->tx_reclaim_timer.function = tx_reclaim_skb_timeout;
+
 	spin_lock_init(&lp->lock);
 
 	/* now, enable interrupts */
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 1ae7b82..04e4050 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -13,9 +13,12 @@
 #include <linux/net_tstamp.h>
 #include <linux/clocksource.h>
 #include <linux/timecompare.h>
+#include <linux/timer.h>
 
 #define BFIN_MAC_CSUM_OFFLOAD
 
+#define TX_RECLAIM_JIFFIES (HZ / 5)
+
 struct dma_descriptor {
 	struct dma_descriptor *next_dma_desc;
 	unsigned long start_addr;
@@ -68,6 +71,8 @@ struct bfin_mac_local {
 
 	int wol;		/* Wake On Lan */
 	int irq_wake_requested;
+	struct timer_list tx_reclaim_timer;
+	struct net_device *ndev;
 
 	/* MII and PHY stuffs */
 	int old_link;          /* used by bf537_adjust_link */
-- 
1.6.0
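
For readers who want the reclaim idea without the hardware details, the
following user-space sketch models it. It is not the driver code: struct
tx_desc, tx_reclaim() and reclaim_delay() are hypothetical stand-ins,
free() plays the role of dev_kfree_skb(), and unlike _tx_reclaim_skb()
in the patch the loop also stops explicitly at the next descriptor to be
filled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for one entry of the circular TX descriptor ring. */
struct tx_desc {
	struct tx_desc *next;
	unsigned int status_word;	/* nonzero once transmission finished */
	void *buf;			/* stands in for the skb */
};

/*
 * Walk the ring from *head, freeing the buffers of completed descriptors.
 * Stops at the first descriptor that is still pending, or at next_free
 * (the descriptor the transmit path will fill next).
 * Returns the number of descriptors reclaimed.
 */
size_t tx_reclaim(struct tx_desc **head, const struct tx_desc *next_free)
{
	size_t reclaimed = 0;

	while (*head != next_free && (*head)->status_word != 0) {
		(*head)->status_word = 0;
		free((*head)->buf);
		(*head)->buf = NULL;
		*head = (*head)->next;
		reclaimed++;
	}
	return reclaimed;
}

/*
 * Timer interval in ticks, following the patch: HZ / 5 normally, and one
 * sixteenth of that while the TX queue is stopped.
 */
unsigned long reclaim_delay(unsigned long hz, int queue_stopped)
{
	unsigned long base = hz / 5;

	return queue_stopped ? base >> 4 : base;
}
```

With hz = 1000 this gives 200 ticks normally and 12 ticks while the
queue is stopped, so a stalled ring is polled far more often.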




^ permalink raw reply related

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-11  9:55 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTilUd2Bp1h4SzlW5f5KevoRM-os_fQrivLHbZyKY@mail.gmail.com>

On Tue, 2010-06-08 at 18:39 +0200, Kay Sievers wrote:

> That all works if you have two modules, like almost all buses have.
> That's what I meant, that we need to add stuff to the core to be able
> to cleanup bus devices internally too, if we use everything in a
> single module, which is also supposed to cleanup on unload, like the
> network devices like to do.

Or some "wait for bus to be cleaned up" we can call in the module exit
maybe?

johannes

^ permalink raw reply

* Re: no reassembly for outgoing packets on RAW socket
From: Jan Engelhardt @ 2010-06-11  9:53 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Patrick McHardy, netdev, Netfilter Developer Mailing List
In-Reply-To: <20100611081604.GA1739@jolsa.Belkin>


On Friday 2010-06-11 10:16, Jiri Olsa wrote:
>
>I prepared the patch implementing IP_NODEFRAG option for IPv4 socket.
>
>Also I just got an idea, that there could be no reassembly if there are
>no rules for connection tracing set.. not sure how can I check that best
>so far.. any idea?
>
>@@ -572,6 +572,14 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> 		}
> 		inet->hdrincl = val ? 1 : 0;
> 		break;
>+	case IP_NODEFRAG:
>+		if (sk->sk_type != SOCK_RAW) {
>+			err = -ENOPROTOOPT;
>+			break;
>+		}
>+		inet->nodefrag = val ? 1 : 0;
>+		printk("IP_NODEFRAG %p -> %d\n", inet, inet->nodefrag);
>+		break;

You want to get rid of this printk; otherwise it will spam the logs.

^ permalink raw reply

* [PATCH] hso: remove setting of low_latency flag
From: f.aben @ 2010-06-11  9:17 UTC (permalink / raw)
  To: davem; +Cc: linux-usb, netdev, j.dumon


This patch removes the setting of the low_latency flag. 
tty_flip_buffer_push() is occasionally being called in irq context, which 
causes a hang if the low_latency flag is set.
Removing the low_latency flag only seems to impact the flush to ldisc, 
which will now be put on a workqueue.

Signed-off-by: Filip Aben <f.aben@option.com>

---

diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 0a3c41f..4dd2351 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -1334,7 +1334,6 @@ static int hso_serial_open(struct tty_struct *tty, struct file *filp)
 	/* check for port already opened, if not set the termios */
 	serial->open_count++;
 	if (serial->open_count == 1) {
-		tty->low_latency = 1;
 		serial->rx_state = RX_IDLE;
 		/* Force default termio settings */
 		_hso_serial_set_termios(tty, NULL);

^ permalink raw reply related

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Jens Axboe @ 2010-06-11  9:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David Härdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <20100611085520.GA20218@elte.hu>

On 2010-06-11 10:55, Ingo Molnar wrote:
> 
> * Jens Axboe <jaxboe@fusionio.com> wrote:
> 
>> On 2010-06-11 10:32, Ingo Molnar wrote:
>>>
>>> * Jens Axboe <jaxboe@fusionio.com> wrote:
>>>
>>>> On 2010-06-09 03:53, Linus Torvalds wrote:
>>>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
>>>>>> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>>>>>> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
>>>>>> Date		: 2010-06-05 06:15 (4 days old)
>>>>>
>>>>> This seems to have been introduced by
>>>>>
>>>>> 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
>>>>> 	Author: Ingo Molnar <mingo@elte.hu>
>>>>> 	Date:   Sat Nov 8 17:05:38 2008 +0100
>>>>>
>>>>> 	    sched: optimize sched_clock() a bit
>>>>>     
>>>>> 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
>>>>> 	    variant of __cycles_2_ns().
>>>>>     
>>>>> 	    Most of the time sched_clock() is called with irqs disabled already.
>>>>> 	    The few places that call it with irqs enabled need to be updated..
>>>>>     
>>>>> 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>>
>>>>> and this seems to be one of those calling cases that need to be updated..
>>>>>
>>>>> Ingo? The call trace is:
>>>>>
>>>>> 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
>>>>> 	caller is native_sched_clock+0x3c/0x68
>>>>> 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
>>>>> 	Call Trace:
>>>>> 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
>>>>> 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
>>>>> 	[<ffffffff8101043d>] sched_clock+0x9/0xd
>>>>> 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
>>>>> 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
>>>>> 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
>>>>> 	[<ffffffff81215537>] __make_request+0x338/0x45b
>>>>> 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
>>>>> 	[<ffffffff81214909>] submit_bio+0xd2/0xef
>>>>> 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
>>>>> 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
>>>>> 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
>>>>> 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
>>>>> 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
>>>>> 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
>>>>> 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
>>>>> 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
>>>>> 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
>>>>> 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
>>>>>
>>>>> (from the bugzilla thing)
>>>>
>>>> This should be fixed by commit 28f4197e which was merged on friday.
>>>
>>> Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some 
>>> configs i get bad spinlock warnings during bootup:
>>>
>>> [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
>>> [   28.972003] calling  b44_init+0x0/0x55 @ 1
>>> [   28.976009] bus: 'pci': add driver b44
>>> [   28.976374]  sda:
>>> [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
>>> [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
>>> [   28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
>>> [   28.980000] Call Trace:
>>> [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
>>> [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
>>> [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
>>> [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
>>> [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
>>> [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
>>> [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
>>> [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425
>>> [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
>>> [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
>>> [   28.980000]  [<41329225>] elv_insert+0x107/0x1a0
>>> [   28.980000]  [<41329354>] __elv_add_request+0x96/0x9d
>>> [   28.980000]  [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
>>> [   28.980000]  [<4132dd64>] __make_request+0x335/0x376
>>> [   28.980000]  [<4132c726>] generic_make_request+0x336/0x39d
>>> [   28.980000]  [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
>>> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [   28.980000]  [<41089347>] ? mempool_alloc+0x57/0xe2
>>> [   28.980000]  [<4132c804>] submit_bio+0x77/0x8f
>>> [   28.980000]  [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
>>> [   28.980000]  [<410ceb90>] submit_bh+0xc3/0xe2
>>> [   28.980000]  [<410d1474>] block_read_full_page+0x249/0x259
>>> [   28.980000]  [<410d31fb>] ? blkdev_get_block+0x0/0xc6
>>> [   28.980000]  [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
>>> [   28.980000]  [<410d3d92>] blkdev_readpage+0xf/0x11
>>> [   28.980000]  [<41088823>] do_read_cache_page+0x7d/0x11a
>>> [   28.980000]  [<410d3d83>] ? blkdev_readpage+0x0/0x11
>>> [   28.980000]  [<410888f4>] read_cache_page_async+0x16/0x1b
>>> [   28.980000]  [<41088904>] read_cache_page+0xb/0x12
>>> [   28.980000]  [<410e80e1>] read_dev_sector+0x2a/0x63
>>> [   28.980000]  [<410e92e8>] adfspart_check_ICS+0x2e/0x166
>>> [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
>>> [   28.980000]  [<410e8d23>] rescan_partitions+0x196/0x3e4
>>> [   28.980000]  [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
>>> [   28.980000]  [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
>>> [   28.980000]  [<410d4277>] __blkdev_get+0x1e7/0x292
>>> [   28.980000]  [<4133a201>] ? kobject_put+0x14/0x16
>>> [   28.980000]  [<410d432c>] blkdev_get+0xa/0xc
>>> [   28.980000]  [<410e81fb>] register_disk+0x94/0xe5
>>> [   28.980000]  [<413326c6>] ? blk_register_region+0x1b/0x20
>>> [   28.980000]  [<41332815>] add_disk+0x57/0x95
>>> [   28.980000]  [<41331fc6>] ? exact_match+0x0/0x8
>>> [   28.980000]  [<4133233f>] ? exact_lock+0x0/0x11
>>> [   28.980000]  [<41643848>] sd_probe_async+0x108/0x1be
>>> [   28.980000]  [<41048865>] async_thread+0xf5/0x1e6
>>> [   28.980000]  [<4102cbcb>] ? default_wake_function+0x0/0xd
>>> [   28.980000]  [<41048770>] ? async_thread+0x0/0x1e6
>>> [   28.980000]  [<410433df>] kthread+0x5f/0x64
>>> [   28.980000]  [<41043380>] ? kthread+0x0/0x64
>>> [   28.980000]  [<41002cc6>] kernel_thread_helper+0x6/0x10
>>> [   29.264071] async/1 used greatest stack depth: 2336 bytes left
>>> [   29.267020] bus: 'ssb': add driver b44
>>> [   29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
>>> [   29.267076] calling  init_nic+0x0/0x16 @ 1
>>>
>>> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config 
>>> attached. Reproducible on that sha1 and with that config.
>>
>> I think I see it, the internal CFQ blkg groups are not properly
>> initialized... Will send a patch shortly.
> 
> Cool - can test it with a short turnaround, the bug is easy to reproduce.

Thanks, I need to work out what the best way to solve it is. The problem
is that if you have BLK_CGROUP set but don't enable the CFQ cgroup
stuff, then you end up calling the real update functions but CFQ has not
initialized the groups they operate on.
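
The failure mode described above, a lock that is used before anything
initialized it, can be sketched as a small user-space C model. This is only
an illustration, not kernel code: every name below (model_lock, model_group,
model_update_stats and so on) is hypothetical, and the magic constant is
assumed to mirror the debug-spinlock magic value. The trace reports
".magic: 00000000", which is what a zero-filled, never-initialized lock
would look like.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed to mirror the kernel's debug-spinlock magic; illustrative only. */
#define MODEL_SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical stand-in for a debug spinlock. */
struct model_lock {
    unsigned int magic;
    bool locked;
};

/* Hypothetical stand-in for a per-group stats structure. */
struct model_group {
    struct model_lock stats_lock;
    unsigned long queued;
};

static void model_lock_init(struct model_lock *l)
{
    l->magic = MODEL_SPINLOCK_MAGIC;
    l->locked = false;
}

/* Returns false where the debug code would report "bad magic". */
static bool model_lock_acquire(struct model_lock *l)
{
    if (l->magic != MODEL_SPINLOCK_MAGIC)
        return false;
    l->locked = true;
    return true;
}

static void model_lock_release(struct model_lock *l)
{
    l->locked = false;
}

/* Models a stats update that takes the group's lock unconditionally. */
static bool model_update_stats(struct model_group *g)
{
    if (!model_lock_acquire(&g->stats_lock))
        return false;
    g->queued++;
    model_lock_release(&g->stats_lock);
    return true;
}

/* A zero-filled group whose lock was never initialized is rejected. */
static bool zeroed_group_is_rejected(void)
{
    struct model_group g;

    memset(&g, 0, sizeof(g));
    return !model_update_stats(&g);
}

/* The same group works once the lock has been initialized. */
static bool initialized_group_is_accepted(void)
{
    struct model_group g;

    memset(&g, 0, sizeof(g));
    model_lock_init(&g.stats_lock);
    return model_update_stats(&g) && g.queued == 1;
}
```

In this model the two obvious ways out are to initialize the lock wherever
the group is set up, or to avoid calling the real update function when the
group support is not enabled. Which of these the actual fix takes is not
stated in this thread.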

-- 
Jens Axboe


^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Ingo Molnar @ 2010-06-11  8:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David Härdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <4C11F661.3070604-5c4llco8/ftWk0Htik3J/w@public.gmane.org>


* Jens Axboe <jaxboe@fusionio.com> wrote:

> On 2010-06-11 10:32, Ingo Molnar wrote:
> > 
> > * Jens Axboe <jaxboe@fusionio.com> wrote:
> > 
> >> On 2010-06-09 03:53, Linus Torvalds wrote:
> >>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >>>> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >>>> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
> >>>> Date		: 2010-06-05 06:15 (4 days old)
> >>>
> >>> This seems to have been introduced by
> >>>
> >>> 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> >>> 	Author: Ingo Molnar <mingo@elte.hu>
> >>> 	Date:   Sat Nov 8 17:05:38 2008 +0100
> >>>
> >>> 	    sched: optimize sched_clock() a bit
> >>>     
> >>> 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> >>> 	    variant of __cycles_2_ns().
> >>>     
> >>> 	    Most of the time sched_clock() is called with irqs disabled already.
> >>> 	    The few places that call it with irqs enabled need to be updated.
> >>>     
> >>> 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >>>
> >>> and this seems to be one of those calling cases that need to be updated..
> >>>
> >>> Ingo? The call trace is:
> >>>
> >>> 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> >>> 	caller is native_sched_clock+0x3c/0x68
> >>> 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> >>> 	Call Trace:
> >>> 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> >>> 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> >>> 	[<ffffffff8101043d>] sched_clock+0x9/0xd
> >>> 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> >>> 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
> >>> 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> >>> 	[<ffffffff81215537>] __make_request+0x338/0x45b
> >>> 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> >>> 	[<ffffffff81214909>] submit_bio+0xd2/0xef
> >>> 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
> >>> 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> >>> 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
> >>> 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> >>> 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
> >>> 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> >>> 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
> >>> 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> >>> 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> >>> 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
> >>>
> >>> (from the bugzilla thing)
> >>
> >> This should be fixed by commit 28f4197e which was merged on friday.
> > 
> > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some 
> > configs i get bad spinlock warnings during bootup:
> > 
> > [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
> > [   28.972003] calling  b44_init+0x0/0x55 @ 1
> > [   28.976009] bus: 'pci': add driver b44
> > [   28.976374]  sda:
> > [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
> > [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> > [   28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
> > [   28.980000] Call Trace:
> > [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
> > [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
> > [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
> > [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
> > [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
> > [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
> > [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
> > [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425
> > [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> > [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> > [   28.980000]  [<41329225>] elv_insert+0x107/0x1a0
> > [   28.980000]  [<41329354>] __elv_add_request+0x96/0x9d
> > [   28.980000]  [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
> > [   28.980000]  [<4132dd64>] __make_request+0x335/0x376
> > [   28.980000]  [<4132c726>] generic_make_request+0x336/0x39d
> > [   28.980000]  [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
> > [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [   28.980000]  [<41089347>] ? mempool_alloc+0x57/0xe2
> > [   28.980000]  [<4132c804>] submit_bio+0x77/0x8f
> > [   28.980000]  [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
> > [   28.980000]  [<410ceb90>] submit_bh+0xc3/0xe2
> > [   28.980000]  [<410d1474>] block_read_full_page+0x249/0x259
> > [   28.980000]  [<410d31fb>] ? blkdev_get_block+0x0/0xc6
> > [   28.980000]  [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
> > [   28.980000]  [<410d3d92>] blkdev_readpage+0xf/0x11
> > [   28.980000]  [<41088823>] do_read_cache_page+0x7d/0x11a
> > [   28.980000]  [<410d3d83>] ? blkdev_readpage+0x0/0x11
> > [   28.980000]  [<410888f4>] read_cache_page_async+0x16/0x1b
> > [   28.980000]  [<41088904>] read_cache_page+0xb/0x12
> > [   28.980000]  [<410e80e1>] read_dev_sector+0x2a/0x63
> > [   28.980000]  [<410e92e8>] adfspart_check_ICS+0x2e/0x166
> > [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
> > [   28.980000]  [<410e8d23>] rescan_partitions+0x196/0x3e4
> > [   28.980000]  [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
> > [   28.980000]  [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
> > [   28.980000]  [<410d4277>] __blkdev_get+0x1e7/0x292
> > [   28.980000]  [<4133a201>] ? kobject_put+0x14/0x16
> > [   28.980000]  [<410d432c>] blkdev_get+0xa/0xc
> > [   28.980000]  [<410e81fb>] register_disk+0x94/0xe5
> > [   28.980000]  [<413326c6>] ? blk_register_region+0x1b/0x20
> > [   28.980000]  [<41332815>] add_disk+0x57/0x95
> > [   28.980000]  [<41331fc6>] ? exact_match+0x0/0x8
> > [   28.980000]  [<4133233f>] ? exact_lock+0x0/0x11
> > [   28.980000]  [<41643848>] sd_probe_async+0x108/0x1be
> > [   28.980000]  [<41048865>] async_thread+0xf5/0x1e6
> > [   28.980000]  [<4102cbcb>] ? default_wake_function+0x0/0xd
> > [   28.980000]  [<41048770>] ? async_thread+0x0/0x1e6
> > [   28.980000]  [<410433df>] kthread+0x5f/0x64
> > [   28.980000]  [<41043380>] ? kthread+0x0/0x64
> > [   28.980000]  [<41002cc6>] kernel_thread_helper+0x6/0x10
> > [   29.264071] async/1 used greatest stack depth: 2336 bytes left
> > [   29.267020] bus: 'ssb': add driver b44
> > [   29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
> > [   29.267076] calling  init_nic+0x0/0x16 @ 1
> > 
> > Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config 
> > attached. Reproducible on that sha1 and with that config.
> 
> I think I see it, the internal CFQ blkg groups are not properly
> initialized... Will send a patch shortly.

Cool - can test it with a short turnaround, the bug is easy to reproduce.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH] gianfar: Fix TX ring processing on SMP machines
From: Esben Haabendal @ 2010-06-11  8:45 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: David Miller, linuxppc-dev, netdev, Martyn Welch, Paul Gortmaker,
	Sandeep Gopalpet
In-Reply-To: <20100303181858.GA458@oksana.dev.rtsoft.ru>

On Wed, Mar 3, 2010 at 8:18 PM, Anton Vorontsov
<avorontsov@ru.mvista.com> wrote:
> Starting with commit a3bc1f11e9b867a4f49505 ("gianfar: Revive SKB
> recycling") gianfar driver sooner or later stops transmitting any
> packets on SMP machines.
>
> start_xmit() prepares new skb for transmitting, generally it does
> three things:
>
> 1. sets up all BDs (marks them ready to send), except the first one.
> 2. stores skb into tx_queue->tx_skbuff so that clean_tx_ring()
>   would clean it up later.
> 3. sets up the first BD, i.e. marks it ready.
>
> Here is what clean_tx_ring() does:
>
> 1. reads skbs from tx_queue->tx_skbuff
> 2. checks if the *last* BD is ready. If it's still ready [to send]
>   then it isn't transmitted, so clean_tx_ring() returns.
>   Otherwise it actually cleans up BDs. All is OK.
>
> Now, if there is just one BD, code flow:
>
> - start_xmit(): stores skb into tx_skbuff. Note that the first BD
>  (which is also the last one) isn't marked as ready, yet.
> - clean_tx_ring(): sees that skb is not null, *and* its lstatus
>  says that it is NOT ready (like if BD was sent), so it cleans
>  it up (bad!)
> - start_xmit(): marks BD as ready [to send], but it's too late.
>
> We can fix this simply by reordering lstatus/tx_skbuff writes.
>
> Reported-by: Martyn Welch <martyn.welch@ge.com>
> Bisected-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> Tested-by: Martyn Welch <martyn.welch@ge.com>
> Cc: Sandeep Gopalpet <Sandeep.Kumar@freescale.com>
> Cc: Stable <stable@vger.kernel.org> [2.6.33]
> ---
>  drivers/net/gianfar.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..cccb409 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>        }
>
>        /* setup the TxBD length and buffer pointer for the first BD */
> -       tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
>        txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
>                        skb_headlen(skb), DMA_TO_DEVICE);
>
> @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>
>        txbdp_start->lstatus = lstatus;
>
> +       eieio(); /* force lstatus write before tx_skbuff */
> +
> +       tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
> +
>        /* Update the current skb pointer to the next entry we will use
>         * (wrapping if necessary) */
>        tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &

This patch also makes gianfar work stably on mpc8313 with 2.6.33/RT_PREEMPT.
Without it, I see exactly the same problems as reported by Anton on SMP.
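
The single-BD race described in the quoted commit message can be modelled in
a few lines of user-space C. This is a minimal single-threaded sketch, not
the driver code: struct tx_slot, clean_slot(), BD_READY and the two helper
functions are hypothetical names, and the "concurrent" cleaner is simulated
by calling it at the point where it could run on another CPU.

```c
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x1u /* hypothetical "ready to send" bit */

/* Hypothetical one-entry model of a TX ring slot. */
struct tx_slot {
    const void *skb;      /* what the cleaner checks first */
    unsigned int lstatus; /* descriptor status word */
};

/* Models the cleaner: returns true if it reclaimed the slot. */
static bool clean_slot(struct tx_slot *s)
{
    if (s->skb == NULL)
        return false; /* nothing queued */
    if (s->lstatus & BD_READY)
        return false; /* still waiting to be sent */
    s->skb = NULL;    /* treated as already transmitted */
    return true;
}

/* Old order: skb stored first, READY set last. */
static bool old_order_reclaims_unsent(void)
{
    static const int packet = 0;
    struct tx_slot s = { NULL, 0 };
    bool reclaimed;

    s.skb = &packet;            /* skb visible, BD not yet ready */
    reclaimed = clean_slot(&s); /* cleaner runs inside the window */
    s.lstatus |= BD_READY;      /* too late, slot already reclaimed */
    return reclaimed;
}

/* New order: READY set first, skb stored afterwards. */
static bool new_order_reclaims_unsent(void)
{
    static const int packet = 0;
    struct tx_slot s = { NULL, 0 };
    bool early, late;

    s.lstatus |= BD_READY;
    /* On real hardware a write barrier (eieio() in the patch) goes here. */
    early = clean_slot(&s); /* skb still NULL, nothing to reclaim */
    s.skb = &packet;
    late = clean_slot(&s);  /* READY is set, slot is left alone */
    return early || late;
}
```

With the old ordering the model reclaims a packet that was never sent; with
the reordered writes neither cleaner call touches the slot. The model does
not capture memory-ordering effects, which is what the barrier in the patch
addresses.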

/Esben
-- 
Esben Haabendal, Senior Software Consultant
DoréDevelopment ApS, Ved Stranden 1, 9560 Hadsund, DK-Denmark
Phone: +45 51 92 53 93, E-mail: eha@doredevelopment.dk
WWW: http://www.doredevelopment.dk

^ permalink raw reply

* Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
From: Jens Axboe @ 2010-06-11  8:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Rafael J. Wysocki, Carl Worth, Eric Anholt,
	Venkatesh Pallipadi, Dave Airlie, Jesse Barnes, David Härdeman,
	Mauro Carvalho Chehab, Eric Dumazet, Linux Kernel Mailing List,
	Maciej Rutecki, Andrew Morton, Kernel Testers List,
	Network Development, Linux ACPI, Linux PM List, Linux SCSI List,
	Linux Wireless List, DRI
In-Reply-To: <20100611083249.GA11143@elte.hu>

On 2010-06-11 10:32, Ingo Molnar wrote:
> 
> * Jens Axboe <jaxboe@fusionio.com> wrote:
> 
>> On 2010-06-09 03:53, Linus Torvalds wrote:
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=16129
>>>> Subject	: BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>>>> Submitter	: Jan Kreuzer <kontrollator@gmx.de>
>>>> Date		: 2010-06-05 06:15 (4 days old)
>>>
>>> This seems to have been introduced by
>>>
>>> 	commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
>>> 	Author: Ingo Molnar <mingo@elte.hu>
>>> 	Date:   Sat Nov 8 17:05:38 2008 +0100
>>>
>>> 	    sched: optimize sched_clock() a bit
>>>     
>>> 	    sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
>>> 	    variant of __cycles_2_ns().
>>>     
>>> 	    Most of the time sched_clock() is called with irqs disabled already.
>>> 	    The few places that call it with irqs enabled need to be updated.
>>>     
>>> 	    Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>
>>> and this seems to be one of those calling cases that need to be updated..
>>>
>>> Ingo? The call trace is:
>>>
>>> 	BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
>>> 	caller is native_sched_clock+0x3c/0x68
>>> 	Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
>>> 	Call Trace:
>>> 	[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
>>> 	[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
>>> 	[<ffffffff8101043d>] sched_clock+0x9/0xd
>>> 	[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
>>> 	[<ffffffff81214d71>] get_request+0x1c4/0x2d0
>>> 	[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
>>> 	[<ffffffff81215537>] __make_request+0x338/0x45b
>>> 	[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
>>> 	[<ffffffff81214909>] submit_bio+0xd2/0xef
>>> 	[<ffffffff811413cb>] submit_bh+0xf4/0x116
>>> 	[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
>>> 	[<ffffffff81144875>] block_write_full_page+0x15/0x17
>>> 	[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
>>> 	[<ffffffff810e1f91>] __writepage+0x1a/0x39
>>> 	[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
>>> 	[<ffffffff810e3406>] generic_writepages+0x27/0x29
>>> 	[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
>>> 	[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
>>> 	[<ffffffff811d0cba>] kjournald2+0x147/0x37a
>>>
>>> (from the bugzilla thing)
>>
>> This should be fixed by commit 28f4197e which was merged on friday.
> 
> Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some 
> configs i get bad spinlock warnings during bootup:
> 
> [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
> [   28.972003] calling  b44_init+0x0/0x55 @ 1
> [   28.976009] bus: 'pci': add driver b44
> [   28.976374]  sda:
> [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
> [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [   28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
> [   28.980000] Call Trace:
> [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
> [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
> [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
> [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
> [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
> [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
> [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
> [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425
> [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> [   28.980000]  [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> [   28.980000]  [<41329225>] elv_insert+0x107/0x1a0
> [   28.980000]  [<41329354>] __elv_add_request+0x96/0x9d
> [   28.980000]  [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
> [   28.980000]  [<4132dd64>] __make_request+0x335/0x376
> [   28.980000]  [<4132c726>] generic_make_request+0x336/0x39d
> [   28.980000]  [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [   28.980000]  [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [   28.980000]  [<41089347>] ? mempool_alloc+0x57/0xe2
> [   28.980000]  [<4132c804>] submit_bio+0x77/0x8f
> [   28.980000]  [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
> [   28.980000]  [<410ceb90>] submit_bh+0xc3/0xe2
> [   28.980000]  [<410d1474>] block_read_full_page+0x249/0x259
> [   28.980000]  [<410d31fb>] ? blkdev_get_block+0x0/0xc6
> [   28.980000]  [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
> [   28.980000]  [<410d3d92>] blkdev_readpage+0xf/0x11
> [   28.980000]  [<41088823>] do_read_cache_page+0x7d/0x11a
> [   28.980000]  [<410d3d83>] ? blkdev_readpage+0x0/0x11
> [   28.980000]  [<410888f4>] read_cache_page_async+0x16/0x1b
> [   28.980000]  [<41088904>] read_cache_page+0xb/0x12
> [   28.980000]  [<410e80e1>] read_dev_sector+0x2a/0x63
> [   28.980000]  [<410e92e8>] adfspart_check_ICS+0x2e/0x166
> [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
> [   28.980000]  [<410e8d23>] rescan_partitions+0x196/0x3e4
> [   28.980000]  [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
> [   28.980000]  [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
> [   28.980000]  [<410d4277>] __blkdev_get+0x1e7/0x292
> [   28.980000]  [<4133a201>] ? kobject_put+0x14/0x16
> [   28.980000]  [<410d432c>] blkdev_get+0xa/0xc
> [   28.980000]  [<410e81fb>] register_disk+0x94/0xe5
> [   28.980000]  [<413326c6>] ? blk_register_region+0x1b/0x20
> [   28.980000]  [<41332815>] add_disk+0x57/0x95
> [   28.980000]  [<41331fc6>] ? exact_match+0x0/0x8
> [   28.980000]  [<4133233f>] ? exact_lock+0x0/0x11
> [   28.980000]  [<41643848>] sd_probe_async+0x108/0x1be
> [   28.980000]  [<41048865>] async_thread+0xf5/0x1e6
> [   28.980000]  [<4102cbcb>] ? default_wake_function+0x0/0xd
> [   28.980000]  [<41048770>] ? async_thread+0x0/0x1e6
> [   28.980000]  [<410433df>] kthread+0x5f/0x64
> [   28.980000]  [<41043380>] ? kthread+0x0/0x64
> [   28.980000]  [<41002cc6>] kernel_thread_helper+0x6/0x10
> [   29.264071] async/1 used greatest stack depth: 2336 bytes left
> [   29.267020] bus: 'ssb': add driver b44
> [   29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
> [   29.267076] calling  init_nic+0x0/0x16 @ 1
> 
> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config 
> attached. Reproducible on that sha1 and with that config.

I think I see it, the internal CFQ blkg groups are not properly
initialized... Will send a patch shortly.

-- 
Jens Axboe


^ permalink raw reply
