Netdev List
 help / color / mirror / Atom feed
* Re: Weak host model vs .interface down
From: Joakim Tjernlund @ 2010-06-11 17:06 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <4C126514.30905@hp.com>

Rick Jones <rick.jones2@hp.com> wrote on 2010/06/11 18:32:20:
> Joakim Tjernlund wrote:
> > Linux uses the weak host model which makes the IP addresses part of the system
> > rather than the interface. However consider this:
> >
> > System A, eth0 connected to the network
> > # > ifconfig eth0 192.168.1.16
> > # > ifconfig eth1 192.168.1.17 down
> >
> > System B
> > # > ping 192.168.1.17
> > PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
> > 64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.618 ms
> >
> > Isn't it a bit much to respond on 192.168.1.17 when its interface is down?
>
> As you said at the beginning, the weak end system model presumes the IP address
> is part of the system.  Seems to me that means unless one removes the IP address
> from the system it is reasonable for the system to continue to respond to that
> IP address.  Regardless of what happens to any individual interface.

The weak model doesn't go into such detail, it is assumption/impl. detail
to assume that the ip address still is part of the system even when the interface
is down. One could just as well define interface down as temporarly removing
the IP address from the system too. This makes make much more sense to me and
if you always want the system to answer on a IP adress you make it an IP alias.

Since the current behaviour is a problem to me and routers in general, can
we change this? What is the current usage model which needs it to stay as is?

>
> Now, I wouldn't expect it to continue to respond to 192.168.1.17 through eth1,
> but if eth0 is indeed connected to the same broadcast domain, given the
> following of the weak end-system model, continuing to respond seems consistent
> with enthusiasticaly following the weak end-system model.

Dosnt matter if it is in the same broadcast domain, you can use a bridge
interface or dummy interface too. It will still respond to 192.168.1.17
I can't find a way disable this behaviour, can you?


^ permalink raw reply

* Re: [ath5k-devel] [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Bob Copeland @ 2010-06-11 17:07 UTC (permalink / raw)
  To: Dmytro Milinevskyy
  Cc: Kalle Valo, GeunSik Lim, Greg Kroah-Hartman, Jiri Slaby,
	netdev-u79uwXL29TY76Z2rM5mHXA, ath5k-devel-xDcbHBWguxEUs3QNXV6qNA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, John W. Linville,
	Keng-Yu Lin, Jiri Kosina, Shahar Or,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Luca Verdesca
In-Reply-To: <AANLkTimXqdkQxv_Y1JaleTTWNsDIQHQaPn8HR7efOupb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Jun 9, 2010 at 10:43 AM, Dmytro Milinevskyy
<milinevskyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi, Bob.
>
> For instance I don't use 802.11 leds layer and trigger leds from
> userspace according to my purposes.

FWIW you can disable mac80211 triggers from userspace as well.
But, I guess hooking up null triggers will work too.

-- 
Bob Copeland %% www.bobcopeland.com
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Weak host model vs .interface down
From: Rick Jones @ 2010-06-11 16:32 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev
In-Reply-To: <OF1D06F11A.44DE8C5A-ONC125773F.00437856-C125773F.004425B9@transmode.se>

Joakim Tjernlund wrote:
> Linux uses the weak host model which makes the IP addresses part of the system
> rather than the interface. However consider this:
> 
> System A, eth0 connected to the network
> # > ifconfig eth0 192.168.1.16
> # > ifconfig eth1 192.168.1.17 down
> 
> System B
> # > ping 192.168.1.17
> PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
> 64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.618 ms
> 
> Isn't it a bit much to respond on 192.168.1.17 when its interface is down?

As you said at the beginning, the weak end system model presumes the IP address 
is part of the system.  Seems to me that means unless one removes the IP address 
from the system it is reasonable for the system to continue to respond to that 
IP address.  Regardless of what happens to any individual interface.

Now, I wouldn't expect it to continue to respond to 192.168.1.17 through eth1, 
but if eth0 is indeed connected to the same broadcast domain, given the 
following of the weak end-system model, continuing to respond seems consistent 
with enthusiasticaly following the weak end-system model.

rick jones

^ permalink raw reply

* Fw: [Bug 16183] New: sch_teql no longer works post 2.6.30
From: Stephen Hemminger @ 2010-06-11 16:08 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev



Begin forwarded message:

Date: Fri, 11 Jun 2010 15:56:44 GMT
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 16183] New: sch_teql no longer works post 2.6.30


https://bugzilla.kernel.org/show_bug.cgi?id=16183

           Summary: sch_teql no longer works post 2.6.30
           Product: Networking
           Version: 2.5
    Kernel Version: 2.6.31 through 2.6.34
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: tom@compton.nu
        Regression: Yes


Created an attachment (id=26729)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=26729)
Patch to fix sch_teql

The sch_teql module, which can be used to load balance over a set of underlying
interfaces, stopped working after 2.6.30 and has been broken in all kernels
since then.

The problem is that the transmit routine relies on being able to access the
destination address in the skb in order to do ARP resolution once it has
decided which underlying interface it is going to transmit through.

In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by default for
all interfaces, which causes the destination address to be release before the
transmit routine for the interface is called.

The solution (implemented in the attached patch) is to clear that flag for teql
interfaces.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


-- 

^ permalink raw reply

* [PATCH net-next-2.6] ipv4: sysctl to block responding on down interface
From: Stephen Hemminger @ 2010-06-11 15:48 UTC (permalink / raw)
  To: Joakim Tjernlund, David Miller; +Cc: netdev
In-Reply-To: <OF1D06F11A.44DE8C5A-ONC125773F.00437856-C125773F.004425B9@transmode.se>

When Linux is used as a router, it is undesirable for the kernel to process
incoming packets when the address assigned to the interface is down.
The initial problem report was for a management application that used ICMP
to check link availability.

The default is disabled to maintain compatibility with previous behavior.
This is not recommended for server systems because it makes fail over more
difficult, and does not account for configurations where multiple interfaces
have the same IP address.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 Documentation/networking/ip-sysctl.txt |   10 ++++++++++
 include/linux/inetdevice.h             |    2 ++
 net/ipv4/devinet.c                     |    1 +
 net/ipv4/route.c                       |    7 +++++++
 4 files changed, 20 insertions(+)

--- a/include/linux/inetdevice.h	2010-05-28 08:35:11.000000000 -0700
+++ b/include/linux/inetdevice.h	2010-06-11 08:35:55.237028136 -0700
@@ -37,6 +37,7 @@ enum
 	IPV4_DEVCONF_ACCEPT_LOCAL,
 	IPV4_DEVCONF_SRC_VMARK,
 	IPV4_DEVCONF_PROXY_ARP_PVLAN,
+	IPV4_DEVCONF_LINKFILTER,
 	__IPV4_DEVCONF_MAX
 };
 
@@ -140,6 +141,7 @@ static inline void ipv4_devconf_setall(s
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_LINKFILTER(in_dev)	IN_DEV_MAXCONF((in_dev), LINKFILTER)
 
 struct in_ifaddr {
 	struct in_ifaddr	*ifa_next;
--- a/net/ipv4/devinet.c	2010-06-01 08:39:12.000000000 -0700
+++ b/net/ipv4/devinet.c	2010-06-11 08:37:03.921248294 -0700
@@ -1416,6 +1416,7 @@ static struct devinet_sysctl_table {
 		DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
 		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
+		DEVINET_SYSCTL_RW_ENTRY(LINKFILTER, "link_filter"),
 
 		DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
 		DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
--- a/net/ipv4/route.c	2010-06-11 08:13:13.000000000 -0700
+++ b/net/ipv4/route.c	2010-06-11 08:14:28.486271886 -0700
@@ -2152,6 +2152,13 @@ static int ip_route_input_slow(struct sk
 		goto brd_input;
 
 	if (res.type == RTN_LOCAL) {
+		int linkf = IN_DEV_LINKFILTER(in_dev);
+
+		if (linkf && !netif_running(res.fi->fib_dev))
+			goto no_route;
+		if (linkf > 1 && !netif_carrier_ok(res.fi->fib_dev))
+			goto no_route;
+
 		err = fib_validate_source(saddr, daddr, tos,
 					     net->loopback_dev->ifindex,
 					     dev, &spec_dst, &itag, skb->mark);
--- a/Documentation/networking/ip-sysctl.txt	2010-06-11 08:14:46.889751310 -0700
+++ b/Documentation/networking/ip-sysctl.txt	2010-06-11 08:15:35.508471622 -0700
@@ -832,6 +832,16 @@ rp_filter - INTEGER
 	Default value is 0. Note that some distributions enable it
 	in startup scripts.
 
+link_filter - INTEGER
+        0 - Allow packets to be received for the address on this interface
+	even if interface is disabled or no carrier.
+
+	1 - Ignore packets received if interface associated with the incoming
+	address is down.
+
+	2 - Ignore packets received if interface associated with the incoming
+	address is down or has no carrier.
+
 arp_filter - BOOLEAN
 	1 - Allows you to have multiple network interfaces on the same
 	subnet, and have the ARPs for each interface be answered

^ permalink raw reply

* [PATCH 3/4] pcmcia: dev_node removal bugfix
From: Dominik Brodowski @ 2010-06-11 14:44 UTC (permalink / raw)
  To: linux-pcmcia; +Cc: Dominik Brodowski, netdev, linux-wireless
In-Reply-To: <20100611144359.GA9572@comet.dominikbrodowski.net>

Patch c7c2fa07 removed one line too much from smc91c92_cs.c.

Reported-by: Komuro <komurojun-mbn@nifty.com>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
 drivers/net/pcmcia/smc91c92_cs.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/pcmcia/smc91c92_cs.c b/drivers/net/pcmcia/smc91c92_cs.c
index 7b6fe89..64e6a84 100644
--- a/drivers/net/pcmcia/smc91c92_cs.c
+++ b/drivers/net/pcmcia/smc91c92_cs.c
@@ -322,6 +322,7 @@ static int smc91c92_probe(struct pcmcia_device *link)
 	return -ENOMEM;
     smc = netdev_priv(dev);
     smc->p_dev = link;
+    link->priv = dev;
 
     spin_lock_init(&smc->lock);
     link->io.NumPorts1 = 16;
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH v4] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-11 14:01 UTC (permalink / raw)
  To: netfilter; +Cc: netdev, Jan Engelhardt, Patrick McHardy, Timo Teras

This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.

Timers are identified by labels and are created when a rule is set with a new
label.  The rules also take a timeout value (in seconds) as an option.  If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.

One entry for each timer is created in sysfs.  This attribute contains the
timer remaining for the timer to expire.  The attributes are located under
the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification to the
userspace, which can then decide what to do (eg. disconnect to save power).

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
v2: Fixed according to Jan's comments
v3: Changed to a device class in the virtual bus in sysfs
    Removed unnecessary attribute group
    Fixed missing deallocation in some error cases
v4: Fixed according to Jan's and Patrick's comments to v3
    Changed to mutex locking instead of spin locks
    Save the timer in the target info struct to avoid extra reads
    Other small clean-ups

 include/linux/netfilter/xt_IDLETIMER.h |   45 +++++
 net/netfilter/Kconfig                  |   12 ++
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_IDLETIMER.c           |  322 ++++++++++++++++++++++++++++++++
 4 files changed, 380 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_IDLETIMER.h
 create mode 100644 net/netfilter/xt_IDLETIMER.c

diff --git a/include/linux/netfilter/xt_IDLETIMER.h b/include/linux/netfilter/xt_IDLETIMER.h
new file mode 100644
index 0000000..9e95b98
--- /dev/null
+++ b/include/linux/netfilter/xt_IDLETIMER.h
@@ -0,0 +1,45 @@
+/*
+ * linux/include/linux/netfilter/xt_IDLETIMER.h
+ *
+ * Header file for Xtables timer target module.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and forward-ported to 2.6.34
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef _XT_IDLETIMER_H
+#define _XT_IDLETIMER_H
+
+#include <linux/types.h>
+
+#define MAX_IDLETIMER_LABEL_SIZE 32
+
+struct idletimer_tg_info {
+	__u32 timeout;
+
+	char label[MAX_IDLETIMER_LABEL_SIZE];
+
+	/* for kernel module internal use only */
+	struct idletimer_tg *timer __attribute((aligned(8)));
+};
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8593a77..413ed24 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -424,6 +424,18 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_IDLETIMER
+	tristate  "IDLETIMER target support"
+	depends on NETFILTER_ADVANCED
+	help
+
+	  This option adds the `IDLETIMER' target.  Each matching packet
+	  resets the timer associated with label specified when the rule is
+	  added.  When the timer expires, it triggers a sysfs notification.
+	  The remaining time for expiration can be read via sysfs.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_LED
 	tristate '"LED" target support'
 	depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 14e3a8f..e28420a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
new file mode 100644
index 0000000..540020a
--- /dev/null
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -0,0 +1,322 @@
+/*
+ * linux/net/netfilter/xt_IDLETIMER.c
+ *
+ * Netfilter module to trigger a timer when packet matches.
+ * After timer expires a kevent will be sent.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and reworked for upstream inclusion
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_IDLETIMER.h>
+#include <linux/kobject.h>
+#include <linux/workqueue.h>
+#include <linux/sysfs.h>
+
+struct idletimer_tg_attr {
+	struct attribute attr;
+	ssize_t	(*show)(struct kobject *kobj,
+			struct attribute *attr, char *buf);
+};
+
+struct idletimer_tg {
+	struct list_head entry;
+	struct timer_list timer;
+	struct work_struct work;
+
+	struct kobject *kobj;
+	struct idletimer_tg_attr attr;
+
+	unsigned int refcnt;
+};
+
+static LIST_HEAD(idletimer_tg_list);
+static DEFINE_MUTEX(list_mutex);
+
+static struct kobject *idletimer_tg_kobj;
+
+static
+struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
+{
+	struct idletimer_tg *entry;
+
+	BUG_ON(!label);
+
+	list_for_each_entry(entry, &idletimer_tg_list, entry) {
+		if (!strcmp(label, entry->attr.attr.name))
+			return entry;
+	}
+
+	return NULL;
+}
+
+static ssize_t idletimer_tg_show(struct kobject *kobj, struct attribute *attr,
+				 char *buf)
+{
+	struct idletimer_tg *timer;
+	unsigned long expires = 0;
+
+	mutex_lock(&list_mutex);
+
+	timer =	__idletimer_tg_find_by_label(attr->name);
+	if (timer)
+		expires = timer->timer.expires;
+
+	mutex_unlock(&list_mutex);
+
+	if (time_after(expires, jiffies))
+		return sprintf(buf, "%u\n",
+			       jiffies_to_msecs(expires - jiffies) / 1000);
+
+	return sprintf(buf, "0\n");
+}
+
+static void idletimer_tg_work(struct work_struct *work)
+{
+	struct idletimer_tg *timer = container_of(work, struct idletimer_tg,
+						  work);
+
+	sysfs_notify(idletimer_tg_kobj, NULL, timer->attr.attr.name);
+}
+
+static void idletimer_tg_expired(unsigned long data)
+{
+	struct idletimer_tg *timer = (struct idletimer_tg *) data;
+
+	pr_debug("timer %s expired\n", timer->attr.attr.name);
+
+	schedule_work(&timer->work);
+}
+
+static int idletimer_tg_create(struct idletimer_tg_info *info)
+{
+	int ret;
+
+	info->timer = kmalloc(sizeof(*info->timer), GFP_ATOMIC);
+	if (!info->timer) {
+		pr_debug("couldn't alloc timer\n");
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	info->timer->attr.attr.name = kstrdup(info->label, GFP_ATOMIC);
+	if (!info->timer->attr.attr.name) {
+		pr_debug("couldn't alloc attribute name\n");
+		ret = -ENOMEM;
+		goto out_free_timer;
+	}
+	info->timer->attr.attr.mode = S_IRUGO;
+	info->timer->attr.show = idletimer_tg_show;
+
+	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
+	if (ret < 0) {
+		pr_debug("couldn't add file to sysfs");
+		goto out_free_attr;
+	}
+
+	list_add(&info->timer->entry, &idletimer_tg_list);
+
+	setup_timer(&info->timer->timer, idletimer_tg_expired,
+		    (unsigned long) info->timer);
+	info->timer->refcnt = 1;
+
+	mod_timer(&info->timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+	INIT_WORK(&info->timer->work, idletimer_tg_work);
+
+	return 0;
+
+out_free_attr:
+	kfree(info->timer->attr.attr.name);
+out_free_timer:
+	kfree(info->timer);
+out:
+	return ret;
+}
+
+/*
+ * The actual xt_tables plugin.
+ */
+static unsigned int idletimer_tg_target(struct sk_buff *skb,
+					 const struct xt_action_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+
+	pr_debug("resetting timer %s, timeout period %u\n",
+		 info->label, info->timeout);
+
+	mutex_lock(&list_mutex);
+
+	BUG_ON(!info->timer);
+
+	mod_timer(&info->timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+	mutex_unlock(&list_mutex);
+
+	return XT_CONTINUE;
+}
+
+static int idletimer_tg_checkentry(struct xt_tgchk_param *par)
+{
+	struct idletimer_tg_info *info = par->targinfo;
+	int ret;
+
+	pr_debug("checkentry targinfo%s\n", info->label);
+
+	if (info->timeout == 0) {
+		pr_debug("timeout value is zero\n");
+		return -EINVAL;
+	}
+
+	if (info->label[0] == '\0' ||
+	    strnlen(info->label,
+		    MAX_IDLETIMER_LABEL_SIZE) == MAX_IDLETIMER_LABEL_SIZE) {
+		pr_debug("label is empty or not nul-terminated\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&list_mutex);
+
+	info->timer = __idletimer_tg_find_by_label(info->label);
+	if (info->timer) {
+		info->timer->refcnt++;
+		mod_timer(&info->timer->timer,
+			  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+		pr_debug("increased refcnt of timer %s to %u\n",
+			 info->label, info->timer->refcnt);
+	} else {
+		ret = idletimer_tg_create(info);
+		if (ret < 0) {
+			pr_debug("failed to create timer\n");
+			mutex_unlock(&list_mutex);
+			return ret;
+		}
+	}
+
+	mutex_unlock(&list_mutex);
+	return 0;
+}
+
+static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+
+	pr_debug("destroy targinfo %s\n", info->label);
+
+	mutex_lock(&list_mutex);
+	if (!info->timer) {
+		mutex_unlock(&list_mutex);
+		return;
+	}
+
+	if (--info->timer->refcnt == 0) {
+		pr_debug("deleting timer %s\n", info->label);
+
+		list_del(&info->timer->entry);
+		del_timer_sync(&info->timer->timer);
+		sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr);
+		kfree(info->timer->attr.attr.name);
+		kfree(info->timer);
+	} else {
+		pr_debug("decreased refcnt of timer %s to %u\n",
+			 info->label, info->timer->refcnt);
+	}
+
+	mutex_unlock(&list_mutex);
+}
+
+static struct xt_target idletimer_tg __read_mostly = {
+	.name		= "IDLETIMER",
+	.family		= NFPROTO_UNSPEC,
+	.target		= idletimer_tg_target,
+	.targetsize     = sizeof(struct idletimer_tg_info),
+	.checkentry	= idletimer_tg_checkentry,
+	.destroy        = idletimer_tg_destroy,
+	.me		= THIS_MODULE,
+};
+
+static struct class *idletimer_tg_class;
+
+static struct device *idletimer_tg_device;
+
+static int __init idletimer_tg_init(void)
+{
+	int err;
+
+	idletimer_tg_class = class_create(THIS_MODULE, "xt_idletimer");
+	err = PTR_ERR(idletimer_tg_class);
+	if (IS_ERR(idletimer_tg_class)) {
+		pr_debug("couldn't register device class\n");
+		goto out;
+	}
+
+	idletimer_tg_device = device_create(idletimer_tg_class, NULL,
+					    MKDEV(0, 0), NULL, "timers");
+	err = PTR_ERR(idletimer_tg_device);
+	if (IS_ERR(idletimer_tg_device)) {
+		pr_debug("couldn't register system device\n");
+		goto out_class;
+	}
+
+	idletimer_tg_kobj = &idletimer_tg_device->kobj;
+
+	err =  xt_register_target(&idletimer_tg);
+	if (err < 0) {
+		pr_debug("couldn't register xt target\n");
+		goto out_dev;
+	}
+
+	return 0;
+out_dev:
+	device_destroy(idletimer_tg_class, MKDEV(0, 0));
+out_class:
+	class_destroy(idletimer_tg_class);
+out:
+	return err;
+}
+
+static void __exit idletimer_tg_exit(void)
+{
+	xt_unregister_target(&idletimer_tg);
+
+	device_destroy(idletimer_tg_class, MKDEV(0, 0));
+	class_destroy(idletimer_tg_class);
+}
+
+module_init(idletimer_tg_init);
+module_exit(idletimer_tg_exit);
+
+MODULE_AUTHOR("Timo Teras <ext-timo.teras@nokia.com>");
+MODULE_AUTHOR("Luciano Coelho <luciano.coelho@nokia.com>");
+MODULE_DESCRIPTION("Xtables: idle time monitor");
+MODULE_LICENSE("GPL v2");
-- 
1.6.3.3


^ permalink raw reply related

* Re: no reassembly for outgoing packets on RAW socket
From: Jiri Olsa @ 2010-06-11 13:10 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Patrick McHardy, netdev, Netfilter Developer Mailing List
In-Reply-To: <alpine.LSU.2.01.1006111152230.30433@obet.zrqbmnf.qr>

On Fri, Jun 11, 2010 at 11:53:32AM +0200, Jan Engelhardt wrote:
> 
> On Friday 2010-06-11 10:16, Jiri Olsa wrote:
> >
> >I prepared the patch implementing IP_NODEFRAG option for IPv4 socket.
> >
> >Also I just got an idea, that there could be no reassembly if there are
> >no rules for connection tracing set.. not sure how can I check that best
> >so far.. any idea?
> >
> >@@ -572,6 +572,14 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> > 		}
> > 		inet->hdrincl = val ? 1 : 0;
> > 		break;
> >+	case IP_NODEFRAG:
> >+		if (sk->sk_type != SOCK_RAW) {
> >+			err = -ENOPROTOOPT;
> >+			break;
> >+		}
> >+		inet->nodefrag = val ? 1 : 0;
> >+		printk("IP_NODEFRAG %p -> %d\n", inet, inet->nodefrag);
> >+		break;
> 
> You want to get rid of this printk otherwise it spews the logs.

oops, I forgot to remove this one... thanks

new patch is attached

wbr,
jirka


---
diff --git a/include/linux/in.h b/include/linux/in.h
index 583c76f..41d88a4 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -85,6 +85,7 @@ struct in_addr {
 #define IP_RECVORIGDSTADDR   IP_ORIGDSTADDR
 
 #define IP_MINTTL       21
+#define IP_NODEFRAG     22
 
 /* IP_MTU_DISCOVER values */
 #define IP_PMTUDISC_DONT		0	/* Never send DF frames */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 1653de5..1989cfd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -137,7 +137,8 @@ struct inet_sock {
 				hdrincl:1,
 				mc_loop:1,
 				transparent:1,
-				mc_all:1;
+				mc_all:1,
+				nodefrag:1;
 	int			mc_index;
 	__be32			mc_addr;
 	struct ip_mc_socklist	*mc_list;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 551ce56..84d2c8e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -355,6 +355,8 @@ lookup_protocol:
 	inet = inet_sk(sk);
 	inet->is_icsk = (INET_PROTOSW_ICSK & answer_flags) != 0;
 
+	inet->nodefrag = 0;
+
 	if (SOCK_RAW == sock->type) {
 		inet->inet_num = protocol;
 		if (IPPROTO_RAW == protocol)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ce23178..d8196e1 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -449,7 +449,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 			     (1<<IP_MTU_DISCOVER) | (1<<IP_RECVERR) |
 			     (1<<IP_ROUTER_ALERT) | (1<<IP_FREEBIND) |
 			     (1<<IP_PASSSEC) | (1<<IP_TRANSPARENT) |
-			     (1<<IP_MINTTL))) ||
+			     (1<<IP_MINTTL) | (1<<IP_NODEFRAG))) ||
 	    optname == IP_MULTICAST_TTL ||
 	    optname == IP_MULTICAST_ALL ||
 	    optname == IP_MULTICAST_LOOP ||
@@ -572,6 +572,13 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		}
 		inet->hdrincl = val ? 1 : 0;
 		break;
+	case IP_NODEFRAG:
+		if (sk->sk_type != SOCK_RAW) {
+			err = -ENOPROTOOPT;
+			break;
+		}
+		inet->nodefrag = val ? 1 : 0;
+		break;
 	case IP_MTU_DISCOVER:
 		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_PROBE)
 			goto e_inval;
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index cb763ae..eab8de3 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -66,6 +66,11 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum,
 					  const struct net_device *out,
 					  int (*okfn)(struct sk_buff *))
 {
+	struct inet_sock *inet = inet_sk(skb->sk);
+
+	if (inet && inet->nodefrag)
+		return NF_ACCEPT;
+
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
 	/* Previously seen (loopback)?  Ignore.  Do this before

^ permalink raw reply related

* [PATCH 7/7] ethoc: use devres resource management
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

The point of using the devres resource management routines is that they
simplify the driver by taking care of releasing resources on failure and
release.  A recent commit added a bunch of error handling that is unnecessary
in this context.

This patch removes this redundant error handling, as well as using
dmam_alloc_coherent in place of dma_alloc_coherent in order to use this
framework consistenly throughout the driver.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   28 +---------------------------
 1 files changed, 1 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 1681f08..37ce8ac 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -964,7 +964,7 @@ static int ethoc_probe(struct platform_device *pdev)
 		}
 	} else {
 		/* Allocate buffer memory */
-		priv->membase = dma_alloc_coherent(NULL,
+		priv->membase = dmam_alloc_coherent(&pdev->dev,
 			buffer_size, (void *)&netdev->mem_start,
 			GFP_KERNEL);
 		if (!priv->membase) {
@@ -1074,21 +1074,6 @@ free_mdio:
 	kfree(priv->mdio->irq);
 	mdiobus_free(priv->mdio);
 free:
-	if (priv) {
-		if (priv->dma_alloc)
-			dma_free_coherent(NULL, priv->dma_alloc, priv->membase,
-					  netdev->mem_start);
-		else if (priv->membase)
-			devm_iounmap(&pdev->dev, priv->membase);
-		if (priv->iobase)
-			devm_iounmap(&pdev->dev, priv->iobase);
-	}
-	if (mem)
-		devm_release_mem_region(&pdev->dev, mem->start,
-					mem->end - mem->start + 1);
-	if (mmio)
-		devm_release_mem_region(&pdev->dev, mmio->start,
-					mmio->end - mmio->start + 1);
 	free_netdev(netdev);
 out:
 	return ret;
@@ -1115,17 +1100,6 @@ static int ethoc_remove(struct platform_device *pdev)
 			kfree(priv->mdio->irq);
 			mdiobus_free(priv->mdio);
 		}
-		if (priv->dma_alloc)
-			dma_free_coherent(NULL, priv->dma_alloc, priv->membase,
-				netdev->mem_start);
-		else {
-			devm_iounmap(&pdev->dev, priv->membase);
-			devm_release_mem_region(&pdev->dev, netdev->mem_start,
-				netdev->mem_end - netdev->mem_start + 1);
-		}
-		devm_iounmap(&pdev->dev, priv->iobase);
-		devm_release_mem_region(&pdev->dev, netdev->base_addr,
-			priv->io_region_size);
 		unregister_netdev(netdev);
 		free_netdev(netdev);
 	}
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 6/7] ethoc: Clear command buffer after write
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

This matches what ethoc_mdio_read does and makes the functions
symmetric.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index e5c2f5b..1681f08 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -613,8 +613,11 @@ static int ethoc_mdio_write(struct mii_bus *bus, int phy, int reg, u16 val)
 
 	while (time_before(jiffies, timeout)) {
 		u32 stat = ethoc_read(priv, MIISTATUS);
-		if (!(stat & MIISTATUS_BUSY))
+		if (!(stat & MIISTATUS_BUSY)) {
+			/* reset MII command register */
+			ethoc_write(priv, MIICOMMAND, 0);
 			return 0;
+		}
 
 		schedule();
 	}
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 2/7] ethoc: Write bus addresses to registers
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

The ethoc driver should be writing bus addresses to the ethoc registers, not
virtual addresses.  This patch adds an array to store the virtual addresses
in and references that array when manipulating the contents of the buffer
descriptors.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 68093cf..5904ad2 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -180,6 +180,7 @@ MODULE_PARM_DESC(buffer_size, "DMA buffer allocation size");
  * @dty_tx:	last buffer actually sent
  * @num_rx:	number of receive buffers
  * @cur_rx:	current receive buffer
+ * @vma:        pointer to array of virtual memory addresses for buffers
  * @netdev:	pointer to network device structure
  * @napi:	NAPI structure
  * @stats:	network device statistics
@@ -203,6 +204,8 @@ struct ethoc {
 	unsigned int num_rx;
 	unsigned int cur_rx;
 
+	void** vma;
+
 	struct net_device *netdev;
 	struct napi_struct napi;
 	struct net_device_stats stats;
@@ -285,18 +288,20 @@ static inline void ethoc_disable_rx_and_tx(struct ethoc *dev)
 	ethoc_write(dev, MODER, mode);
 }
 
-static int ethoc_init_ring(struct ethoc *dev)
+static int ethoc_init_ring(struct ethoc *dev, void* mem_start)
 {
 	struct ethoc_bd bd;
 	int i;
+	void* vma;
 
 	dev->cur_tx = 0;
 	dev->dty_tx = 0;
 	dev->cur_rx = 0;
 
 	/* setup transmission buffers */
-	bd.addr = virt_to_phys(dev->membase);
+	bd.addr = mem_start;
 	bd.stat = TX_BD_IRQ | TX_BD_CRC;
+	vma = dev->membase;
 
 	for (i = 0; i < dev->num_tx; i++) {
 		if (i == dev->num_tx - 1)
@@ -304,6 +309,9 @@ static int ethoc_init_ring(struct ethoc *dev)
 
 		ethoc_write_bd(dev, i, &bd);
 		bd.addr += ETHOC_BUFSIZ;
+
+		dev->vma[i] = vma;
+		vma += ETHOC_BUFSIZ;
 	}
 
 	bd.stat = RX_BD_EMPTY | RX_BD_IRQ;
@@ -314,6 +322,9 @@ static int ethoc_init_ring(struct ethoc *dev)
 
 		ethoc_write_bd(dev, dev->num_tx + i, &bd);
 		bd.addr += ETHOC_BUFSIZ;
+
+		dev->vma[dev->num_tx + i] = vma;
+		vma += ETHOC_BUFSIZ;
 	}
 
 	return 0;
@@ -415,7 +426,7 @@ static int ethoc_rx(struct net_device *dev, int limit)
 			skb = netdev_alloc_skb_ip_align(dev, size);
 
 			if (likely(skb)) {
-				void *src = phys_to_virt(bd.addr);
+				void *src = priv->vma[entry];
 				memcpy_fromio(skb_put(skb, size), src, size);
 				skb->protocol = eth_type_trans(skb, dev);
 				priv->stats.rx_packets++;
@@ -667,7 +678,7 @@ static int ethoc_open(struct net_device *dev)
 
 	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
 
-	ethoc_init_ring(priv);
+	ethoc_init_ring(priv, (void*)dev->mem_start);
 	ethoc_reset(priv);
 
 	if (netif_queue_stopped(dev)) {
@@ -831,7 +842,7 @@ static netdev_tx_t ethoc_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	else
 		bd.stat &= ~TX_BD_PAD;
 
-	dest = phys_to_virt(bd.addr);
+	dest = priv->vma[entry];
 	memcpy_toio(dest, skb->data, skb->len);
 
 	bd.stat &= ~(TX_BD_STATS | TX_BD_LEN_MASK);
@@ -978,6 +989,12 @@ static int ethoc_probe(struct platform_device *pdev)
 	priv->num_tx = max(2, num_bd / 4);
 	priv->num_rx = num_bd - priv->num_tx;
 
+	priv->vma = devm_kzalloc(&pdev->dev, num_bd*sizeof(void*), GFP_KERNEL);
+	if (!priv->vma) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
 	/* Allow the platform setup code to pass in a MAC address. */
 	if (pdev->dev.platform_data) {
 		struct ethoc_platform_data *pdata =
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 3/7] ethoc: write number of TX buffers in init_ring
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

This moves the write of the TX_BD_NUM to init_ring together with the
rest of the code setting up the transmission buffers.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 5904ad2..afeb993 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -298,6 +298,8 @@ static int ethoc_init_ring(struct ethoc *dev, void* mem_start)
 	dev->dty_tx = 0;
 	dev->cur_rx = 0;
 
+	ethoc_write(dev, TX_BD_NUM, dev->num_tx);
+
 	/* setup transmission buffers */
 	bd.addr = mem_start;
 	bd.stat = TX_BD_IRQ | TX_BD_CRC;
@@ -676,8 +678,6 @@ static int ethoc_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
-
 	ethoc_init_ring(priv, (void*)dev->mem_start);
 	ethoc_reset(priv);
 
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 1/7] ethoc: calculate number of buffers in ethoc_probe
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn

This moves the calculation of the number of transmission buffers to
ethoc_probe where it more logically fits with the rest of the memory
allocation code.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 6ed2df1..68093cf 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -658,8 +658,6 @@ static int ethoc_mdio_probe(struct net_device *dev)
 static int ethoc_open(struct net_device *dev)
 {
 	struct ethoc *priv = netdev_priv(dev);
-	unsigned int min_tx = 2;
-	unsigned int num_bd;
 	int ret;
 
 	ret = request_irq(dev->irq, ethoc_interrupt, IRQF_SHARED,
@@ -667,11 +665,6 @@ static int ethoc_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	/* calculate the number of TX/RX buffers, maximum 128 supported */
-	num_bd = min_t(unsigned int,
-		128, (dev->mem_end - dev->mem_start + 1) / ETHOC_BUFSIZ);
-	priv->num_tx = max(min_tx, num_bd / 4);
-	priv->num_rx = num_bd - priv->num_tx;
 	ethoc_write(priv, TX_BD_NUM, priv->num_tx);
 
 	ethoc_init_ring(priv);
@@ -884,6 +877,7 @@ static int ethoc_probe(struct platform_device *pdev)
 	struct resource *mem = NULL;
 	struct ethoc *priv = NULL;
 	unsigned int phy;
+	int num_bd;
 	int ret = 0;
 
 	/* allocate networking device */
@@ -978,6 +972,12 @@ static int ethoc_probe(struct platform_device *pdev)
 		priv->dma_alloc = buffer_size;
 	}
 
+	/* calculate the number of TX/RX buffers, maximum 128 supported */
+	num_bd = min_t(unsigned int,
+		128, (netdev->mem_end - netdev->mem_start + 1) / ETHOC_BUFSIZ);
+	priv->num_tx = max(2, num_bd / 4);
+	priv->num_rx = num_bd - priv->num_tx;
+
 	/* Allow the platform setup code to pass in a MAC address. */
 	if (pdev->dev.platform_data) {
 		struct ethoc_platform_data *pdata =
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 4/7] ethoc: Clean up PHY probing
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>

- No need to iterate over all possible addresses on bus
- Use helper function phy_find_first
- Use phy_connect_direct as we already have the relevant structure

Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |   24 ++++++++----------------
 1 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index afeb993..1ee9947 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -635,21 +635,13 @@ static int ethoc_mdio_probe(struct net_device *dev)
 {
 	struct ethoc *priv = netdev_priv(dev);
 	struct phy_device *phy;
+	int err;
 	int i;
 
-	for (i = 0; i < PHY_MAX_ADDR; i++) {
-		phy = priv->mdio->phy_map[i];
-		if (phy) {
-			if (priv->phy_id != -1) {
-				/* attach to specified PHY */
-				if (priv->phy_id == phy->addr)
-					break;
-			} else {
-				/* autoselect PHY if none was specified */
-				if (phy->addr != 0)
-					break;
-			}
-		}
+	if (priv->phy_id != -1) {
+		phy = priv->mdio->phy_map[priv->phy_id];
+	} else {
+		phy = phy_find_first(priv->mdio);
 	}
 
 	if (!phy) {
@@ -657,11 +649,11 @@ static int ethoc_mdio_probe(struct net_device *dev)
 		return -ENXIO;
 	}
 
-	phy = phy_connect(dev, dev_name(&phy->dev), ethoc_mdio_poll, 0,
+	err = phy_connect_direct(dev, phy, ethoc_mdio_poll, 0,
 			PHY_INTERFACE_MODE_GMII);
-	if (IS_ERR(phy)) {
+	if (err) {
 		dev_err(&dev->dev, "could not attach to PHY\n");
-		return PTR_ERR(phy);
+		return err;
 	}
 
 	priv->phy = phy;
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 5/7] Remove unused variable
From: Jonas Bonn @ 2010-06-11 12:47 UTC (permalink / raw)
  To: netdev; +Cc: Jonas Bonn
In-Reply-To: <1276260460-14531-1-git-send-email-jonas@southpole.se>


Signed-off-by: Jonas Bonn <jonas@southpole.se>
---
 drivers/net/ethoc.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethoc.c b/drivers/net/ethoc.c
index 1ee9947..e5c2f5b 100644
--- a/drivers/net/ethoc.c
+++ b/drivers/net/ethoc.c
@@ -636,7 +636,6 @@ static int ethoc_mdio_probe(struct net_device *dev)
 	struct ethoc *priv = netdev_priv(dev);
 	struct phy_device *phy;
 	int err;
-	int i;
 
 	if (priv->phy_id != -1) {
 		phy = priv->mdio->phy_map[priv->phy_id];
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] gianfar: Fix setup of RX time stamping
From: Manfred Rudigier @ 2010-06-11 11:49 UTC (permalink / raw)
  To: 'David Miller'
  Cc: 'Anton Vorontsov', Richard Cochran,
	'netdev@vger.kernel.org',
	'linuxppc-dev@ozlabs.org'

Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
properly on some boards (Anton had problems with the MPC8313ERDB and
MPC8568EMDS).

With this patch the bit will only be set if requested from user space
with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
disabled during normal operation. Users who are not interested in time
stamps will not experience problems with buggy CPU revisions or
performance drops any more.

The setting of TMR_CTRL[TE] is still up to the user. This is considered
safe because users wanting HW timestamps must initialize the eTSEC clock
first anyway, e.g. with the recently submitted PTP clock driver.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
---
 drivers/net/gianfar.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 46c69cd..227b628 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -381,10 +381,14 @@ static void gfar_init_mac(struct net_device *ndev)
 	/* Insert receive time stamps into padding alignment bytes */
 	if (priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER) {
 		rctrl &= ~RCTRL_PAL_MASK;
-		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE | RCTRL_PADDING(8);
+		rctrl |= RCTRL_PADDING(8);
 		priv->padding = 8;
 	}
 
+	/* Enable HW time stamping if requested from user space */
+	if (priv->hwts_rx_en)
+		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
+
 	/* keep vlan related bits if it's enabled */
 	if (priv->vlgrp) {
 		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
@@ -747,7 +751,8 @@ static int gfar_of_init(struct of_device *ofdev, struct net_device **pdev)
 			FSL_GIANFAR_DEV_HAS_CSUM |
 			FSL_GIANFAR_DEV_HAS_VLAN |
 			FSL_GIANFAR_DEV_HAS_MAGIC_PACKET |
-			FSL_GIANFAR_DEV_HAS_EXTENDED_HASH;
+			FSL_GIANFAR_DEV_HAS_EXTENDED_HASH |
+			FSL_GIANFAR_DEV_HAS_TIMER;
 
 	ctype = of_get_property(np, "phy-connection-type", NULL);
 
@@ -805,12 +810,20 @@ static int gfar_hwtstamp_ioctl(struct net_device *netdev,
 
 	switch (config.rx_filter) {
 	case HWTSTAMP_FILTER_NONE:
-		priv->hwts_rx_en = 0;
+		if (priv->hwts_rx_en) {
+			stop_gfar(netdev);
+			priv->hwts_rx_en = 0;
+			startup_gfar(netdev);
+		}
 		break;
 	default:
 		if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER))
 			return -ERANGE;
-		priv->hwts_rx_en = 1;
+		if (!priv->hwts_rx_en) {
+			stop_gfar(netdev);
+			priv->hwts_rx_en = 1;
+			startup_gfar(netdev);
+		}
 		config.rx_filter = HWTSTAMP_FILTER_ALL;
 		break;
 	}
-- 
1.6.3.3

^ permalink raw reply related

* Weak host model vs .interface down
From: Joakim Tjernlund @ 2010-06-11 12:24 UTC (permalink / raw)
  To: netdev


Linux uses the weak host model which makes the IP addresses part of the system
rather than the interface. However consider this:

System A, eth0 connected to the network
# > ifconfig eth0 192.168.1.16
# > ifconfig eth1 192.168.1.17 down

System B
# > ping 192.168.1.17
PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.618 ms

Isn't it a bit much to respond on 192.168.1.17 when its interface is down?
I even tried to set rp_filter=1 for all interfaces and that didn't help
either(not that I should need to)

   Jocke


^ permalink raw reply

* Re: [PATCH] gianfar: Fix setup of RX time stamping
From: Anton Vorontsov @ 2010-06-11 12:20 UTC (permalink / raw)
  To: Manfred Rudigier
  Cc: 'David Miller', Richard Cochran,
	'netdev@vger.kernel.org',
	'linuxppc-dev@ozlabs.org'
In-Reply-To: <95DC1AA8EC908B48939B72CF375AA5E3F03D8C46@alice.at.omicron.at>

On Fri, Jun 11, 2010 at 01:49:05PM +0200, Manfred Rudigier wrote:
> Previously the RCTRL_TS_ENABLE bit was set unconditionally. However, if
> the RCTRL_TS_ENABLE is set without TMR_CTRL[TE], the driver does not work
> properly on some boards (Anton had problems with the MPC8313ERDB and
> MPC8568EMDS).
> 
> With this patch the bit will only be set if requested from user space
> with the SIOCSHWTSTAMP ioctl command, meaning that time stamping is
> disabled during normal operation. Users who are not interested in time
> stamps will not experience problems with buggy CPU revisions or
> performance drops any more.
> 
> The setting of TMR_CTRL[TE] is still up to the user. This is considered
> safe because users wanting HW timestamps must initialize the eTSEC clock
> first anyway, e.g. with the recently submitted PTP clock driver.
> 
> Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
> ---

Looks OK. I tested that it doesn't break anything, but I didn't
test the timestamping functionality. So

Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply

* Re: Packet capture and Bonding asymmetries
From: Paul LeoNerd Evans @ 2010-06-11 12:18 UTC (permalink / raw)
  To: Jay Vosburgh, netdev
In-Reply-To: <17501.1276123951@death.nxdomain.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2165 bytes --]

On Wed, Jun 09, 2010 at 03:52:31PM -0700, Jay Vosburgh wrote:
> 	For your own private testing, you could add a call to
> __netif_nit_deliver in netif_receive_skb prior to this part:
> 
>         master = ACCESS_ONCE(orig_dev->master);
>         if (master) {
>                 if (skb_bond_should_drop(skb, master))
>                         null_or_orig = orig_dev; /* deliver only exact match */
>                 else
>                         skb->dev = master;
>         }
> 
> 	This will give you multiple captures of the same packet, as is
> seen for transmit (i.e., one on the slave, one on the bond).  For
> non-bonding devices, tcpdump will see each packet twice on the same
> device, so it's not really suitable for general use.

As per my last post, I've just tested the following patch and found it
to work just fine:

# pktdump -f "icmp" -n
[13:04:30] RX(eth0): ICMP| 192.168.56.1->192.168.56.6 echo-request seq=1
[13:04:30] RX(bond0): ICMP| 192.168.56.1->192.168.56.6 echo-request seq=1
[13:04:30] TX(bond0): ICMP| 192.168.56.6->192.168.56.1 echo-reply seq=1
[13:04:30] TX(eth0): ICMP| 192.168.56.6->192.168.56.1 echo-reply seq=1

I'll resubmit the patch properly for latest kernel version; this being
2.6.31.12 doesn't apply cleanly to upstream:

-----

--- linux-2.6.31.12-router/net/core/dev.c       2010-01-18 18:30:45.000000000 +0000
+++ linux-2.6.31.12-router_leobonding/net/core/dev.c    2010-06-11 12:39:43.000000000 +0100
@@ -2265,6 +2265,7 @@
        null_or_orig = NULL;
        orig_dev = skb->dev;
        if (orig_dev->master) {
+               netif_nit_deliver(skb);
                if (skb_bond_should_drop(skb))
                        null_or_orig = orig_dev; /* deliver only exact match */
                else

-----

This patch quite deliberately includes packets arriving from non-active
bonding slaves, because the intention of tcpdump, pktdump, et.al., is to
see "close to the wire"; a view of what's happening down that physical
ethernet cable.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* [PATCH] Phytec PCM027: register CAN resources
From: Marc Kleine-Budde @ 2010-06-11 11:46 UTC (permalink / raw)
  To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Juergen Beisert, Marc Kleine-Budde,
	Wolfgang Grandegger

From: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

applies to current net-next-2.6
(which includes Wolfgang's patche "can: sja1000 platform data fixes")

cheers, Marc


>From 66af2a1778a468610e25403336cc27650fddef2a Mon Sep 17 00:00:00 2001
From: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Date: Fri, 11 Jun 2010 13:27:03 +0200
Subject: [PATCH] Phytec PCM027: register CAN resources

This patch adds resources for the SJA1000 platform device.

Signed-off-by: Juergen Beisert <jbe-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 arch/arm/mach-pxa/include/mach/pcm027.h |    2 +-
 arch/arm/mach-pxa/pcm027.c              |   40 ++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-pxa/include/mach/pcm027.h b/arch/arm/mach-pxa/include/mach/pcm027.h
index 0408326..b65fcc4 100644
--- a/arch/arm/mach-pxa/include/mach/pcm027.h
+++ b/arch/arm/mach-pxa/include/mach/pcm027.h
@@ -49,7 +49,7 @@
 /* CAN controller SJA1000 (unsupported yet) */
 #define PCM027_CAN_IRQ_GPIO	114
 #define PCM027_CAN_IRQ		IRQ_GPIO(PCM027_CAN_IRQ_GPIO)
-#define PCM027_CAN_IRQ_EDGE	IRQ_TYPE_EDGE_FALLING
+#define PCM027_CAN_IRQ_EDGE	IORESOURCE_IRQ_LOWEDGE
 #define PCM027_CAN_PHYS		0x22000000
 #define PCM027_CAN_SIZE		0x100
 
diff --git a/arch/arm/mach-pxa/pcm027.c b/arch/arm/mach-pxa/pcm027.c
index 2190af0..2ef7686 100644
--- a/arch/arm/mach-pxa/pcm027.c
+++ b/arch/arm/mach-pxa/pcm027.c
@@ -26,6 +26,7 @@
 #include <linux/spi/spi.h>
 #include <linux/spi/max7301.h>
 #include <linux/leds.h>
+#include <linux/can/platform/sja1000.h>
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
@@ -204,13 +205,50 @@ static struct platform_device pcm027_led_dev = {
 #endif /* CONFIG_LEDS_GPIO */
 
 /*
+ * SJA1000 CAN controller
+ */
+#if defined(CONFIG_CAN_SJA1000_PLATFORM) || defined(CONFIG_CAN_SJA1000_PLATFORM_MODULE)
+static struct resource pcm027_sja1000_resources[] = {
+	[0] = {
+		.start	= PCM027_CAN_PHYS,
+		.end	= PCM027_CAN_PHYS + PCM027_CAN_SIZE - 1,
+		.flags	= IORESOURCE_MEM,
+	},
+	[1] = {
+		.start	= PCM027_CAN_IRQ,
+		.end	= PCM027_CAN_IRQ,
+		.flags	= IORESOURCE_IRQ | PCM027_CAN_IRQ_EDGE,
+	},
+};
+
+static struct sja1000_platform_data pcm027_sja1000_platform_data = {
+	.osc_freq	= 16000000,
+	.ocr		= OCR_TX1_PULLDOWN | OCR_TX0_PUSHPULL,
+	.cdr		= CDR_CBP,
+};
+
+static struct platform_device pcm027_sja1000_device = {
+	.name		= "sja1000_platform",
+	.dev = {
+		.platform_data = &pcm027_sja1000_platform_data,
+	},
+	.num_resources	= ARRAY_SIZE(pcm027_sja1000_resources),
+	.resource	= pcm027_sja1000_resources,
+};
+#endif /* CONFIG_CAN_SJA1000_PLATFORM || CONFIG_CAN_SJA1000_PLATFORM_MODULE */
+
+
+/*
  * declare the available device resources on this board
  */
 static struct platform_device *devices[] __initdata = {
 	&smc91x_device,
 	&pcm027_flash,
 #ifdef CONFIG_LEDS_GPIO
-	&pcm027_led_dev
+	&pcm027_led_dev,
+#endif
+#if defined(CONFIG_CAN_SJA1000_PLATFORM) || defined(CONFIG_CAN_SJA1000_PLATFORM_MODULE)
+	&pcm027_sja1000_device,
 #endif
 };
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] allow to configure tcp_retries1 and tcp_retries2 per TCP socket
From: Salvador Fandino @ 2010-06-11 10:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev, David S. Miller, linux-kernel, vger.kernel.org
In-Reply-To: <87bpbi4ycc.fsf@basil.nowhere.org>

On 06/10/2010 07:00 PM, Andi Kleen wrote:
> Salvador Fandino<salvador@qindel.com>  writes:
>
>
>    
>> The included patch adds support for setting the tcp_retries1 and
>> tcp_retries2 options in a per socket fashion as it is done for the
>> keepalive options TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL.
>>
>> The issue I am trying to solve is that when a socket has data queued for
>> delivering, the keepalive logic is not triggered. Instead, the
>> tcp_retries1/2 parameters are used to determine how many delivering
>> attempts should be performed before giving up.
>>      
> And why exactly do you need new tunables to solve this?
>    

How else could it be solved?

I can think of making the retransmission logic to also honor the 
keepalive settings, switching to sending packets every keepintvl seconds 
when the elapsed time goes over keepidle and abort after keepcnt.

Or, make retransmits_timed_out() also consider 
(keepidle+keepcnt*keepintvl) as a ceiling.

But frankly, I don't like any one of them. IMO, leaving alone backward 
compatibility issues, it would make more sense to do it the other way 
and change the keepalive logic to follow the same sending pattern used 
for data retransmissions, using keepidle, retries1 and retries2 as its 
parameters.

Well, another option would be to use keepcnt as retries2 when defined in 
tcp_sock. IMO it would make sense, but could be confusing for the user.


>> The patch is very straight forward and just replicates similar
>> functionality. There is one thing I am not completely sure and is if the
>> new per-socket fields should go into inet_connection_sock instead of
>> into tcp_sock.
>>      
> tcp_sock is already quite big (>2k on 64bit)
>
> IMHO any new fields in there need very good justification.
>    

If this is a problem, there are some room for optimization in the 
inet_connection_sock and tcp_sock structures. For instance, 
keepalive_time and keepalive_intvl are limited to MAX_KEEPALIVE_TIME * 
HZ, that is 32767 * 1000 ==> 25 bits, so they would fit in a u32.

retries1 and retries2 fields also fit in u32 and actually, a per socket 
retries1 field is not absolutely required because the check against 
retries2 is always performed, so the impact of this patch on the 
structure size could be limited to 4 bytes.

- Salva



^ permalink raw reply

* Source address selection vs. policy routing?
From: Andrew Lutomirski @ 2010-06-11 10:41 UTC (permalink / raw)
  To: netdev

The docs seem rather vague
(http://linux-ip.net/html/routing-saddr-selection.html) and I'm about
to set up a system with two different internet connections, so I'll
ask here:

If I use policy routing (ip rule) based on the src selector and a
local application calls connect without first calling bind, how is the
source address selected?  (Just reading the docs naively gives a
circular answer: the routing table is chosen based on source address
and then the source address is influenced by the src field on the
chosen route.)

Similarly, how does masquerade choose its source address?

I took a look at the code, but all the fast paths, slow paths, and
flow structures are a bit confusing to the uninitiated.

Thanks,
Andy

^ permalink raw reply

* [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: sonic zhang @ 2010-06-11 10:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, uclinux-dist-devel

>From 4779e43a5a8446f695f8d6f5a006cfb45dc093d8 Mon Sep 17 00:00:00 2001
From: Sonic Zhang <sonic.zhang@analog.com>
Date: Fri, 11 Jun 2010 17:44:31 +0800
Subject: [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer

SKBs hold onto resources that can't be held indefinitely, such as TCP
socket references and netfilter conntrack state.  So if a packet is left
in TX ring for a long time, there might be a TCP socket that cannot be
closed and freed up.

Current blackfin EMAC driver always reclaim and free used tx skbs in future
transfers. The problem is that future transfer may not come as soon as
possible. This patch start a timer after transfer to reclaim and free skb.
There is nearly no performance drop with this patch.

TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
If EMAC TX transfer control is turned on, endless TX interrupts are triggered
no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
TX transfer control can't be turned off in the middle. The only way is to disable
TX interrupt completely.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
---
 drivers/net/bfin_mac.c |  123 +++++++++++++++++++++++++++++++-----------------
 drivers/net/bfin_mac.h |    5 ++
 2 files changed, 85 insertions(+), 43 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 368f333..012613f 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -922,61 +922,73 @@ static void bfin_mac_hwtstamp_init(struct net_device *netdev)
 # define bfin_tx_hwtstamp(dev, skb)
 #endif
 
-static void adjust_tx_list(void)
+static inline void _tx_reclaim_skb(void)
+{
+	do {
+		tx_list_head->desc_a.config &= ~DMAEN;
+		tx_list_head->status.status_word = 0;
+		if (tx_list_head->skb) {
+			dev_kfree_skb(tx_list_head->skb);
+			tx_list_head->skb = NULL;
+		}
+		tx_list_head = tx_list_head->next;
+
+	} while (tx_list_head->status.status_word != 0);
+}
+
+static void tx_reclaim_skb(struct bfin_mac_local *lp)
 {
 	int timeout_cnt = MAX_TIMEOUT_CNT;
 
-	if (tx_list_head->status.status_word != 0 &&
-	    current_tx_ptr != tx_list_head) {
-		goto adjust_head;	/* released something, just return; */
-	}
+	if (tx_list_head->status.status_word != 0)
+		_tx_reclaim_skb();
 
-	/*
-	 * if nothing released, check wait condition
-	 * current's next can not be the head,
-	 * otherwise the dma will not stop as we want
-	 */
-	if (current_tx_ptr->next->next == tx_list_head) {
+	if (current_tx_ptr->next == tx_list_head) {
 		while (tx_list_head->status.status_word == 0) {
+			/* slow down polling to avoid too many queue stop. */
 			udelay(10);
-			if (tx_list_head->status.status_word != 0 ||
-			    !(bfin_read_DMA2_IRQ_STATUS() & DMA_RUN)) {
-				goto adjust_head;
-			}
-			if (timeout_cnt-- < 0) {
-				printk(KERN_ERR DRV_NAME
-				": wait for adjust tx list head timeout\n");
+			/* reclaim skb if DMA is not running. */
+			if (!(bfin_read_DMA2_IRQ_STATUS() & DMA_RUN))
+				break;
+			if (timeout_cnt-- < 0)
 				break;
-			}
-		}
-		if (tx_list_head->status.status_word != 0) {
-			goto adjust_head;
 		}
+
+		if (timeout_cnt >= 0)
+			_tx_reclaim_skb();
+		else
+			netif_stop_queue(lp->ndev);
 	}
 
-	return;
+	if (current_tx_ptr->next != tx_list_head &&
+		netif_queue_stopped(lp->ndev))
+		netif_wake_queue(lp->ndev);
+
+	if (tx_list_head != current_tx_ptr) {
+		/* shorten the timer interval if tx queue is stopped */
+		if (netif_queue_stopped(lp->ndev))
+			lp->tx_reclaim_timer.expires =
+				jiffies + (TX_RECLAIM_JIFFIES >> 4);
+		else
+			lp->tx_reclaim_timer.expires =
+				jiffies + TX_RECLAIM_JIFFIES;
+
+		mod_timer(&lp->tx_reclaim_timer,
+			lp->tx_reclaim_timer.expires);
+	}
 
-adjust_head:
-	do {
-		tx_list_head->desc_a.config &= ~DMAEN;
-		tx_list_head->status.status_word = 0;
-		if (tx_list_head->skb) {
-			dev_kfree_skb(tx_list_head->skb);
-			tx_list_head->skb = NULL;
-		} else {
-			printk(KERN_ERR DRV_NAME
-			       ": no sk_buff in a transmitted frame!\n");
-		}
-		tx_list_head = tx_list_head->next;
-	} while (tx_list_head->status.status_word != 0 &&
-		 current_tx_ptr != tx_list_head);
 	return;
+}
 
+static void tx_reclaim_skb_timeout(unsigned long lp)
+{
+	tx_reclaim_skb((struct bfin_mac_local *)lp);
 }
 
 static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
 	u16 *data;
 	u32 data_align = (unsigned long)(skb->data) & 0x3;
 	union skb_shared_tx *shtx = skb_tx(skb);
@@ -1009,8 +1021,6 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 			skb->len);
 		current_tx_ptr->desc_a.start_addr =
 			(u32)current_tx_ptr->packet;
-		if (current_tx_ptr->status.status_word != 0)
-			current_tx_ptr->status.status_word = 0;
 		blackfin_dcache_flush_range(
 			(u32)current_tx_ptr->packet,
 			(u32)(current_tx_ptr->packet + skb->len + 2));
@@ -1022,6 +1032,9 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	 */
 	SSYNC();
 
+	/* always clear status buffer before start tx dma */
+	current_tx_ptr->status.status_word = 0;
+
 	/* enable this packet's dma */
 	current_tx_ptr->desc_a.config |= DMAEN;
 
@@ -1037,13 +1050,14 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	bfin_write_EMAC_OPMODE(bfin_read_EMAC_OPMODE() | TE);
 
 out:
-	adjust_tx_list();
-
 	bfin_tx_hwtstamp(dev, skb);
 
 	current_tx_ptr = current_tx_ptr->next;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += (skb->len);
+
+	tx_reclaim_skb(lp);
+
 	return NETDEV_TX_OK;
 }
 
@@ -1167,8 +1181,11 @@ real_rx:
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void bfin_mac_poll(struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
 	disable_irq(IRQ_MAC_RX);
 	bfin_mac_interrupt(IRQ_MAC_RX, dev);
+	tx_reclaim_skb(lp);
 	enable_irq(IRQ_MAC_RX);
 }
 #endif				/* CONFIG_NET_POLL_CONTROLLER */
@@ -1232,12 +1249,27 @@ static int bfin_mac_enable(void)
 /* Our watchdog timed out. Called by the networking layer */
 static void bfin_mac_timeout(struct net_device *dev)
 {
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
 	pr_debug("%s: %s\n", dev->name, __func__);
 
 	bfin_mac_disable();
 
-	/* reset tx queue */
-	tx_list_tail = tx_list_head->next;
+	del_timer(&lp->tx_reclaim_timer);
+
+	/* reset tx queue and free skb */
+	while (tx_list_head != current_tx_ptr) {
+		tx_list_head->desc_a.config &= ~DMAEN;
+		tx_list_head->status.status_word = 0;
+		if (tx_list_head->skb) {
+			dev_kfree_skb(tx_list_head->skb);
+			tx_list_head->skb = NULL;
+		}
+		tx_list_head = tx_list_head->next;
+	}
+
+	if (netif_queue_stopped(lp->ndev))
+		netif_wake_queue(lp->ndev);
 
 	bfin_mac_enable();
 
@@ -1430,6 +1462,7 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	platform_set_drvdata(pdev, ndev);
 	lp = netdev_priv(ndev);
+	lp->ndev = ndev;
 
 	/* Grab the MAC address in the MAC */
 	*(__le32 *) (&(ndev->dev_addr[0])) = cpu_to_le32(bfin_read_EMAC_ADDRLO());
@@ -1485,6 +1518,10 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	ndev->netdev_ops = &bfin_mac_netdev_ops;
 	ndev->ethtool_ops = &bfin_mac_ethtool_ops;
 
+	init_timer(&lp->tx_reclaim_timer);
+	lp->tx_reclaim_timer.data = (unsigned long)lp;
+	lp->tx_reclaim_timer.function = tx_reclaim_skb_timeout;
+
 	spin_lock_init(&lp->lock);
 
 	/* now, enable interrupts */
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 1ae7b82..04e4050 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -13,9 +13,12 @@
 #include <linux/net_tstamp.h>
 #include <linux/clocksource.h>
 #include <linux/timecompare.h>
+#include <linux/timer.h>
 
 #define BFIN_MAC_CSUM_OFFLOAD
 
+#define TX_RECLAIM_JIFFIES (HZ / 5)
+
 struct dma_descriptor {
 	struct dma_descriptor *next_dma_desc;
 	unsigned long start_addr;
@@ -68,6 +71,8 @@ struct bfin_mac_local {
 
 	int wol;		/* Wake On Lan */
 	int irq_wake_requested;
+	struct timer_list tx_reclaim_timer;
+	struct net_device *ndev;
 
 	/* MII and PHY stuffs */
 	int old_link;          /* used by bf537_adjust_link */
-- 
1.6.0




^ permalink raw reply related

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Johannes Berg @ 2010-06-11  9:55 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <AANLkTilUd2Bp1h4SzlW5f5KevoRM-os_fQrivLHbZyKY@mail.gmail.com>

On Tue, 2010-06-08 at 18:39 +0200, Kay Sievers wrote:

> That all works if you have two modules, like almost all buses have.
> That's what I meant, that we need to add stuff to the core to be able
> to cleanup bus devices internally too, if we use everything in a
> single module, which is also supposed to cleanup on unload, like the
> network devices like to do.

Or some "wait for bus to be cleaned up" we can call in the module exit
maybe?

johannes


^ permalink raw reply

* Re: no reassembly for outgoing packets on RAW socket
From: Jan Engelhardt @ 2010-06-11  9:53 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Patrick McHardy, netdev, Netfilter Developer Mailing List
In-Reply-To: <20100611081604.GA1739@jolsa.Belkin>


On Friday 2010-06-11 10:16, Jiri Olsa wrote:
>
>I prepared the patch implementing IP_NODEFRAG option for IPv4 socket.
>
>Also I just got an idea, that there could be no reassembly if there are
>no rules for connection tracing set.. not sure how can I check that best
>so far.. any idea?
>
>@@ -572,6 +572,14 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> 		}
> 		inet->hdrincl = val ? 1 : 0;
> 		break;
>+	case IP_NODEFRAG:
>+		if (sk->sk_type != SOCK_RAW) {
>+			err = -ENOPROTOOPT;
>+			break;
>+		}
>+		inet->nodefrag = val ? 1 : 0;
>+		printk("IP_NODEFRAG %p -> %d\n", inet, inet->nodefrag);
>+		break;

You want to get rid of this printk otherwise it spews the logs.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox