Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  9:39 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Patrick McHardy, Grant Likely, netdev, linux-kernel
In-Reply-To: <4BB31345.5050101@pengutronix.de>

* Marc Kleine-Budde <mkl@pengutronix.de> [100331 11:18]:
> We're using phy_mask in one of our projects, it's still using 2.6.29,
> though. I think it's worth testing if it's still working.

Well, that was obviously something I overlooked / misinterpreted when I
tried to solve our problem!

As I understand it, phy_mask is a bitfield for setting which PHYs should
be enabled... Then this should have worked OK for us!

> static struct at91_eth_data __initdata p298_macb_data = {
>        .is_rmii        = 0,
>        .phy_mask       = ~(1 << 8),
> };

> at91_add_device_eth(&p298_macb_data);

I've not tested it, but it should probably have worked.

Regards,
Anders

^ permalink raw reply

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Eric W. Biederman @ 2010-03-31  9:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery
In-Reply-To: <4BB30644.9090809@gmail.com>

Tejun Heo <htejun@gmail.com> writes:

> Just wanna add a bit more.
>
> On 03/31/2010 05:17 PM, Tejun Heo wrote:
>> If you think all those callbacks are absolute necessities, can you
>> please at least add boatload of comments around them explaning what
>> they're meant to do and how they're gonna be used?  It's probably
>> because I don't have any experience with namespaces but I really can't
>> wrap my head around it as it currently stands.
>
> The reason why I talked about proper layering is the same reason.
> It's very difficult to review your code because I have no idea how
> those callbacks are meant to be used and gonna behave and that lowers
> maintainability significantly in the long run.  If at all possible,
> please make it implement a discrete function which is used to
> implement something higher up.  If it's already done like that and I'm
> just being stupid, please feel free to enlighten me.

Apologies.   There is a fine line between sending enough patches
to give context and completely overwhelming people with patches,
and of course by this time I am so accustomed to this code I am
practically blind to it.

Let me try a happy median between overwhelming and too little
information by giving you some experts, and a bit of overview.

(Ugh after have writing this I certainly will agree that we
 have some many layers in the device model that they become
 obfuscating abstractions).

Looking through my code there are 3 types of callbacks.
- Callbacks to the namespace type of a children.
  .child_ns_type
- Callbacks to find the namespace of a kobject.
  .namespace
- Callbacks on the a namespace type to find the namespace
  of a particular context.
  .current_ns
  .initial_ns  (not used in my patchset)
  .netlink_ns  (not used in my patchset)


In a world of weird explicitness I expect .child_ns_type and
.namespace could be made to go away by pushing through explicit
ns_type, and namespace parameters everywhere. But that seems
like an awful lot of unnecessary code churn and bloat with
the only real advantage being that we have an abstraction
stored explicit at each layer.

I use child_ns_type to see if a directory should be tagged
and to figure out the type of the tags on a sysfs directory.

I use current_ns to capture the namespace (of ns_type) of the
current process when sysfs is mounted so I know what to show
userspace.

I use ktype->namespace to figure out which namespace a given
kobject's name is in.

There are intermediate steps on those methods but that is
just what appears to be the necessary boilerplate to get
from a class down to a kobject.

The nstype callbacks initial_ns and netlink_ns are not used in this
patchset.  Instead they play a role in the filtering of events sent to
userspace.

netlink_ns is used to find the namespace of a netlink socket
to see if it is ok to send an event over a netlink socket.

static int kobj_bcast_filter(struct sock *dest_sk, struct sk_buff *skb, void *data)
{
	struct kobject *kobj = data;
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *sock_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		sock_ns = ops->netlink_ns(dsk);
		return sock_ns != ns;
	}

	return 0;
}

initial_ns is used to figure out what the initial/default
namespace is for a class of namespaces.  We only report
with /sbin/hotplug events in the initial network namespace.
At least for now.

static int kobj_usermode_filter(struct kobject *kobj)
{
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *init_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		init_ns = ops->initial_ns();
		return ns != init_ns;
	}

	return 0;
}

This is my change that adds support for the network namespace.
The only namespace I expect to add support for in the short term.

I hope this helps,

Eric


commit fdc0adeaa8bfab9a179e1eb349cab400ddb70403
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Jul 3 16:13:11 2008 -0600

    netns: Teach network device kobjects which namespace they are in.

    The problem.  Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out the
    the sysfs directory entries we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index d9456f6..9452e39 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -138,6 +138,7 @@ extern const struct sysfs_ops kobj_sysfs_ops;

 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };

diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..265e33b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES

 menu "Networking options"

+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 099c753..1b98e36 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -13,7 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
 };
 #endif

+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */

 #ifdef CONFIG_HOTPLUG
@@ -506,6 +539,13 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }

+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -515,6 +555,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };

 /* Delete sysfs entries but hold kobject reference until after all
@@ -587,5 +629,9 @@ void netdev_initialize_kobject(struct net_device *net)

 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }

^ permalink raw reply related

* Re: [PATCH v3 10/12] l2tp: Add L2TP ethernet pseudowire support
From: James Chapman @ 2010-03-31  9:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20100330093252.60d9cbee@nehalam>

Stephen Hemminger wrote:
> On Tue, 30 Mar 2010 17:18:19 +0100
> James Chapman <jchapman@katalix.com> wrote:
> 
>> +struct l2tp_eth_net {
>> +	struct list_head l2tp_eth_dev_list;
>> +	rwlock_t l2tp_eth_lock;
>> +};
> 
> Reader/write locks are discouraged because they are slower than
> spin locks.  If you have lots of readers use RCU, if reading
> is infrequent just use a spin lock.

Ok. In doing the conversion of the rwlocks in l2tp_core.c, I'm finding
that some list access primitives don't have rcu equivalents, namely
list_is_last(), list_for_each_entry_safe(). Is this intentional? Should
I add the missing ones in a separate patch?


-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Patrick McHardy @ 2010-03-31  9:34 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Network Development list
In-Reply-To: <4BB3116F.803@itcare.pl>

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

Paweł Staszewski wrote:
> I find some problem with iproute2 and u32 filters
> 
> To reproduce the problem (need to make one mistake in filter parent
> declaration 1:101):
> 
> ...
> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1
> 0xff flowid 1:101
> 
> ping 212.77.100.101
> And after this server will stop responding to anything - without any
> error (hang).

This is caused by hfsc_classify() looping endlessly since the filter
points to the originating class. hfsc_bind_tcf() is actually supposed
to prevent this, but it only prevents resolving the filter immediately
and we still run into the loop at runtime.

This patch (based on how CBQ handles this) should abort classification
and fall back to the default class. It would be better to simply catch
this at configuration time, but that looks a bit more involved. I'll try
to look into it this weekend.



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1295 bytes --]

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b38b39c..a435cf1 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1155,7 +1155,7 @@ static struct hfsc_class *
 hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
-	struct hfsc_class *cl;
+	struct hfsc_class *head, *cl;
 	struct tcf_result res;
 	struct tcf_proto *tcf;
 	int result;
@@ -1166,6 +1166,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return cl;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	head = &q->root;
 	tcf = q->root.filter_list;
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
@@ -1180,6 +1181,8 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 		if ((cl = (struct hfsc_class *)res.class) == NULL) {
 			if ((cl = hfsc_find_class(res.classid, sch)) == NULL)
 				break; /* filter selected invalid classid */
+			if (cl->level >= head->level)
+				break; /* filter may only point downwards */
 		}
 
 		if (cl->level == 0)
@@ -1187,6 +1190,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 
 		/* apply inner filter chain */
 		tcf = cl->filter_list;
+		head = cl;
 	}
 
 	/* classification failed, try default class */

^ permalink raw reply related

* Re: [PATCH] virtio_net: avoid BUG_ON() with large packets when CONFIG_DEBUG_SG=y
From: Michael S. Tsirkin @ 2010-03-31  9:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev
In-Reply-To: <201003301526.35720.rusty@rustcorp.com.au>

On Tue, Mar 30, 2010 at 03:26:35PM +1030, Rusty Russell wrote:
> AFAICT only weird kvm setups and lguest traverse this code path now.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

vhost still lacks support mergeable buffers so it uses this path.

> ---
>  drivers/net/virtio_net.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -351,6 +351,9 @@ static int add_recvbuf_big(struct virtne
>  	char *p;
>  	int i, err, offset;
>  
> +	/* This is a waste of cycles, but satisfies CONFIG_DEBUG_SG. */
> +	sg_init_table(sg, ARRAY_SIZE(sg));
> +

How about moving sg into virtnet_info? This way we could call
this only once, right?

>  	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
>  	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>  		first = get_a_page(vi, gfp);

^ permalink raw reply

* Re: [PATCH next-next-2.6 v2] virtio_net: missing sg_init_table
From: Michael S. Tsirkin @ 2010-03-31  9:20 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, Thomas Müller, netdev, linux-kernel
In-Reply-To: <1269911955.14013.4.camel@localhost.localdomain>

On Mon, Mar 29, 2010 at 06:19:15PM -0700, Shirley Ma wrote:
> Add missing sg_init_table for sg_set_buf in virtio_net which
> induced in defer skb patch.
> 
> Reported-by: Thomas Müller <thomas@mathtm.de>
> Tested-by: Thomas Müller <thomas@mathtm.de>
> Signed-off-by: Shirley Ma <xma@us.ibm.com>

I'm concerned that the 'big' path might cause a performance regression.
Let's move sg into virtnet_info so that this needs to be only called
once?

> ---
>  drivers/net/virtio_net.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 25dc77c..3f5be35 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -326,6 +326,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
>  	struct scatterlist sg[2];
>  	int err;
>  
> +	sg_init_table(sg, 2);
>  	skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
>  	if (unlikely(!skb))
>  		return -ENOMEM;
> @@ -351,6 +352,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
>  	char *p;
>  	int i, err, offset;
>  
> +	sg_init_table(sg, MAX_SKB_FRAGS + 2);
>  	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
>  	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>  		first = get_a_page(vi, gfp);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Marc Kleine-Budde @ 2010-03-31  9:17 UTC (permalink / raw)
  To: Anders Darander
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Patrick McHardy, Anders Darander, Grant Likely, netdev,
	linux-kernel
In-Reply-To: <1270025218-7245-1-git-send-email-anders.darander@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

Anders Darander wrote:
> From: Anders Darander <ad@datarespons.se>
> 
> Add the possibility to set the phy address. This is needed if an integrated
> switch is connected to the MAC, as it is often the case that the highest port
> is the one connected to the MAC of the MCU.
> 
> E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
> MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
> phy found.

We're using phy_mask in one of our projects, it's still using 2.6.29,
though. I think it's worth testing if it's still working.

static struct at91_eth_data __initdata p298_macb_data = {
       .is_rmii        = 0,
       .phy_mask       = ~(1 << 8),
};

at91_add_device_eth(&p298_macb_data);

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply

* iproute u32 filter - server hang
From: Paweł Staszewski @ 2010-03-31  9:10 UTC (permalink / raw)
  To: Linux Network Development list

I find some problem with iproute2 and u32 filters

To reproduce the problem (need to make one mistake in filter parent 
declaration 1:101):

tc qdisc add dev eth0 root handle 1: hfsc default 63
tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 100mbit ul rate 
100mbit
tc class add dev eth0 parent 1:1 classid 1:2 hfsc sc rate 1mbit ul rate 
1mbit
tc class add dev eth0 parent 1:1 classid 1:63 hfsc sc rate 99mbit ul 
rate 99mbit
tc class add dev eth0 parent 1:1 classid 1:101 hfsc sc rate 8kbit ul 
rate 1mbit
tc class add dev eth0 parent 1:101 classid 1:102 hfsc sc rate 8kbit ul 
rate 1mbit
tc filter add dev eth0 protocol ip parent 1: u32 match ip dst 
212.77.100.101 flowid 1:101
tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1 
0xff flowid 1:101

ping 212.77.100.101
And after this server will stop responding to anything - without any 
error (hang).



With a little different rules:
tc qdisc add dev eth0 root handle 1: hfsc default 63
tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 100mbit ul rate 
100mbit
tc class add dev eth0 parent 1:1 classid 1:2 hfsc sc rate 1mbit ul rate 
1mbit
tc class add dev eth0 parent 1:1 classid 1:63 hfsc sc rate 99mbit ul 
rate 99mbit
tc class add dev eth0 parent 1:1 classid 1:101 hfsc sc rate 8kbit ul 
rate 1mbit
tc class add dev eth0 parent 1:101 classid 1:102 hfsc sc rate 8kbit ul 
rate 1mbit
tc filter add dev eth0 protocol ip parent 1: u32 match ip dst 
212.77.100.101 flowid 1:101
tc filter add dev eth0 protocol ip parent 1: u32 match ip protocol 1 
0xff flowid 1:101

ping 212.77.100.101
All is ok.

I check this with kernels 2.6.30.1 / 2.6.33 / 2.6.33.1
iproute tc utility version: iproute2-ss090324



Best Regards
Paweł Staszewski

^ permalink raw reply

* Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
From: Eric Dumazet @ 2010-03-31  9:08 UTC (permalink / raw)
  To: Andy Gospodarek, David Miller; +Cc: netdev, lhh, fubar, bonding-devel
In-Reply-To: <20100325214033.GA28741@gospo.rdu.redhat.com>

Le jeudi 25 mars 2010 à 17:40 -0400, Andy Gospodarek a écrit :
> Round-robin (mode 0) does nothing to ensure that any multicast traffic
> originally destined for the host will continue to arrive at the host when
> the link that sent the IGMP join or membership report goes down.  One of
> the benefits of absolute round-robin transmit.
> 
> Keeping track of subscribed multicast groups for each slave did not seem
> like a good use of resources, so I decided to simply send on the
> curr_active slave of the bond (typically the first enslaved device that
> is up).  This makes failover management simple as IGMP membership
> reports only need to be sent when the curr_active_slave changes.  I
> tested this patch and it appears to work as expected.
> 
> Originally reported by Lon Hohberger <lhh@redhat.com>.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Lon Hohberger <lhh@redhat.com>
> CC: Jay Vosburgh <fubar@us.ibm.com>
> 
> ---
>  drivers/net/bonding/bond_main.c |   34 ++++++++++++++++++++++++++--------
>  1 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 430c022..0b38455 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1235,6 +1235,11 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
>  			write_lock_bh(&bond->curr_slave_lock);
>  		}
>  	}
> +
> +	/* resend IGMP joins since all were sent on curr_active_slave */
> +	if (bond->params.mode == BOND_MODE_ROUNDROBIN) {
> +		bond_resend_igmp_join_requests(bond);
> +	}
>  }
>  
>  /**
> @@ -4138,22 +4143,35 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
>  	struct bonding *bond = netdev_priv(bond_dev);
>  	struct slave *slave, *start_at;
>  	int i, slave_no, res = 1;
> +	struct iphdr *iph = ip_hdr(skb);
>  
>  	read_lock(&bond->lock);
>  
>  	if (!BOND_IS_OK(bond))
>  		goto out;
> -
>  	/*
> -	 * Concurrent TX may collide on rr_tx_counter; we accept that
> -	 * as being rare enough not to justify using an atomic op here
> +	 * Start with the curr_active_slave that joined the bond as the
> +	 * default for sending IGMP traffic.  For failover purposes one
> +	 * needs to maintain some consistency for the interface that will
> +	 * send the join/membership reports.  The curr_active_slave found
> +	 * will send all of this type of traffic.
>  	 */
> -	slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
> +	if ((skb->protocol == htons(ETH_P_IP)) &&
> +	    (iph->protocol == htons(IPPROTO_IGMP))) {

Hmm...

iph->protocol is a u8, how can htons(IPPROTO_IGMP) be equal to
iph->protocol ?

[PATCH] bonding: bond_xmit_roundrobin() fix

Commit a2fd940f (bonding: fix broken multicast with round-robin mode)
added a problem on litle endian machines.

drivers/net/bonding/bond_main.c:4159: warning: comparison is always
false due to limited range of data type

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5b92fbf..5972a52 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4156,7 +4156,7 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 	 * send the join/membership reports.  The curr_active_slave found
 	 * will send all of this type of traffic.
 	 */
-	if ((iph->protocol == htons(IPPROTO_IGMP)) &&
+	if ((iph->protocol == IPPROTO_IGMP) &&
 	    (skb->protocol == htons(ETH_P_IP))) {
 
 		read_lock(&bond->curr_slave_lock);




^ permalink raw reply related

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-03-31  9:01 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-ppp
In-Reply-To: <2d460de71003260850x7f90d04cy79ac853464108182@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6366 bytes --]

Hi all,

this is our attempt at a cleaner version. It is still far from being
perfect and packets seem to gain one to two bytes in size sometimes
which means that you will run into "normal" fragmentation if you set
your MTU to the possible maximum (as you are then over said max) but it
does what it should, can be changed at run-time and is not panicking the
kernel.

Feedback appreciated, code even more so.


Thanks,
Richard

PS: The main patch is inline, both the main and the debug patch are
attached.


--- drivers/net/ppp_generic.c.orig      2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c   2010-03-30 19:56:15.000000000 +0200
@@ -129,6 +129,7 @@
        u32             nextseq;        /* MP: seq no of next packet */
        u32             minseq;         /* MP: min of most recent seqnos */
        struct sk_buff_head mrq;        /* MP: receive reconstruction queue */
+       int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
        struct sock_filter *pass_filter;        /* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B      0x80            /* this fragment begins a packet */
 #define E      0x40            /* this fragment ends a packet */

+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options
added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than
zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than
zero to remove the ppp-multilink protocol header (pppoes.protocol not
0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1292,10 +1305,18 @@
 	}

 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+		ppp_mp_roundrobin(ppp, skb);
 		return;
+	}
+	else {
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */

 	ppp->xmit_pending = NULL;
@@ -1304,6 +1325,38 @@

 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1405,21 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+
+	if(ppp_ml_noexplode) {
+	}
+	else {
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}

 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1432,7 @@
 	totlen = len;
 	nbigger	= len %	nfree;

+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1432,33 +1494,40 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+
+		if(ppp_ml_noexplode) {
+			nfree--;
+		}
+		else {
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
 		}

 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.

[-- Attachment #2: ppp_ml_noexplode.patch --]
[-- Type: text/x-diff, Size: 5747 bytes --]

--- drivers/net/ppp_generic.c.orig	2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c	2010-03-30 19:56:15.000000000 +0200
@@ -129,6 +129,7 @@
 	u32		nextseq;	/* MP: seq no of next packet */
 	u32		minseq;		/* MP: min of most recent seqnos */
 	struct sk_buff_head mrq;	/* MP: receive reconstruction queue */
+	int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
 	struct sock_filter *pass_filter;	/* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B	0x80		/* this fragment begins a packet */
 #define E	0x40		/* this fragment ends a packet */
 
+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than zero to remove the ppp-multilink protocol header (pppoes.protocol not 0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1292,10 +1305,18 @@
 	}
 
 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+		ppp_mp_roundrobin(ppp, skb);
 		return;
+	}
+	else {
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */
 
 	ppp->xmit_pending = NULL;
@@ -1304,6 +1325,38 @@
 
 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1405,21 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+
+	if(ppp_ml_noexplode) {
+	}
+	else {
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}
 
 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1432,7 @@
 	totlen = len;
 	nbigger	= len %	nfree;
 
+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1432,33 +1494,40 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+
+		if(ppp_ml_noexplode) {
+			nfree--;
+		}
+		else {
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
 		}
 
 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.

[-- Attachment #3: ppp_ml_noexplode-with_debug.patch --]
[-- Type: text/x-diff, Size: 7913 bytes --]

--- drivers/net/ppp_generic.c.orig	2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c.patched_with_debug	2010-03-30 20:03:31.000000000 +0200
@@ -129,6 +129,7 @@
 	u32		nextseq;	/* MP: seq no of next packet */
 	u32		minseq;		/* MP: min of most recent seqnos */
 	struct sk_buff_head mrq;	/* MP: receive reconstruction queue */
+	int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
 	struct sock_filter *pass_filter;	/* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B	0x80		/* this fragment begins a packet */
 #define E	0x40		/* this fragment ends a packet */
 
+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than zero to remove the ppp-multilink protocol header (pppoes.protocol not 0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1273,6 +1286,7 @@
 		return;
 	}
 
+printk(KERN_ERR "send packet\n");
 	if ((ppp->flags & SC_MULTILINK) == 0) {
 		/* not doing multilink: send it down the first channel */
 		list = list->next;
@@ -1292,11 +1306,24 @@
 	}
 
 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+printk(KERN_ERR "BEGIN SEND RR\n");
+		ppp_mp_roundrobin(ppp, skb);
 		return;
+printk(KERN_ERR "END SEND RR\n");
+	}
+	else {
+printk(KERN_ERR "BEGIN SEND MULTILINK\n");
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+printk(KERN_ERR "END SEND MULTILINK\n");
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */
+printk(KERN_ERR "END SEND DROP PACKET\n");
 
 	ppp->xmit_pending = NULL;
 	kfree_skb(skb);
@@ -1304,6 +1331,41 @@
 
 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+printk(KERN_ERR "  RR counter=%d, len=%d, devmtu=%d\n", ppp->rrsched, skb->len, ppp->dev->mtu);
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+printk(KERN_ERR "  RR send via %d, chmtu=%d\n", i, pch->chan->mtu);
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+printk(KERN_ERR "  RR dropped at %d\n", i);
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1414,25 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+printk(KERN_ERR "  ML nfree=%d, navail=%d, nzero=%d, totfree=%d, totspeed=%d\n", nfree,navail,nzero,totfree,totspeed);
+
+	if(ppp_ml_noexplode) {
+printk(KERN_ERR "  ML no explode A\n");
+	}
+	else {
+printk(KERN_ERR "  ML explode A\n");
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+printk(KERN_ERR "  ML wait A\n");
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}
 
 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1445,8 @@
 	totlen = len;
 	nbigger	= len %	nfree;
 
+printk(KERN_ERR "  ML len=%d, totlen=%d, nbigger=%d\n", len, totlen, nbigger);
+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1381,10 +1457,12 @@
 			break;
 		}
 	}
+printk(KERN_ERR "  ML skip to channel=%d\n", i);
 
 	/* create a	fragment for each channel */
 	bits = B;
 	while (len	> 0) {
+printk(KERN_ERR "  ML while len=%d, i=%d\n", len, i);
 		list = list->next;
 		if (list ==	&ppp->channels)	{
 			i =	0;
@@ -1432,45 +1510,58 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+printk(KERN_ERR "  ML fragmentlen=%d\n", flen);
+
+		if(ppp_ml_noexplode) {
+printk(KERN_ERR "  ML no explode B \n");
+			nfree--;
+		}
+		else {
+printk(KERN_ERR "  ML explode B \n");
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
+printk(KERN_ERR "  ML new fragmentlen=%d\n", flen);
 		}
 
 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.
 		 *Skip the channel in this case
 		 */
 		if (flen <=	0) {
+printk(KERN_ERR "  ML fragmentlen is zero\n");
 			pch->avail = 2;
 			spin_unlock_bh(&pch->downl);
 			continue;
 		}
 
 		mtu	= pch->chan->mtu - hdrlen;
+printk(KERN_ERR "  ML mtu=%d (chan-mtu=%d)\n", mtu, pch->chan->mtu);
 		if (mtu	< 4)
 			mtu	= 4;
 		if (flen > mtu)
@@ -1502,6 +1593,7 @@
 		if (!skb_queue_empty(&pch->file.xq)	||
 			!chan->ops->start_xmit(chan, frag))
 			skb_queue_tail(&pch->file.xq, frag);
+printk(KERN_ERR "  ML sent packet with seq: %d\n", ppp->nxseq);
 		pch->had_frag =	1;
 		p += flen;
 		len	-= flen;
@@ -1510,6 +1602,7 @@
 		spin_unlock_bh(&pch->downl);
 	}
 	ppp->nxchan	= i;
+printk(KERN_ERR "  ML END SEND\n");
 
 	return 1;
 

^ permalink raw reply

* Re: [PATCH v3 09/12] l2tp: Add netlink control API for L2TP
From: Patrick McHardy @ 2010-03-31  8:59 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev
In-Reply-To: <20100330161814.9628.43239.stgit@bert.katalix.com>

James Chapman wrote:
> +static struct nla_policy l2tp_nl_policy[L2TP_ATTR_MAX + 1] = {
> +	[L2TP_ATTR_NONE]		= { .type = NLA_UNSPEC, },
> +	[L2TP_ATTR_PW_TYPE]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_ENCAP_TYPE]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_OFFSET]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_DATA_SEQ]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_L2SPEC_TYPE]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_L2SPEC_LEN]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_PROTO_VERSION]	= { .type = NLA_U8, },
> +	[L2TP_ATTR_CONN_ID]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_PEER_CONN_ID]	= { .type = NLA_U32, },
> +	[L2TP_ATTR_SESSION_ID]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_PEER_SESSION_ID]	= { .type = NLA_U32, },
> +	[L2TP_ATTR_UDP_CSUM]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_VLAN_ID]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_DEBUG]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_RECV_SEQ]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_SEND_SEQ]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_LNS_MODE]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_USING_IPSEC]		= { .type = NLA_FLAG, },

Please don't use NLA_FLAG, it diverges from the usual netlink
attribute semantics since you either can't unset the flag in
case you check for the presence of the attribute (the attribute
can only encode "flag set") or you'll have to send the current
value in every message even if you don't intend to change it.
In this case it can't be unset.

A better way is to use something like this:

struct l2tp_flags {
	__u32	value;
	__u32	mask;
};

and set mask to the bits you intend to change.

^ permalink raw reply

* Re: [PATCH] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  8:50 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, linux-kernel
In-Reply-To: <4BB307CA.9080305@trash.net>

* Patrick McHardy <kaber@trash.net> [100331 10:29]:
> Anders Darander wrote:
> > +	if (phy_addr >= PHY_MAX_ADDRESS)
> > +		phydev = phy_find_first(bp->mii_bus);
> > +	else
> > +		phydev = bp->mii_bus->phy_map[phy_addr];

> This looks like you need to use an unsigned to avoid negative
> indices.

Thanks for spotting this. That made me find that I'd also been inconsistent with the variable and module parameter declarations.

A new version, V2, of the patch has been sent.

Best regards,
Anders Darander

^ permalink raw reply

* [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  8:46 UTC (permalink / raw)
  To: Haavard Skinnemoen
  Cc: David S. Miller, Jiri Pirko, Erik Waling, Patrick McHardy,
	Anders Darander, Grant Likely, netdev, linux-kernel

From: Anders Darander <ad@datarespons.se>

Add the possibility to set the phy address. This is needed if an integrated
switch is connected to the MAC, as it is often the case that the highest port
is the one connected to the MAC of the MCU.

E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
phy found.

Signed-off-by: Anders Darander <ad@datarespons.se>
---

Changes from V1:
	* Made the variable type consistent with the module parameter
	  declaration. (I.e. made it unsigned short).

 drivers/net/macb.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index c8a18a6..af7b61b 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -53,6 +53,14 @@
 #define MACB_RX_INT_FLAGS	(MACB_BIT(RCOMP) | MACB_BIT(RXUBR)	\
 				 | MACB_BIT(ISR_ROVR))
 
+/*
+ * Setup PHY probeing
+ */
+
+static unsigned short phy_addr = PHY_MAX_ADDR;
+module_param(phy_addr, ushort, 0);
+MODULE_PARAM_DESC(phy_addr, "PHY address connected to the MACB");
+
 static void __macb_set_hwaddr(struct macb *bp)
 {
 	u32 bottom;
@@ -193,7 +201,11 @@ static int macb_mii_probe(struct net_device *dev)
 	struct eth_platform_data *pdata;
 	int ret;
 
-	phydev = phy_find_first(bp->mii_bus);
+	if (phy_addr >= PHY_MAX_ADDRESS)
+		phydev = phy_find_first(bp->mii_bus);
+	else
+		phydev = bp->mii_bus->phy_map[phy_addr];
+
 	if (!phydev) {
 		printk (KERN_ERR "%s: no PHY found\n", dev->name);
 		return -1;
-- 
1.7.0.3

^ permalink raw reply related

* Re: [PATCH v3 04/12] l2tp: Add ppp device name to L2TP ppp session data
From: David Miller @ 2010-03-31  8:46 UTC (permalink / raw)
  To: jchapman; +Cc: shemminger, netdev
In-Reply-To: <4BB2FD0D.1080105@katalix.com>

From: James Chapman <jchapman@katalix.com>
Date: Wed, 31 Mar 2010 08:43:09 +0100

> Stephen Hemminger wrote:
>> On Tue, 30 Mar 2010 17:17:46 +0100
>> James Chapman <jchapman@katalix.com> wrote:
>> 
>>> When dumping L2TP PPP sessions using /proc/net/l2tp, get
>>> the assigned PPP device name from PPP using ppp_dev_name().
>>>
>>> Signed-off-by: James Chapman <jchapman@katalix.com>
>>> Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
>>>
>> 
>> Why is this a necessary API?
>> Why not put it in debugfs if just a debugging tool?
> 
> With the original driver (merged in 2.6.23), some people use horrible
> hacks in scripts to derive info about their L2TP connections from /proc.
> So I was reluctant to move it to debugfs in the new driver. If it is ok
> to move an existing /proc file to debugfs, I'm happy to do so. People
> should obtain such info from their L2TP userspace daemon, or through
> netlink anyway.

Existing stuff you shouldn't move around, people do depend on it
and thus it has to be retained.

But for new stuff, we can try to think about better ways to export the
information if possible.

^ permalink raw reply

* Re: [PATCH RFC] fix problems with NETIF_F_HIGHDMA in networking drivers v2
From: FUJITA Tomonori @ 2010-03-31  8:35 UTC (permalink / raw)
  To: davem
  Cc: fujita.tomonori, hancockrwd, linux-kernel, netdev, linux-usb,
	bzolnier
In-Reply-To: <20100325.203520.234306178.davem@davemloft.net>

On Thu, 25 Mar 2010 20:35:20 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Fri, 26 Mar 2010 12:33:12 +0900
> 
> > On Thu, 25 Mar 2010 19:03:37 -0600
> > Robert Hancock <hancockrwd@gmail.com> wrote:
> > 
> >> This seems like it could be a reasonable approach. The only thing is
> >> that in this code you're returning 1 if the parent device has no DMA
> >> mask set. Wouldn't it make more sense to return 0 in this case? I'm
> >> assuming that in that situation it's a virtual device not backed by
> >> any hardware and there should be no DMA mask restriction...
> > 
> > I chose the safer option because I don't know enough how net_device
> > structure is used. If returning zero in such case is always safe, it's
> > fine by me. any example of such virtual device driver?
> 
> Like Fujita I'd rather play it safe here.
> 
> Even for virtual devices, DMA information up to the root bus
> ought to be sane.

Agreed. 

Here's the patch in the proper format.

Probably, PCI_DMA_BUS_IS_PHYS should be renamed to something like
DMA_BUS_IS_PHYS and moved to the better place. Then we can avoid
including linux/pci.h.

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] net: change illegal_highdma to use dma_mask

Robert Hancock pointed out two problems about NETIF_F_HIGHDMA:

-Many drivers only set the flag when they detect they can use 64-bit DMA,
since otherwise they could receive DMA addresses that they can't handle
(which on platforms without IOMMU/SWIOTLB support is fatal). This means that if
64-bit support isn't available, even buffers located below 4GB will get copied
unnecessarily.

-Some drivers set the flag even though they can't actually handle 64-bit DMA,
which would mean that on platforms without IOMMU/SWIOTLB they would get a DMA
mapping error if the memory they received happened to be located above 4GB.

http://lkml.org/lkml/2010/3/3/530


We can use the dma_mask if we need bouncing or not here. Then we can
safely fix drivers that misuse NETIF_F_HIGHDMA.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 net/core/dev.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5e3dc28..3a09f76 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <linux/pci.h>
 
 #include "net-sysfs.h"
 
@@ -1790,14 +1791,21 @@ static inline int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_HIGHMEM
 	int i;
+	if (!(dev->features & NETIF_F_HIGHDMA)) {
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+			if (PageHighMem(skb_shinfo(skb)->frags[i].page))
+				return 1;
+	}
 
-	if (dev->features & NETIF_F_HIGHDMA)
-		return 0;
-
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		if (PageHighMem(skb_shinfo(skb)->frags[i].page))
-			return 1;
+	if (PCI_DMA_BUS_IS_PHYS) {
+		struct device *pdev = dev->dev.parent;
 
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+			dma_addr_t addr = page_to_phys(skb_shinfo(skb)->frags[i].page);
+			if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > *pdev->dma_mask)
+				return 1;
+		}
+	}
 #endif
 	return 0;
 }
-- 
1.7.0

^ permalink raw reply related

* Re: [PATCH] MACB: Set PHY address in kernel parameters
From: Patrick McHardy @ 2010-03-31  8:28 UTC (permalink / raw)
  To: Anders Darander
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Anders Darander, Grant Likely, netdev, linux-kernel
In-Reply-To: <1270021902-6556-1-git-send-email-anders.darander@gmail.com>

Anders Darander wrote:
> From: Anders Darander <ad@datarespons.se>
> 
> Add the possibility to set the phy address. This is needed if an integrated
> switch is connected to the MAC, as it is often the case that the highest port
> is the one connected to the MAC of the MCU.
> 
> E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
> MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
> phy found.
> 
> Signed-off-by: Anders Darander <ad@datarespons.se>
> ---
>  drivers/net/macb.c |   14 +++++++++++++-
>  1 files changed, 13 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/macb.c b/drivers/net/macb.c
> index c8a18a6..9b4e301 100644
> --- a/drivers/net/macb.c
> +++ b/drivers/net/macb.c
> @@ -53,6 +53,14 @@
>  #define MACB_RX_INT_FLAGS	(MACB_BIT(RCOMP) | MACB_BIT(RXUBR)	\
>  				 | MACB_BIT(ISR_ROVR))
>  
> +/*
> + * Setup PHY probeing
> + */
> +
> +static int phy_addr = PHY_MAX_ADDR;
> +module_param(phy_addr, ushort, 0);
> +MODULE_PARAM_DESC(phy_addr, "PHY address connected to the MACB");
> +
>  static void __macb_set_hwaddr(struct macb *bp)
>  {
>  	u32 bottom;
> @@ -193,7 +201,11 @@ static int macb_mii_probe(struct net_device *dev)
>  	struct eth_platform_data *pdata;
>  	int ret;
>  
> -	phydev = phy_find_first(bp->mii_bus);
> +	if (phy_addr >= PHY_MAX_ADDRESS)
> +		phydev = phy_find_first(bp->mii_bus);
> +	else
> +		phydev = bp->mii_bus->phy_map[phy_addr];

This looks like you need to use an unsigned to avoid negative
indices.

^ permalink raw reply

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Tejun Heo @ 2010-03-31  8:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery
In-Reply-To: <4BB30520.2030100@kernel.org>

Just wanna add a bit more.

On 03/31/2010 05:17 PM, Tejun Heo wrote:
> If you think all those callbacks are absolute necessities, can you
> please at least add boatload of comments around them explaning what
> they're meant to do and how they're gonna be used?  It's probably
> because I don't have any experience with namespaces but I really can't
> wrap my head around it as it currently stands.

The reason why I talked about proper layering is the same reason.
It's very difficult to review your code because I have no idea how
those callbacks are meant to be used and gonna behave and that lowers
maintainability significantly in the long run.  If at all possible,
please make it implement a discrete function which is used to
implement something higher up.  If it's already done like that and I'm
just being stupid, please feel free to enlighten me.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Tejun Heo @ 2010-03-31  8:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery
In-Reply-To: <m11vf12ayb.fsf@fess.ebiederm.org>

Hello, Eric.

On 03/31/2010 04:43 PM, Eric W. Biederman wrote:
>> Is it at all possible to implement it in properly layered manner?
>> ie. sysfs providing mechanisms for selective visibility, driver model
>> wraps it and exports it and namespace implements namespaces on top of
>> those mechanisms?
> 
> I think that is roughly what I have.

Yeah, well, in a sense.  It's all a matter of degree.

> As for the layering itself.  Down in sysfs there are only two bits
> visible.  A void * pointer that in addition to the name is what we use
> to define selective visibility.  A context that we capture when we
> mount sysfs.  Those bits are fundamental things sysfs needs to do.

Well, I guess all I wanna say is... is there *ANY* way to do it w/
less callbacks?  It's very difficult to follow what's going on for
what.

If you think all those callbacks are absolute necessities, can you
please at least add boatload of comments around them explaning what
they're meant to do and how they're gonna be used?  It's probably
because I don't have any experience with namespaces but I really can't
wrap my head around it as it currently stands.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Sascha Hauer @ 2010-03-31  8:15 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Bryan Wu, gerg, amit.kucheria, kernel-team, netdev, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20100331081004.GB23391@pengutronix.de>

On Wed, Mar 31, 2010 at 10:10:04AM +0200, Wolfram Sang wrote:
> > With rounding this would be:
> > 
> > (clk_get_rate() + 4999999) / 5000000
> 
> Better use DIV_ROUND_UP(clk_get_rate(), 5000000)?

Yes.
Sometimes I wish the useful assorted kernel helper of the day in my
fortune :)

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Sascha Hauer @ 2010-03-31  8:12 UTC (permalink / raw)
  To: Bryan Wu, gerg, netdev, kernel-team, linux-kernel,
	linux-arm-kernel, w.sang
In-Reply-To: <20100326102957.GG5539@matterhorn.verdurent.com>

On Fri, Mar 26, 2010 at 12:29:57PM +0200, Amit Kucheria wrote:
> On 10 Mar 26, Bryan Wu wrote:
> > BugLink: http://bugs.launchpad.net/bugs/457878
> > 
> >  - removed old MII phy control code
> >  - add phylib supporting
> >  - add ethtool interface to make user space NetworkManager works
> > 
> > Tested on Freescale i.MX51 Babbage board.
> > 
> > This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>
> > 
> > Cc: Frederic Rodo <fred.rodo@gmail.com>
> > Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
> 
> While I ack this patch, I wonder if we should add to the various board
> Kconfig options, a dependency select'ing the right phylib for that board.
> 
> This would prevent breakage of ethernet on those boards because they forgot
> to select the right phylib after this change.

I wouldn't go that far. I never saw a board which doesn't work with the
generic phy implementation, so I would only select the specific phy if
I knew that the generic phy does not work.

Sascha


-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Wolfram Sang @ 2010-03-31  8:10 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: Bryan Wu, gerg, amit.kucheria, kernel-team, netdev, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20100331075746.GO2241@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 314 bytes --]

> With rounding this would be:
> 
> (clk_get_rate() + 4999999) / 5000000

Better use DIV_ROUND_UP(clk_get_rate(), 5000000)?

Regards,

   Wolfram

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* iproute2: silence errors about kernel missing 6rd on "ip tun show".
From: Andreas Henriksson @ 2010-03-31  8:08 UTC (permalink / raw)
  To: shemminger, Alexandre Cassen; +Cc: 575970, netdev

Hello!

As reported in http://bugs.debian.org/575970 there is currently a warning
printed for every tunnel when using latest iproute2 on atleast <= 2.6.32
kernels (missing 6rd?!).

The attached patch avoids perror when errno is EINVAL, which I assume
is the way to detect missing 6rd support. A better/cleaner
method to detect and avoid 6rd when there's no kernel support
is more then welcome.

Regards,
Andreas Henriksson


diff --git a/ip/tunnel.c b/ip/tunnel.c
index d389e86..bbb60bf 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -29,6 +29,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <errno.h>
 #include <netinet/in.h>
 #include <linux/if.h>
 #include <linux/ip.h>
@@ -168,7 +169,7 @@ int tnl_del_ioctl(const char *basedev, const char *name, void *p)
 	return err;
 }
 
-static int tnl_gen_ioctl(int cmd, const char *name, void *p)
+static int tnl_gen_ioctl(int cmd, const char *name, void *p, int quiet)
 {
 	struct ifreq ifr;
 	int fd;
@@ -178,7 +179,7 @@ static int tnl_gen_ioctl(int cmd, const char *name, void *p)
 	ifr.ifr_ifru.ifru_data = p;
 	fd = socket(preferred_family, SOCK_DGRAM, 0);
 	err = ioctl(fd, cmd, &ifr);
-	if (err)
+	if (err && !quiet)
 		perror("ioctl");
 	close(fd);
 	return err;
@@ -186,15 +187,18 @@ static int tnl_gen_ioctl(int cmd, const char *name, void *p)
 
 int tnl_prl_ioctl(int cmd, const char *name, void *p)
 {
-	return tnl_gen_ioctl(cmd, name, p);
+	return tnl_gen_ioctl(cmd, name, p, 0);
 }
 
 int tnl_6rd_ioctl(int cmd, const char *name, void *p)
 {
-	return tnl_gen_ioctl(cmd, name, p);
+	return tnl_gen_ioctl(cmd, name, p, 0);
 }
 
 int tnl_ioctl_get_6rd(const char *name, void *p)
 {
-	return tnl_gen_ioctl(SIOCGET6RD, name, p);
+	int err = tnl_gen_ioctl(SIOCGET6RD, name, p, 1);
+	if (err && errno != EINVAL)
+		perror("ioctl");
+	return err;
 }

^ permalink raw reply related

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Sascha Hauer @ 2010-03-31  7:57 UTC (permalink / raw)
  To: Bryan Wu
  Cc: gerg, amit.kucheria, kernel-team, netdev, linux-kernel,
	linux-arm-kernel, w.sang
In-Reply-To: <1269597052-10104-1-git-send-email-bryan.wu@canonical.com>

On Fri, Mar 26, 2010 at 05:50:52PM +0800, Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/457878
> 
>  - removed old MII phy control code
>  - add phylib supporting
>  - add ethtool interface to make user space NetworkManager works
> 
> Tested on Freescale i.MX51 Babbage board.
> 
> This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>
> 


Hi Bryan,

The MII speed is calculated twice with this patch applied, one time
in fec_enet_mii_init and one time in fec_enet_init.

The patch didn't work for me because the calculation is wrong (which of
course is not your fault, it was wrong before).
According to the Datasheet the MII clock is clk_get_rate() / (MII * 2).
When we want to have a clock of 2.5MHz we have to do:

clk_get_rate() / 5000000

With rounding this would be:

(clk_get_rate() + 4999999) / 5000000

So you might want to add the following to your patch:

	fep->phy_speed = ((clk_get_rate(fep->clk) + 4999999) / 5000000) << 1;

Otherwise the patch looks good to me.

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* [PATCH] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  7:51 UTC (permalink / raw)
  To: Haavard Skinnemoen
  Cc: David S. Miller, Jiri Pirko, Erik Waling, Patrick McHardy,
	Anders Darander, Grant Likely, netdev, linux-kernel

From: Anders Darander <ad@datarespons.se>

Add the possibility to set the phy address. This is needed if an integrated
switch is connected to the MAC, as it is often the case that the highest port
is the one connected to the MAC of the MCU.

E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
phy found.

Signed-off-by: Anders Darander <ad@datarespons.se>
---
 drivers/net/macb.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index c8a18a6..9b4e301 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -53,6 +53,14 @@
 #define MACB_RX_INT_FLAGS	(MACB_BIT(RCOMP) | MACB_BIT(RXUBR)	\
 				 | MACB_BIT(ISR_ROVR))
 
+/*
+ * Setup PHY probeing
+ */
+
+static int phy_addr = PHY_MAX_ADDR;
+module_param(phy_addr, ushort, 0);
+MODULE_PARAM_DESC(phy_addr, "PHY address connected to the MACB");
+
 static void __macb_set_hwaddr(struct macb *bp)
 {
 	u32 bottom;
@@ -193,7 +201,11 @@ static int macb_mii_probe(struct net_device *dev)
 	struct eth_platform_data *pdata;
 	int ret;
 
-	phydev = phy_find_first(bp->mii_bus);
+	if (phy_addr >= PHY_MAX_ADDRESS)
+		phydev = phy_find_first(bp->mii_bus);
+	else
+		phydev = bp->mii_bus->phy_map[phy_addr];
+
 	if (!phydev) {
 		printk (KERN_ERR "%s: no PHY found\n", dev->name);
 		return -1;
-- 
1.7.0.3

^ permalink raw reply related

* Re: eth1: Detected Hardware Unit Hang
From: Paweł Staszewski @ 2010-03-31  7:47 UTC (permalink / raw)
  To: Allan, Bruce W
  Cc: Linux Network Development list, e1000-devel@lists.sourceforge.net
In-Reply-To: <4BB0E394.2060908@itcare.pl>

Hello

I reproduce this problem on other machine with the same hardware and 
here is dmesg output: (kernel 2.6.33)

Mar 27 18:19:16 TM_01_C1 [1817894.769395] 0000:04:00.0: eth0: Detected 
Hardware Unit Hang:
Mar 27 18:19:16 TM_01_C1 [1817894.769396]   TDH <2e>
Mar 27 18:19:16 TM_01_C1 [1817894.769397]   TDT <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769397]   next_to_use <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769398]   next_to_clean <2d>
Mar 27 18:19:16 TM_01_C1 [1817894.769398] buffer_info[next_to_clean]:
Mar 27 18:19:16 TM_01_C1 [1817894.769399]   time_stamp <11b1591e9>
Mar 27 18:19:16 TM_01_C1 [1817894.769399]   next_to_watch <2f>
Mar 27 18:19:16 TM_01_C1 [1817894.769400]   jiffies <11b1592e4>
Mar 27 18:19:16 TM_01_C1 [1817894.769401]   next_to_watch.status <0>
Mar 27 18:19:16 TM_01_C1 [1817894.769401] MAC Status <80080783>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY Status <796d>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY 1000BASE-T Status <3800>
Mar 27 18:19:16 TM_01_C1 [1817894.769403] PHY Extended Status <3000>
Mar 27 18:19:16 TM_01_C1 [1817894.769404] PCI Status <10>
Mar 27 18:19:18 TM_01_C1 [1817896.773365] 0000:04:00.0: eth0: Detected 
Hardware Unit Hang:
Mar 27 18:19:18 TM_01_C1 [1817896.773367]   TDH <2e>
Mar 27 18:19:18 TM_01_C1 [1817896.773368]   TDT <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773368]   next_to_use <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773369]   next_to_clean <2d>
Mar 27 18:19:18 TM_01_C1 [1817896.773369] buffer_info[next_to_clean]:
Mar 27 18:19:18 TM_01_C1 [1817896.773370]   time_stamp <11b1591e9>
Mar 27 18:19:18 TM_01_C1 [1817896.773370]   next_to_watch <2f>
Mar 27 18:19:18 TM_01_C1 [1817896.773371]   jiffies <11b1594d8>
Mar 27 18:19:18 TM_01_C1 [1817896.773372]   next_to_watch.status <0>
Mar 27 18:19:18 TM_01_C1 [1817896.773372] MAC Status <80080783>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY Status <796d>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY 1000BASE-T Status <3800>
Mar 27 18:19:18 TM_01_C1 [1817896.773374] PHY Extended Status <3000>
Mar 27 18:19:18 TM_01_C1 [1817896.773375] PCI Status <10>
Mar 27 18:19:20 TM_01_C1 [1817898.769353] 0000:04:00.0: eth0: Detected 
Hardware Unit Hang:
Mar 27 18:19:20 TM_01_C1 [1817898.769355]   TDH <2e>
Mar 27 18:19:20 TM_01_C1 [1817898.769355]   TDT <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356]   next_to_use <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356]   next_to_clean <2d>
Mar 27 18:19:20 TM_01_C1 [1817898.769357] buffer_info[next_to_clean]:
Mar 27 18:19:20 TM_01_C1 [1817898.769358]   time_stamp <11b1591e9>
Mar 27 18:19:20 TM_01_C1 [1817898.769358]   next_to_watch <2f>
Mar 27 18:19:20 TM_01_C1 [1817898.769359]   jiffies <11b1596cc>
Mar 27 18:19:20 TM_01_C1 [1817898.769359]   next_to_watch.status <0>
Mar 27 18:19:20 TM_01_C1 [1817898.769360] MAC Status <80080783>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY Status <796d>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY 1000BASE-T Status <3800>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PHY Extended Status <3000>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PCI Status <18>
Mar 27 18:19:21 TM_01_C1 [1817899.773012] ------------[ cut here 
]------------
Mar 27 18:19:21 TM_01_C1 [1817899.773023] WARNING: at 
net/sched/sch_generic.c:255 dev_watchdog+0x130/0x1d3()
Mar 27 18:19:21 TM_01_C1 [1817899.773026] Hardware name: X7DCT
Mar 27 18:19:21 TM_01_C1 [1817899.773028] NETDEV WATCHDOG: eth0 
(e1000e): transmit queue 0 timed out
Mar 27 18:19:21 TM_01_C1 [1817899.773030] Modules linked in: coretemp 
hwmon_vid hwmon [last unloaded: w83627hf]
Mar 27 18:19:21 TM_01_C1 [1817899.773038] Pid: 0, comm: swapper Not 
tainted 2.6.33 #2
Mar 27 18:19:21 TM_01_C1 [1817899.773040] Call Trace:
Mar 27 18:19:21 TM_01_C1 [1817899.773042] <IRQ>  [<ffffffff813003b3>] ? 
dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773050]  [<ffffffff813003b3>] ? 
dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773055]  [<ffffffff81032d1a>] ? 
warn_slowpath_common+0x77/0xa3
Mar 27 18:19:21 TM_01_C1 [1817899.773059]  [<ffffffff81032da2>] ? 
warn_slowpath_fmt+0x51/0x59
Mar 27 18:19:21 TM_01_C1 [1817899.773064]  [<ffffffff8102910c>] ? 
enqueue_task_fair+0x3e/0xa1
Mar 27 18:19:21 TM_01_C1 [1817899.773068]  [<ffffffff8102f0c2>] ? 
try_to_wake_up+0x368/0x379
Mar 27 18:19:21 TM_01_C1 [1817899.773072]  [<ffffffff812ee612>] ? 
netdev_drivername+0x3b/0x40
Mar 27 18:19:21 TM_01_C1 [1817899.773075]  [<ffffffff813003b3>] ? 
dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773079]  [<ffffffff81026d60>] ? 
__wake_up+0x30/0x44
Mar 27 18:19:21 TM_01_C1 [1817899.773082]  [<ffffffff81300283>] ? 
dev_watchdog+0x0/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773087]  [<ffffffff8103f5d1>] ? 
run_timer_softirq+0x200/0x29e
Mar 27 18:19:21 TM_01_C1 [1817899.773091]  [<ffffffff810386f6>] ? 
__do_softirq+0xd7/0x195
Mar 27 18:19:21 TM_01_C1 [1817899.773099]  [<ffffffff810152b3>] ? 
lapic_next_event+0x18/0x1d
Mar 27 18:19:21 TM_01_C1 [1817899.773104]  [<ffffffff81002e0c>] ? 
call_softirq+0x1c/0x28
Mar 27 18:19:21 TM_01_C1 [1817899.773107]  [<ffffffff81004811>] ? 
do_softirq+0x31/0x63
Mar 27 18:19:21 TM_01_C1 [1817899.773110]  [<ffffffff810384eb>] ? 
irq_exit+0x36/0x78
Mar 27 18:19:21 TM_01_C1 [1817899.773113]  [<ffffffff81015d0b>] ? 
smp_apic_timer_interrupt+0x87/0x95
Mar 27 18:19:21 TM_01_C1 [1817899.773117]  [<ffffffff810028d3>] ? 
apic_timer_interrupt+0x13/0x20
Mar 27 18:19:21 TM_01_C1 [1817899.773119] <EOI>  [<ffffffff81008bdd>] ? 
mwait_idle+0x9b/0xa0
Mar 27 18:19:21 TM_01_C1 [1817899.773126]  [<ffffffff81001385>] ? 
cpu_idle+0x53/0x8b
Mar 27 18:19:21 TM_01_C1 [1817899.773128] ---[ end trace 
4ac842842c6f54b3 ]---

ethtool -i eth0
driver: e1000e
version: 1.0.2-k2
firmware-version: 0.15-5
bus-info: 0000:04:00.0

NIC statistics:
      rx_packets: 8202754725
      tx_packets: 7398272195
      rx_bytes: 4373145698252
      tx_bytes: 5234354904619
      rx_broadcast: 59775
      tx_broadcast: 405
      rx_multicast: 0
      tx_multicast: 0
      rx_errors: 0
      tx_errors: 0
      tx_dropped: 0
      multicast: 0
      collisions: 0
      rx_length_errors: 0
      rx_over_errors: 0
      rx_crc_errors: 0
      rx_frame_errors: 0
      rx_no_buffer_count: 1185
      rx_missed_errors: 1466
      tx_aborted_errors: 0
      tx_carrier_errors: 0
      tx_fifo_errors: 0
      tx_heartbeat_errors: 0
      tx_window_errors: 0
      tx_abort_late_coll: 0
      tx_deferred_ok: 0
      tx_single_coll_ok: 0
      tx_multi_coll_ok: 0
      tx_timeout_count: 0
      tx_restart_queue: 12
      rx_long_length_errors: 0
      rx_short_length_errors: 0
      rx_align_errors: 0
      tx_tcp_seg_good: 0
      tx_tcp_seg_failed: 0
      rx_flow_control_xon: 0
      rx_flow_control_xoff: 0
      tx_flow_control_xon: 0
      tx_flow_control_xoff: 0
      rx_long_byte_count: 4373145698252
      rx_csum_offload_good: 8084424290
      rx_csum_offload_errors: 5690
      rx_header_split: 0
      alloc_rx_buff_failed: 0
      tx_smbus: 0
      rx_smbus: 48588
      dropped_smbus: 0
      rx_dma_failed: 0
      tx_dma_failed: 0


Wnen this occured traffic was about -  RX: 360Mbit/s  and  TX: 340Mbit - 
for eth0 interface.



W dniu 2010-03-29 19:29, Paweł Staszewski pisze:
> lspci -vvv + ethtool -S in attached files.
>
> Network traffic when i get this info:
> eth1:    RX:    157.22 Mb/s    TX:    379.27 Mb/s
>
> ethtool -i eth1
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.5-7
> bus-info: 0000:05:00.0
> This is: Intel Corporation 82573L Gigabit Ethernet Controller
>
>
> But in this server i have another gigabit interface:
> Intel Corporation 82573E Gigabit Ethernet Controller
> this interface has two times more traffic than eth0 (82573L)
> ethtool -i eth0
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.15-5
> bus-info: 0000:04:00.0
>
> And also this server was working 4months without problems on 2.6.29.1 
> kernel
>
> Drivers that I use for e1000e are from kernel (standard kernel 
> build-in e1000e driver).
> I don't tried other drivers.
>
> This is production server so I can't make too much tests.
>
>
> W dniu 2010-03-29 18:41, Allan, Bruce W pisze:
>> [adding e1000-devel]
>>
>> Please provide more information:
>> * what NIC/LOM is this on (preferably send full output from lspci -vvv)
>> * what type of networking workload is running at the time the hang 
>> occurred
>> * a dump of the NIC/LOM statistics might also help (ethtool -S eth1)
>>
>> Have you tried the latest standalone e1000e driver on e1000.sf.net?  
>> Does it reproduce the issue?
>>
>> If we cannot reproduce the hang in-house, would you be able/willing 
>> to run a debug driver to gather more information?
>>
>> Thanks,
>> Bruce.
>>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org 
>> [mailto:netdev-owner@vger.kernel.org] On Behalf Of Pawel Staszewski
>> Sent: Monday, March 29, 2010 8:34 AM
>> To: Linux Network Development list
>> Subject: eth1: Detected Hardware Unit Hang
>>
>> After update to kernel from 2.6.29.1 to 2.6.33.1 i have this info in 
>> dmesg:
>>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bafaf>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bb1a3>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bb397>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> ------------[ cut here ]------------
>> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x118/0x19c()
>> Hardware name: X7DCT
>> NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
>> Modules linked in:
>> Pid: 0, comm: swapper Not tainted 2.6.33.1 #2
>> Call Trace:
>>    [<c1024e3d>] ? warn_slowpath_common+0x52/0x71
>>    [<c1024e49>] ? warn_slowpath_common+0x5e/0x71
>>    [<c1024e8e>] ? warn_slowpath_fmt+0x26/0x2a
>>    [<c1261f54>] ? dev_watchdog+0x118/0x19c
>>    [<c102135c>] ? __wake_up+0x29/0x39
>>    [<c10320c6>] ? insert_work+0x40/0x44
>>    [<c1261e3c>] ? dev_watchdog+0x0/0x19c
>>    [<c102cc15>] ? run_timer_softirq+0x11a/0x173
>>    [<c1028e5b>] ? __do_softirq+0x74/0xdf
>>    [<c1028ee9>] ? do_softirq+0x23/0x27
>>    [<c10290be>] ? irq_exit+0x26/0x58
>>    [<c10102d7>] ? smp_apic_timer_interrupt+0x6c/0x76
>>    [<c12c5f9a>] ? apic_timer_interrupt+0x2a/0x30
>>    [<c1007e06>] ? mwait_idle+0x49/0x4e
>>    [<c10017e8>] ? cpu_idle+0x41/0x5a
>> ---[ end trace bcca9926a046332c ]---
>>
>>
>> With kernel 2.6.29.1 all was ok.
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox