Netdev List
 help / color / mirror / Atom feed
* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-06 12:07 UTC (permalink / raw)
  To: Simon Horman
  Cc: Rusty Russell, virtualization, Jesse Gross, dev, virtualization,
	netdev, kvm
In-Reply-To: <20110106113052.GA2541@verge.net.au>

On Thu, Jan 06, 2011 at 08:30:52PM +0900, Simon Horman wrote:
> On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
> > > Hi,
> > > 
> > > Back in October I reported that I noticed a problem whereby flow control
> > > breaks down when openvswitch is configured to mirror a port[1].
> > 
> > Apropos the UDP flow control.  See this
> > http://www.spinics.net/lists/netdev/msg150806.html
> > for some problems it introduces.
> > Unfortunately UDP does not have built-in flow control.
> > At some level it's just conceptually broken:
> > it's not present in physical networks so why should
> > we try and emulate it in a virtual network?
> > 
> > 
> > Specifically, when you do:
> > # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
> > You are asking: what happens if I push data faster than it can be received?
> > But why is this an interesting question?
> > Ask 'what is the maximum rate at which I can send data with %X packet
> > loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
> > -b and -w flags for this. It needs to be configured
> > with --enable-intervals=yes for them to work.
> > 
> > If you pose the questions this way the problem of pacing
> > the execution just goes away.
> 
> I am aware that UDP inherently lacks flow control.

Everyone's is aware of that, but this is always followed by a 'however'
:).

> The aspect of flow control that I am interested in is situations where the
> guest can create large amounts of work for the host. However, it seems that
> in the case of virtio with vhostnet that the CPU utilisation seems to be
> almost entirely attributable to the vhost and qemu-system processes.  And
> in the case of virtio without vhost net the CPU is used by the qemu-system
> process. In both case I assume that I could use a cgroup or something
> similar to limit the guests.

cgroups, yes. the vhost process inherits the cgroups
from the qemu process so you can limit them all.

If you are after limiting the max troughput of the guest
you can do this with cgroups as well.

> Assuming all of that is true then from a resource control problem point of
> view, which is mostly what I am concerned about, the problem goes away.
> However, I still think that it would be nice to resolve the situation I
> described.

We need to articulate what's wrong here, otherwise we won't
be able to resolve the situation. We are sending UDP packets
as fast as we can and some receivers can't cope. Is this the problem?
We have made attempts to add a pseudo flow control in the past
in an attempt to make UDP on the same host work better.
Maybe they help some but they also sure introduce problems.

-- 
MST

^ permalink raw reply

* via-velocity: corrupted mac, deadlock on link
From: Janne Karhunen @ 2011-01-06 11:42 UTC (permalink / raw)
  To: netdev

Hey,

Just got my hands on via-velocity pci-e card and tried to give it
a go on new Asus Nvidia Ion2 board. Driver loads fine and everything
seems good until device gets a link - kernel deadlocks instantly. During
bootup hang seems to happen when persistent-net rules rename device
to eth1, after bootup instantly when cable is plugged (first interrupt?).

Any other quick suggestions other than to throw it away as DOA?

Linux rivendell 2.6.35-23-generic #41-Ubuntu SMP Wed Nov 24 10:18:49
UTC 2010 i686 GNU/Linux

eth1      Link encap:Ethernet  HWaddr 00:00:00:00:00:04  (??????????)

[    1.171890] eth0: VIA Networking Velocity Family Gigabit Ethernet Adapter
[   18.468029] via-velocity 0000:0b:00.0: BAR 0: set to [io
0xe800-0xe8ff] (PCI address [0xe800-0xe8ff]
[   18.468048] via-velocity 0000:0b:00.0: BAR 1: set to [mem
0xfbfffc00-0xfbfffcff 64bit] (PCI address [0xfbfffc00-0xfbfffcff]

0b:00.0 Ethernet controller: VIA Technologies, Inc.
VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 82)

 16:          0          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb5
 17:          0          0          0          0   IO-APIC-fasteoi   eth1
 18:       4639       4703       4637       4690   IO-APIC-fasteoi
uhci_hcd:usb4, ahci, hda_intel, nvidia
 19:          0          0          0          1   IO-APIC-fasteoi
uhci_hcd:usb3, xhci_hcd:usb6
 23:      14772      14895      14872      14730   IO-APIC-fasteoi
ehci_hcd:usb1, uhci_hcd:usb2
 40:          0          0          0          0   PCI-MSI-edge      pciehp
 41:          0          0          0          0   PCI-MSI-edge      pciehp
 42:          0          0          0          0   PCI-MSI-edge      pciehp
 43:          0          0          0          0   PCI-MSI-edge      pciehp
 52:      10951      10844      10829      10926   PCI-MSI-edge      eth0


-- 
// Janne

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-06 11:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, virtualization, Jesse Gross, dev, virtualization,
	netdev, kvm
In-Reply-To: <20110106102755.GC12142@redhat.com>

On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
> > Hi,
> > 
> > Back in October I reported that I noticed a problem whereby flow control
> > breaks down when openvswitch is configured to mirror a port[1].
> 
> Apropos the UDP flow control.  See this
> http://www.spinics.net/lists/netdev/msg150806.html
> for some problems it introduces.
> Unfortunately UDP does not have built-in flow control.
> At some level it's just conceptually broken:
> it's not present in physical networks so why should
> we try and emulate it in a virtual network?
> 
> 
> Specifically, when you do:
> # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
> You are asking: what happens if I push data faster than it can be received?
> But why is this an interesting question?
> Ask 'what is the maximum rate at which I can send data with %X packet
> loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
> -b and -w flags for this. It needs to be configured
> with --enable-intervals=yes for them to work.
> 
> If you pose the questions this way the problem of pacing
> the execution just goes away.

I am aware that UDP inherently lacks flow control.

The aspect of flow control that I am interested in is situations where the
guest can create large amounts of work for the host. However, it seems that
in the case of virtio with vhostnet that the CPU utilisation seems to be
almost entirely attributable to the vhost and qemu-system processes.  And
in the case of virtio without vhost net the CPU is used by the qemu-system
process. In both case I assume that I could use a cgroup or something
similar to limit the guests.

Assuming all of that is true then from a resource control problem point of
view, which is mostly what I am concerned about, the problem goes away.
However, I still think that it would be nice to resolve the situation I
described.

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-06 10:27 UTC (permalink / raw)
  To: Simon Horman
  Cc: Rusty Russell, virtualization, Jesse Gross, dev, virtualization,
	netdev, kvm
In-Reply-To: <20110106093312.GA1564@verge.net.au>

On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
> Hi,
> 
> Back in October I reported that I noticed a problem whereby flow control
> breaks down when openvswitch is configured to mirror a port[1].

Apropos the UDP flow control.  See this
http://www.spinics.net/lists/netdev/msg150806.html
for some problems it introduces.
Unfortunately UDP does not have built-in flow control.
At some level it's just conceptually broken:
it's not present in physical networks so why should
we try and emulate it in a virtual network?


Specifically, when you do:
# netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
You are asking: what happens if I push data faster than it can be received?
But why is this an interesting question?
Ask 'what is the maximum rate at which I can send data with %X packet
loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
-b and -w flags for this. It needs to be configured
with --enable-intervals=yes for them to work.

If you pose the questions this way the problem of pacing
the execution just goes away.

> 
> I have (finally) looked into this further and the problem appears to relate
> to cloning of skbs, as Jesse Gross originally suspected.
> 
> More specifically, in do_execute_actions[2] the first n-1 times that an skb
> needs to be transmitted it is cloned first and the final time the original
> skb is used.
> 
> In the case that there is only one action, which is the normal case, then
> the original skb will be used. But in the case of mirroring the cloning
> comes into effect. And in my case the cloned skb seems to go to the (slow)
> eth1 interface while the original skb goes to the (fast) dummy0 interface
> that I set up to be a mirror. The result is that dummy0 "paces" the flow,
> and its a cracking pace at that.
> 
> As an experiment I hacked do_execute_actions() to use the original skb
> for the first action instead of the last one.  In my case the result was
> that eth1 "paces" the flow, and things work reasonably nicely.
> 
> Well, sort of. Things work well for non-GSO skbs but extremely poorly for
> GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
> netserv. I'm unsure why, but I digress.
> 
> It seems to me that my hack illustrates the point that the flow ends up
> being "paced" by one interface. However I think that what would be
> desirable is that the flow is "paced" by the slowest link. Unfortunately
> I'm unsure how to achieve that.

What if you have multiple UDP sockets with different targets
in the guest?

> One idea that I had was to skb_get() the original skb each time it is
> cloned - that is easy enough. But unfortunately it seems to me that
> approach would require some sort of callback mechanism in kfree_skb() so
> that the cloned skbs can kfree_skb() the original skb.
> 
> Ideas would be greatly appreciated.
> 
> [1] http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html
> [2] http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/actions.c;h=5e16143ca402f7da0ee8fc18ee5eb16c3b7598e6;hb=HEAD

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Eric Dumazet @ 2011-01-06 10:22 UTC (permalink / raw)
  To: Simon Horman
  Cc: Rusty Russell, virtualization, Jesse Gross, dev, virtualization,
	netdev, kvm, Michael S. Tsirkin
In-Reply-To: <20110106093312.GA1564@verge.net.au>

Le jeudi 06 janvier 2011 à 18:33 +0900, Simon Horman a écrit :
> Hi,
> 
> Back in October I reported that I noticed a problem whereby flow control
> breaks down when openvswitch is configured to mirror a port[1].
> 
> I have (finally) looked into this further and the problem appears to relate
> to cloning of skbs, as Jesse Gross originally suspected.
> 
> More specifically, in do_execute_actions[2] the first n-1 times that an skb
> needs to be transmitted it is cloned first and the final time the original
> skb is used.
> 
> In the case that there is only one action, which is the normal case, then
> the original skb will be used. But in the case of mirroring the cloning
> comes into effect. And in my case the cloned skb seems to go to the (slow)
> eth1 interface while the original skb goes to the (fast) dummy0 interface
> that I set up to be a mirror. The result is that dummy0 "paces" the flow,
> and its a cracking pace at that.
> 
> As an experiment I hacked do_execute_actions() to use the original skb
> for the first action instead of the last one.  In my case the result was
> that eth1 "paces" the flow, and things work reasonably nicely.
> 
> Well, sort of. Things work well for non-GSO skbs but extremely poorly for
> GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
> netserv. I'm unsure why, but I digress.
> 
> It seems to me that my hack illustrates the point that the flow ends up
> being "paced" by one interface. However I think that what would be
> desirable is that the flow is "paced" by the slowest link. Unfortunately
> I'm unsure how to achieve that.
> 

Hi Simon !

"pacing" is done because skb is attached to a socket, and a socket has a
limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum
of all truesize skbs in flight.

When you enter something that :

1) Get a clone of the skb, queue the clone to device X
2) queue the original skb to device Y

Then :	Socket sndbuf is not affected at all by device X queue.
	This is speed on device Y that matters.

You want to get servo control on both X and Y

You could try to

1) Get a clone of skb
   Attach it to socket too (so that socket get a feedback of final
orphaning for the clone) with skb_set_owner_w()
   queue the clone to device X

Unfortunatly, stacked skb->destructor() makes this possible only for
known destructor (aka sock_wfree())

> One idea that I had was to skb_get() the original skb each time it is
> cloned - that is easy enough. But unfortunately it seems to me that
> approach would require some sort of callback mechanism in kfree_skb() so
> that the cloned skbs can kfree_skb() the original skb.
> 
> Ideas would be greatly appreciated.
> 
> [1] http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html
> [2] http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/actions.c;h=5e16143ca402f7da0ee8fc18ee5eb16c3b7598e6;hb=HEAD
> --



^ permalink raw reply

* [PATCH] net: ppp: use {get,put}_unaligned_be{16,32}
From: Changli Gao @ 2011-01-06  9:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: Paul Mackerras, linux-ppp, netdev, Changli Gao

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 drivers/net/ppp_async.c   |   10 +++++-----
 drivers/net/ppp_deflate.c |    9 ++++-----
 drivers/net/ppp_generic.c |    9 ++++-----
 drivers/net/ppp_mppe.c    |    7 +++----
 drivers/net/ppp_synctty.c |    3 ++-
 5 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
index 78d70a6..cbe1e13 100644
--- a/drivers/net/ppp_async.c
+++ b/drivers/net/ppp_async.c
@@ -32,6 +32,7 @@
 #include <linux/init.h>
 #include <linux/jiffies.h>
 #include <linux/slab.h>
+#include <linux/unaligned/be_struct.h>
 #include <asm/uaccess.h>
 #include <asm/string.h>
 
@@ -542,7 +543,7 @@ ppp_async_encode(struct asyncppp *ap)
 	data = ap->tpkt->data;
 	count = ap->tpkt->len;
 	fcs = ap->tfcs;
-	proto = (data[0] << 8) + data[1];
+	proto = get_unaligned_be16(data);
 
 	/*
 	 * LCP packets with code values between 1 (configure-reqest)
@@ -963,7 +964,7 @@ static void async_lcp_peek(struct asyncppp *ap, unsigned char *data,
 	code = data[0];
 	if (code != CONFACK && code != CONFREQ)
 		return;
-	dlen = (data[2] << 8) + data[3];
+	dlen = get_unaligned_be16(data + 2);
 	if (len < dlen)
 		return;		/* packet got truncated or length is bogus */
 
@@ -997,15 +998,14 @@ static void async_lcp_peek(struct asyncppp *ap, unsigned char *data,
 	while (dlen >= 2 && dlen >= data[1] && data[1] >= 2) {
 		switch (data[0]) {
 		case LCP_MRU:
-			val = (data[2] << 8) + data[3];
+			val = get_unaligned_be16(data + 2);
 			if (inbound)
 				ap->mru = val;
 			else
 				ap->chan.mtu = val;
 			break;
 		case LCP_ASYNCMAP:
-			val = (data[2] << 24) + (data[3] << 16)
-				+ (data[4] << 8) + data[5];
+			val = get_unaligned_be32(data + 2);
 			if (inbound)
 				ap->raccm = val;
 			else
diff --git a/drivers/net/ppp_deflate.c b/drivers/net/ppp_deflate.c
index 695bc83..df3ce78 100644
--- a/drivers/net/ppp_deflate.c
+++ b/drivers/net/ppp_deflate.c
@@ -41,6 +41,7 @@
 #include <linux/ppp-comp.h>
 
 #include <linux/zlib.h>
+#include <linux/unaligned/be_struct.h>
 
 /*
  * State for a Deflate (de)compressor.
@@ -232,11 +233,9 @@ static int z_compress(void *arg, unsigned char *rptr, unsigned char *obuf,
 	 */
 	wptr[0] = PPP_ADDRESS(rptr);
 	wptr[1] = PPP_CONTROL(rptr);
-	wptr[2] = PPP_COMP >> 8;
-	wptr[3] = PPP_COMP;
+	put_unaligned_be16(PPP_COMP, wptr + 2);
 	wptr += PPP_HDRLEN;
-	wptr[0] = state->seqno >> 8;
-	wptr[1] = state->seqno;
+	put_unaligned_be16(state->seqno, wptr);
 	wptr += DEFLATE_OVHD;
 	olen = PPP_HDRLEN + DEFLATE_OVHD;
 	state->strm.next_out = wptr;
@@ -451,7 +450,7 @@ static int z_decompress(void *arg, unsigned char *ibuf, int isize,
 	}
 
 	/* Check the sequence number. */
-	seq = (ibuf[PPP_HDRLEN] << 8) + ibuf[PPP_HDRLEN+1];
+	seq = get_unaligned_be16(ibuf + PPP_HDRLEN);
 	if (seq != (state->seqno & 0xffff)) {
 		if (state->debug)
 			printk(KERN_DEBUG "z_decompress%d: bad seq # %d, expected %d\n",
diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 6456484..0a81f0b 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -46,6 +46,7 @@
 #include <linux/device.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/unaligned/be_struct.h>
 #include <net/slhc_vj.h>
 #include <asm/atomic.h>
 
@@ -210,7 +211,7 @@ struct ppp_net {
 };
 
 /* Get the PPP protocol number from a skb */
-#define PPP_PROTO(skb)	(((skb)->data[0] << 8) + (skb)->data[1])
+#define PPP_PROTO(skb)	get_unaligned_be16((skb)->data)
 
 /* We limit the length of ppp->file.rq to this (arbitrary) value */
 #define PPP_MAX_RQLEN	32
@@ -964,8 +965,7 @@ ppp_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	pp = skb_push(skb, 2);
 	proto = npindex_to_proto[npi];
-	pp[0] = proto >> 8;
-	pp[1] = proto;
+	put_unaligned_be16(proto, pp);
 
 	netif_stop_queue(dev);
 	skb_queue_tail(&ppp->file.xq, skb);
@@ -1473,8 +1473,7 @@ static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb)
 		q = skb_put(frag, flen + hdrlen);
 
 		/* make the MP header */
-		q[0] = PPP_MP >> 8;
-		q[1] = PPP_MP;
+		put_unaligned_be16(PPP_MP, q);
 		if (ppp->flags & SC_MP_XSHORTSEQ) {
 			q[2] = bits + ((ppp->nxseq >> 8) & 0xf);
 			q[3] = ppp->nxseq;
diff --git a/drivers/net/ppp_mppe.c b/drivers/net/ppp_mppe.c
index 6d1a1b8..44dab65 100644
--- a/drivers/net/ppp_mppe.c
+++ b/drivers/net/ppp_mppe.c
@@ -55,6 +55,7 @@
 #include <linux/ppp_defs.h>
 #include <linux/ppp-comp.h>
 #include <linux/scatterlist.h>
+#include <linux/unaligned/be_struct.h>
 
 #include "ppp_mppe.h"
 
@@ -395,16 +396,14 @@ mppe_compress(void *arg, unsigned char *ibuf, unsigned char *obuf,
 	 */
 	obuf[0] = PPP_ADDRESS(ibuf);
 	obuf[1] = PPP_CONTROL(ibuf);
-	obuf[2] = PPP_COMP >> 8;	/* isize + MPPE_OVHD + 1 */
-	obuf[3] = PPP_COMP;	/* isize + MPPE_OVHD + 2 */
+	put_unaligned_be16(PPP_COMP, obuf + 2);
 	obuf += PPP_HDRLEN;
 
 	state->ccount = (state->ccount + 1) % MPPE_CCOUNT_SPACE;
 	if (state->debug >= 7)
 		printk(KERN_DEBUG "mppe_compress[%d]: ccount %d\n", state->unit,
 		       state->ccount);
-	obuf[0] = state->ccount >> 8;
-	obuf[1] = state->ccount & 0xff;
+	put_unaligned_be16(state->ccount, obuf);
 
 	if (!state->stateful ||	/* stateless mode     */
 	    ((state->ccount & 0xff) == 0xff) ||	/* "flag" packet      */
diff --git a/drivers/net/ppp_synctty.c b/drivers/net/ppp_synctty.c
index 4c95ec3..fe86826 100644
--- a/drivers/net/ppp_synctty.c
+++ b/drivers/net/ppp_synctty.c
@@ -45,6 +45,7 @@
 #include <linux/completion.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/unaligned/be_struct.h>
 #include <asm/uaccess.h>
 
 #define PPP_VERSION	"2.4.2"
@@ -563,7 +564,7 @@ ppp_sync_txmunge(struct syncppp *ap, struct sk_buff *skb)
 	int islcp;
 
 	data  = skb->data;
-	proto = (data[0] << 8) + data[1];
+	proto = get_unaligned_be16(data);
 
 	/* LCP packets with codes between 1 (configure-request)
 	 * and 7 (code-reject) must be sent as though no options

^ permalink raw reply related

* Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-06  9:33 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, Jesse Gross, dev, virtualization, netdev, kvm,
	Michael S. Tsirkin

Hi,

Back in October I reported that I noticed a problem whereby flow control
breaks down when openvswitch is configured to mirror a port[1].

I have (finally) looked into this further and the problem appears to relate
to cloning of skbs, as Jesse Gross originally suspected.

More specifically, in do_execute_actions[2] the first n-1 times that an skb
needs to be transmitted it is cloned first and the final time the original
skb is used.

In the case that there is only one action, which is the normal case, then
the original skb will be used. But in the case of mirroring the cloning
comes into effect. And in my case the cloned skb seems to go to the (slow)
eth1 interface while the original skb goes to the (fast) dummy0 interface
that I set up to be a mirror. The result is that dummy0 "paces" the flow,
and its a cracking pace at that.

As an experiment I hacked do_execute_actions() to use the original skb
for the first action instead of the last one.  In my case the result was
that eth1 "paces" the flow, and things work reasonably nicely.

Well, sort of. Things work well for non-GSO skbs but extremely poorly for
GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
netserv. I'm unsure why, but I digress.

It seems to me that my hack illustrates the point that the flow ends up
being "paced" by one interface. However I think that what would be
desirable is that the flow is "paced" by the slowest link. Unfortunately
I'm unsure how to achieve that.

One idea that I had was to skb_get() the original skb each time it is
cloned - that is easy enough. But unfortunately it seems to me that
approach would require some sort of callback mechanism in kfree_skb() so
that the cloned skbs can kfree_skb() the original skb.

Ideas would be greatly appreciated.

[1] http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html
[2] http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/actions.c;h=5e16143ca402f7da0ee8fc18ee5eb16c3b7598e6;hb=HEAD

^ permalink raw reply

* Re: [PATCH v3 08/10] ARM: mxs: add ocotp read function
From: Russell King - ARM Linux @ 2011-01-06  9:13 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Jamie Iles, gerg, B32542, netdev, s.hauer, bryan.wu, baruch,
	w.sang, r64343, Shawn Guo, eric, Uwe Kleine-König, davem,
	linux-arm-kernel, lw
In-Reply-To: <20110106005052.GA4476@shareable.org>

On Thu, Jan 06, 2011 at 12:50:52AM +0000, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 07:44:18PM +0000, Jamie Lokier wrote:
> > > 'git show 534be1d5' explains how it works: cpu_relax() flushes buffered
> > > writes from _this_ CPU, so that other CPUs which are polling can make
> > > progress, which avoids this CPU getting stuck if there is an indirect
> > > dependency (no matter how convoluted) between what it's polling and which
> > > it wrote just before.
> > > 
> > > So cpu_relax() is *essential* in some polling loops, not a hint.
> > > 
> > > In principle that could happen for I/O polling, if (a) buffered memory
> > > writes are delayed by I/O read transactions, and (b) the device state we're
> > > waiting on depends on I/O yet to be done on another CPU, which could be
> > > polling memory first (e.g. a spinlock).
> > > 
> > > I doubt (a) in practice - but what about buses that block during I/O read?
> > > (I have a chip like that here, but it's ARMv4T.)
> > 
> > Let's be clear - ARMv5 and below generally are well ordered architectures
> > within the limits of caching.  There are cases where the write buffer
> > allows two writes to pass each other.  However, for IO we generally map
> > these - especially for ARMv4 and below - as 'uncacheable unbufferable'.
> > So on these, if the program says "read this location" the pipeline will
> > stall until the read has been issued - and if you use the result in the
> > next instruction, it will stall until the data is available.  So really,
> > it's not a problem here.
> > 
> > ARMv6 and above have a weakly ordered memory model with speculative
> > prefetching, so memory reads/writes can be completely unordered.  Device
> > accesses can pass memory accesses, but device accesses are always visible
> > in program order with respect to each other.
> > 
> > So, if you're spinning in a loop reading an IO device, all previous IO
> > accesses will be completed (in all ARM architectures) before the result
> > of your read is evaluated.
> 
> No, that wasn't the scenario - it was:
> 
> You're spinning reading an IO device, whose state depends indirectly
> on a *CPU memory* write that is forever buffered.
> 
> (Go and re-read 'git show 534be1d5' if you haven't already.)

I know what that's about, and it's about memory based accesses _only_.

What you're talking about is a programming error.  Such errors cause
data corruption if you're talking about DMA stuff.

At the moment, the solution to that is to put whatever's necessary into
readl/writel to ensure that they behave as ordered operations with
respect to everything else.  You'll find that on ARM, writel has a
barrier before it to ensure memory writes are visible before the device
write, and on readl there's a barrier to ensure that no memory read can
happen before the IO device read.

cpu_relax() has nothing to do with ensuring ordering with devices.

With relaxed IO operations, the responsibility for ensuring proper ordering
between memory and IO falls to the programmer.

^ permalink raw reply

* Re: Bad TCP timestamps on non-PC platforms
From: Eric Dumazet @ 2011-01-06  8:09 UTC (permalink / raw)
  To: Alex Dubov; +Cc: netdev
In-Reply-To: <219643.83181.qm@web37604.mail.mud.yahoo.com>

Le mercredi 05 janvier 2011 à 22:55 -0800, Alex Dubov a écrit :
> Greetings.
> I'm dealing with 2.6.37-rc7 kernel on MPC8548 platform.
> 
> It so appears, that recent kernels have sysctl_tcp_timestamps set to "1"
> by default.
> 
> On embedded platforms, where real time clock is initialized lately or
> absent outright, this causes TSval field of outgoing TCP packets to be
> set to some garbage value, in my case in the vicinity of 0xffffffff. As a
> result, other Linux machines silently drop such packets, preventing normal
> completion of network boot or any other TCP dependent operation.
> 
> Therefore, I feel, two changes are necessary (I can send in a patch):
> 1. Make sysctl_tcp_timestamps value config-time selectable (it must be
> disabled by default on machines without RTC).
> 2. When re-enabling tcp_timestamps through sysctl, reset the timestamp
> counter to the current system clock value.
> 
> And an optional, tricky one:
> 3. Postpone TCP timestamp counter initialization until RTC is actually
> available (if RTC is connected to i2c bus, TCP is initialized well ahead
> of it).
> 

I am trying to understand the problem, but I guess you hit something not
related to RTC at all.

#define tcp_time_stamp		((__u32)(jiffies))
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

jiffies can be ZERO 300 seconds after boot.

It should not matter with RFC1323. Some TCP session could in theory be
done without TCP timestamps being activated (because SYN paquet would
carry a ZERO  tsval/tsecr), but this should not prevents normal network
operation.

Are you sure the "other linux machines" dont have
strange /etc/sysctl.conf settings ?




^ permalink raw reply

* [RFC] Connection oriented routing
From: Michael Blizek @ 2011-01-06  7:34 UTC (permalink / raw)
  To: netdev; +Cc: davem, chambilkethakur

Hi!

I am implementing a connection oriented source routed layer 3/4 protocol for
zero administration (community/mesh) networks. It takes care of things like
fair bandwidth allocation and has higher security/privacy so that it can really
run without administration. It is basically running, but not entirely stable
and with some loose ends. You can find some more info here:
http://michaelblizek.twilightparadox.com/projects/cor/index.html

git repo (based on 2.6.28):
http://repo.or.cz/w/cor.git?a=tree

My 2 previous queries to netdev:
http://article.gmane.org/gmane.linux.network/80232
http://article.gmane.org/gmane.linux.network/144915

	-Michi


^ permalink raw reply

* patch ethtool 2.6.37 classificationoptions
From: Kelly Anderson @ 2011-01-06  7:00 UTC (permalink / raw)
  To: bhutchings, netdev

[-- Attachment #1: Type: text/plain, Size: 86 bytes --]

Attached is a fix for the missing space in classificationoptions when 
using --help.


[-- Attachment #2: EthTool-2.6.37-fix-classificationoptions.patch --]
[-- Type: text/plain, Size: 806 bytes --]

--- ./ethtool.c.orig	2010-12-16 17:37:59.000000000 -0700
+++ ./ethtool.c	2011-01-05 23:55:00.363077978 -0700
@@ -231,11 +231,11 @@ static struct option {
                 "               [ online | offline ]\n" },
     { "-S", "--statistics", MODE_GSTATS, "Show adapter statistics" },
     { "-n", "--show-nfc", MODE_GNFC, "Show Rx network flow classification"
-		"options",
+		" options",
 		"		[ rx-flow-hash tcp4|udp4|ah4|sctp4|"
 		"tcp6|udp6|ah6|sctp6 ]\n" },
     { "-f", "--flash", MODE_FLASHDEV, "FILENAME " "Flash firmware image "
-    		"from the specified file to a region on the device",
+		"from the specified file to a region on the device",
 		"               [ REGION-NUMBER-TO-FLASH ]\n" },
     { "-N", "--config-nfc", MODE_SNFC, "Configure Rx network flow "
 		"classification options",

^ permalink raw reply

* [PATCH v4 10/10] ARM: mxs: add initial pm support
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

This is a very initial pm support and basically does nothing.
With this pm support entry, drivers can start testing their own
pm functions.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v2:
 - Let build of pm.c depend on CONFIG_PM
 - Remove the blank line above device_initcall in pm.c

 arch/arm/mach-mxs/Makefile |    2 ++
 arch/arm/mach-mxs/pm.c     |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/mach-mxs/pm.c

diff --git a/arch/arm/mach-mxs/Makefile b/arch/arm/mach-mxs/Makefile
index f23ebbd..45a2925 100644
--- a/arch/arm/mach-mxs/Makefile
+++ b/arch/arm/mach-mxs/Makefile
@@ -1,6 +1,8 @@
 # Common support
 obj-y := clock.o devices.o gpio.o icoll.o iomux.o ocotp.o system.o timer.o
 
+obj-$(CONFIG_PM) += pm.o
+
 obj-$(CONFIG_SOC_IMX23) += clock-mx23.o mm-mx23.o
 obj-$(CONFIG_SOC_IMX28) += clock-mx28.o mm-mx28.o
 
diff --git a/arch/arm/mach-mxs/pm.c b/arch/arm/mach-mxs/pm.c
new file mode 100644
index 0000000..fb042da
--- /dev/null
+++ b/arch/arm/mach-mxs/pm.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2010 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/suspend.h>
+#include <linux/io.h>
+#include <mach/system.h>
+
+static int mxs_suspend_enter(suspend_state_t state)
+{
+	switch (state) {
+	case PM_SUSPEND_MEM:
+		arch_idle();
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct platform_suspend_ops mxs_suspend_ops = {
+	.enter = mxs_suspend_enter,
+	.valid = suspend_valid_only_mem,
+};
+
+static int __init mxs_pm_init(void)
+{
+	suspend_set_ops(&mxs_suspend_ops);
+	return 0;
+}
+device_initcall(mxs_pm_init);
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 09/10] ARM: mx28: read fec mac address from ocotp
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

Read fec mac address from ocotp and save it into fec_platform_data
mac field for fec driver to use.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v2:
 - It's not necessary to remove "const" for fec_platform_data from
   platform-fec.c and devices-common.h, so add it back.
 - Hard-coding Freescale OUI (00:04:9f) instead of just the first
   two two octets.
 - Correct the return of mx28evk_fec_get_mac() and check it
   with caller

 arch/arm/mach-mxs/mach-mx28evk.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-mxs/mach-mx28evk.c b/arch/arm/mach-mxs/mach-mx28evk.c
index def6519..54fa512 100644
--- a/arch/arm/mach-mxs/mach-mx28evk.c
+++ b/arch/arm/mach-mxs/mach-mx28evk.c
@@ -129,12 +129,44 @@ static struct fec_platform_data mx28_fec_pdata[] = {
 	},
 };
 
+static int __init mx28evk_fec_get_mac(void)
+{
+	int i, ret;
+	u32 val;
+
+	/*
+	 * OCOTP only stores the last 4 octets for each mac address,
+	 * so hard-code Freescale OUI (00:04:9f) here.
+	 */
+	for (i = 0; i < 2; i++) {
+		ret = mxs_read_ocotp(0x20 + i * 0x10, 1, &val);
+		if (ret)
+			goto error;
+
+		mx28_fec_pdata[i].mac[0] = 0x00;
+		mx28_fec_pdata[i].mac[1] = 0x04;
+		mx28_fec_pdata[i].mac[2] = 0x9f;
+		mx28_fec_pdata[i].mac[3] = (val >> 16) & 0xff;
+		mx28_fec_pdata[i].mac[4] = (val >> 8) & 0xff;
+		mx28_fec_pdata[i].mac[5] = (val >> 0) & 0xff;
+	}
+
+	return 0;
+
+error:
+	pr_err("%s: timeout when reading fec mac from OCOTP\n", __func__);
+	return ret;
+}
+
 static void __init mx28evk_init(void)
 {
 	mxs_iomux_setup_multiple_pads(mx28evk_pads, ARRAY_SIZE(mx28evk_pads));
 
 	mx28_add_duart();
 
+	if (mx28evk_fec_get_mac())
+		pr_warn("%s: failed on fec mac setup\n", __func__);
+
 	mx28evk_fec_reset();
 	mx28_add_fec(0, &mx28_fec_pdata[0]);
 #ifdef CONFIG_FEC2
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 08/10] ARM: mxs: add ocotp read function
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v4:
 - Call cpu_relax() during polling

Changes for v2:
 - Add mutex locking for mxs_read_ocotp()
 - Use type size_t for count and i
 - Add comment for clk_enable/disable skipping
 - Add ERROR bit clearing and polling step

 arch/arm/mach-mxs/Makefile              |    2 +-
 arch/arm/mach-mxs/include/mach/common.h |    1 +
 arch/arm/mach-mxs/ocotp.c               |   79 +++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 1 deletions(-)
 create mode 100644 arch/arm/mach-mxs/ocotp.c

diff --git a/arch/arm/mach-mxs/Makefile b/arch/arm/mach-mxs/Makefile
index 39d3f9c..f23ebbd 100644
--- a/arch/arm/mach-mxs/Makefile
+++ b/arch/arm/mach-mxs/Makefile
@@ -1,5 +1,5 @@
 # Common support
-obj-y := clock.o devices.o gpio.o icoll.o iomux.o system.o timer.o
+obj-y := clock.o devices.o gpio.o icoll.o iomux.o ocotp.o system.o timer.o
 
 obj-$(CONFIG_SOC_IMX23) += clock-mx23.o mm-mx23.o
 obj-$(CONFIG_SOC_IMX28) += clock-mx28.o mm-mx28.o
diff --git a/arch/arm/mach-mxs/include/mach/common.h b/arch/arm/mach-mxs/include/mach/common.h
index 59133eb..cf02552 100644
--- a/arch/arm/mach-mxs/include/mach/common.h
+++ b/arch/arm/mach-mxs/include/mach/common.h
@@ -13,6 +13,7 @@
 
 struct clk;
 
+extern int mxs_read_ocotp(int offset, int count, u32 *values);
 extern int mxs_reset_block(void __iomem *);
 extern void mxs_timer_init(struct clk *, int);
 
diff --git a/arch/arm/mach-mxs/ocotp.c b/arch/arm/mach-mxs/ocotp.c
new file mode 100644
index 0000000..e2d39aa
--- /dev/null
+++ b/arch/arm/mach-mxs/ocotp.c
@@ -0,0 +1,79 @@
+/*
+ * Copyright 2010 Freescale Semiconductor, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/mutex.h>
+
+#include <mach/mxs.h>
+
+#define BM_OCOTP_CTRL_BUSY		(1 << 8)
+#define BM_OCOTP_CTRL_ERROR		(1 << 9)
+#define BM_OCOTP_CTRL_RD_BANK_OPEN	(1 << 12)
+
+static DEFINE_MUTEX(ocotp_mutex);
+
+int mxs_read_ocotp(unsigned offset, size_t count, u32 *values)
+{
+	void __iomem *ocotp_base = MXS_IO_ADDRESS(MXS_OCOTP_BASE_ADDR);
+	int timeout = 0x400;
+	size_t i;
+
+	mutex_lock(&ocotp_mutex);
+
+	/*
+	 * clk_enable(hbus_clk) for ocotp can be skipped
+	 * as it must be on when system is running.
+	 */
+
+	/* try to clear ERROR bit */
+	__mxs_clrl(BM_OCOTP_CTRL_ERROR, ocotp_base);
+
+	/* check both BUSY and ERROR cleared */
+	while ((__raw_readl(ocotp_base) &
+		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
+		cpu_relax();
+
+	if (unlikely(!timeout))
+		goto error_unlock;
+
+	/* open OCOTP banks for read */
+	__mxs_setl(BM_OCOTP_CTRL_RD_BANK_OPEN, ocotp_base);
+
+	/* approximately wait 32 hclk cycles */
+	udelay(1);
+
+	/* poll BUSY bit becoming cleared */
+	timeout = 0x400;
+	while ((__raw_readl(ocotp_base) & BM_OCOTP_CTRL_BUSY) && --timeout)
+		cpu_relax();
+
+	if (unlikely(!timeout))
+		goto error_unlock;
+
+	for (i = 0; i < count; i++, offset += 4)
+		*values++ = __raw_readl(ocotp_base + offset);
+
+	/* close banks for power saving */
+	__mxs_clrl(BM_OCOTP_CTRL_RD_BANK_OPEN, ocotp_base);
+
+	mutex_unlock(&ocotp_mutex);
+
+	return 0;
+
+error_unlock:
+	mutex_unlock(&ocotp_mutex);
+	pr_err("%s: timeout in reading OCOTP\n", __func__);
+	return -ETIMEDOUT;
+}
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 07/10] ARM: mx28: add the second fec device registration
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
 arch/arm/mach-mxs/mach-mx28evk.c |   28 +++++++++++++++++++++++++---
 1 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-mxs/mach-mx28evk.c b/arch/arm/mach-mxs/mach-mx28evk.c
index d162e95..def6519 100644
--- a/arch/arm/mach-mxs/mach-mx28evk.c
+++ b/arch/arm/mach-mxs/mach-mx28evk.c
@@ -57,6 +57,19 @@ static const iomux_cfg_t mx28evk_pads[] __initconst = {
 		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
 	MX28_PAD_ENET_CLK__CLKCTRL_ENET |
 		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	/* fec1 */
+	MX28_PAD_ENET0_CRS__ENET1_RX_EN |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	MX28_PAD_ENET0_RXD2__ENET1_RXD0 |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	MX28_PAD_ENET0_RXD3__ENET1_RXD1 |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	MX28_PAD_ENET0_COL__ENET1_TX_EN |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	MX28_PAD_ENET0_TXD2__ENET1_TXD0 |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
+	MX28_PAD_ENET0_TXD3__ENET1_TXD1 |
+		(MXS_PAD_8MA | MXS_PAD_3V3 | MXS_PAD_PULLUP),
 	/* phy power line */
 	MX28_PAD_SSP1_DATA3__GPIO_2_15 |
 		(MXS_PAD_4MA | MXS_PAD_3V3 | MXS_PAD_NOPULL),
@@ -106,8 +119,14 @@ static void __init mx28evk_fec_reset(void)
 	gpio_set_value(MX28EVK_FEC_PHY_RESET, 1);
 }
 
-static const struct fec_platform_data mx28_fec_pdata __initconst = {
-	.phy = PHY_INTERFACE_MODE_RMII,
+static struct fec_platform_data mx28_fec_pdata[] = {
+	{
+		/* fec0 */
+		.phy = PHY_INTERFACE_MODE_RMII,
+	}, {
+		/* fec1 */
+		.phy = PHY_INTERFACE_MODE_RMII,
+	},
 };
 
 static void __init mx28evk_init(void)
@@ -117,7 +136,10 @@ static void __init mx28evk_init(void)
 	mx28_add_duart();
 
 	mx28evk_fec_reset();
-	mx28_add_fec(0, &mx28_fec_pdata);
+	mx28_add_fec(0, &mx28_fec_pdata[0]);
+#ifdef CONFIG_FEC2
+	mx28_add_fec(1, &mx28_fec_pdata[1]);
+#endif
 }
 
 static void __init mx28evk_timer_init(void)
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 06/10] ARM: mx28: update clock and device name for dual fec support
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

Change device name from "fec" to "imx28-fec", so that fec driver
can distinguish mx28.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v4:
 - Use "imx28-fec" as fec device name

Changes for v3:
 - Change device name to "enet-mac"

 arch/arm/mach-mxs/clock-mx28.c           |    3 ++-
 arch/arm/mach-mxs/devices/platform-fec.c |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-mxs/clock-mx28.c b/arch/arm/mach-mxs/clock-mx28.c
index f20b254..e2a8b0f 100644
--- a/arch/arm/mach-mxs/clock-mx28.c
+++ b/arch/arm/mach-mxs/clock-mx28.c
@@ -606,7 +606,8 @@ static struct clk_lookup lookups[] = {
 	_REGISTER_CLOCK("duart", "apb_pclk", xbus_clk)
 	/* for amba-pl011 driver */
 	_REGISTER_CLOCK("duart", NULL, uart_clk)
-	_REGISTER_CLOCK("fec.0", NULL, fec_clk)
+	_REGISTER_CLOCK("imx28-fec.0", NULL, fec_clk)
+	_REGISTER_CLOCK("imx28-fec.1", NULL, fec_clk)
 	_REGISTER_CLOCK("rtc", NULL, rtc_clk)
 	_REGISTER_CLOCK("pll2", NULL, pll2_clk)
 	_REGISTER_CLOCK(NULL, "hclk", hbus_clk)
diff --git a/arch/arm/mach-mxs/devices/platform-fec.c b/arch/arm/mach-mxs/devices/platform-fec.c
index c08168c..c42dff7 100644
--- a/arch/arm/mach-mxs/devices/platform-fec.c
+++ b/arch/arm/mach-mxs/devices/platform-fec.c
@@ -45,6 +45,6 @@ struct platform_device *__init mxs_add_fec(
 		},
 	};
 
-	return mxs_add_platform_device("fec", data->id,
+	return mxs_add_platform_device("imx28-fec", data->id,
 			res, ARRAY_SIZE(res), pdata, sizeof(*pdata));
 }
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 05/10] net/fec: add dual fec support for mx28
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

This patch is to add mx28 dual fec support. Here are some key notes
for mx28 fec controller.

 - The mx28 fec controller naming ENET-MAC is a different IP from FEC
   used on other i.mx variants.  But they are basically compatible
   on software interface, so it's possible to share the same driver.
 - ENET-MAC design on mx28 made an improper assumption that it runs
   on a big-endian system. As the result, driver has to swap every
   frame going to and coming from the controller.
 - The external phys can only be configured by fec0, which means fec1
   can not work independently and both phys need to be configured by
   mii_bus attached on fec0.
 - ENET-MAC reset will get mac address registers reset too.
 - ENET-MAC MII/RMII mode and 10M/100M speed are configured
   differently FEC.
 - ETHER_EN bit must be set to get ENET-MAC interrupt work.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v4:
 - Use #ifndef CONFIG_ARM to include ColdFire header files
 - Define quirk bits in id_entry.driver_data to handle controller
   difference, which is more scalable than using device name
 - Define fec0_mii_bus as a static function in fec_enet_mii_init
   to fold the mii_bus instance attached on fec0
 - Use cpu_to_be32 over __swab32 in function swap_buffer

Changes for v3:
 - Move v2 changes into patch #3
 - Use device name to check if it's running on ENET-MAC

 drivers/net/Kconfig |    7 ++-
 drivers/net/fec.c   |  148 +++++++++++++++++++++++++++++++++++++++++++++------
 drivers/net/fec.h   |    5 +-
 3 files changed, 139 insertions(+), 21 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 4f1755b..f34629b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1944,18 +1944,19 @@ config 68360_ENET
 config FEC
 	bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
 	depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
-		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
+		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5 || SOC_IMX28
 	select PHYLIB
 	help
 	  Say Y here if you want to use the built-in 10/100 Fast ethernet
 	  controller on some Motorola ColdFire and Freescale i.MX processors.
 
 config FEC2
-	bool "Second FEC ethernet controller (on some ColdFire CPUs)"
+	bool "Second FEC ethernet controller"
 	depends on FEC
 	help
 	  Say Y here if you want to use the second built-in 10/100 Fast
-	  ethernet controller on some Motorola ColdFire processors.
+	  ethernet controller on some Motorola ColdFire and Freescale
+	  i.MX processors.
 
 config FEC_MPC52xx
 	tristate "MPC52xx FEC driver"
diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 8a1c51f..2a71373 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -17,6 +17,8 @@
  *
  * Bug fixes and cleanup by Philippe De Muyter (phdm@macqel.be)
  * Copyright (c) 2004-2006 Macq Electronique SA.
+ *
+ * Copyright (C) 2010 Freescale Semiconductor, Inc.
  */
 
 #include <linux/module.h>
@@ -45,20 +47,36 @@
 
 #include <asm/cacheflush.h>
 
-#ifndef CONFIG_ARCH_MXC
+#ifndef CONFIG_ARM
 #include <asm/coldfire.h>
 #include <asm/mcfsim.h>
 #endif
 
 #include "fec.h"
 
-#ifdef CONFIG_ARCH_MXC
-#include <mach/hardware.h>
+#if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
 #define FEC_ALIGNMENT	0xf
 #else
 #define FEC_ALIGNMENT	0x3
 #endif
 
+#define DRIVER_NAME	"fec"
+
+/* Controller is ENET-MAC */
+#define FEC_QUIRK_ENET_MAC		(1 << 0)
+/* Controller needs driver to swap frame */
+#define FEC_QUIRK_SWAP_FRAME		(1 << 1)
+
+static struct platform_device_id fec_devtype[] = {
+	{
+		.name = DRIVER_NAME,
+		.driver_data = 0,
+	}, {
+		.name = "imx28-fec",
+		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME,
+	}
+};
+
 static unsigned char macaddr[ETH_ALEN];
 module_param_array(macaddr, byte, NULL, 0);
 MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
@@ -129,7 +147,8 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
  * account when setting it.
  */
 #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
-    defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARCH_MXC)
+    defined(CONFIG_M520x) || defined(CONFIG_M532x) || \
+    defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
 #define	OPT_FRAME_SIZE	(PKT_MAXBUF_SIZE << 16)
 #else
 #define	OPT_FRAME_SIZE	0
@@ -208,10 +227,23 @@ static void fec_stop(struct net_device *dev);
 /* Transmitter timeout */
 #define TX_TIMEOUT (2 * HZ)
 
+static void *swap_buffer(void *bufaddr, int len)
+{
+	int i;
+	unsigned int *buf = bufaddr;
+
+	for (i = 0; i < (len + 3) / 4; i++, buf++)
+		*buf = cpu_to_be32(*buf);
+
+	return bufaddr;
+}
+
 static netdev_tx_t
 fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
+	const struct platform_device_id *id_entry =
+				platform_get_device_id(fep->pdev);
 	struct bufdesc *bdp;
 	void *bufaddr;
 	unsigned short	status;
@@ -256,6 +288,14 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		bufaddr = fep->tx_bounce[index];
 	}
 
+	/*
+	 * Some design made an incorrect assumption on endian mode of
+	 * the system that it's running on. As the result, driver has to
+	 * swap every frame going to and coming from the controller.
+	 */
+	if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
+		swap_buffer(bufaddr, skb->len);
+
 	/* Save skb pointer */
 	fep->tx_skbuff[fep->skb_cur] = skb;
 
@@ -424,6 +464,8 @@ static void
 fec_enet_rx(struct net_device *dev)
 {
 	struct	fec_enet_private *fep = netdev_priv(dev);
+	const struct platform_device_id *id_entry =
+				platform_get_device_id(fep->pdev);
 	struct bufdesc *bdp;
 	unsigned short status;
 	struct	sk_buff	*skb;
@@ -487,6 +529,9 @@ fec_enet_rx(struct net_device *dev)
 	        dma_unmap_single(NULL, bdp->cbd_bufaddr, bdp->cbd_datlen,
         			DMA_FROM_DEVICE);
 
+		if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
+			swap_buffer(data, pkt_len);
+
 		/* This does 16 byte alignment, exactly what we need.
 		 * The packet length includes FCS, but we don't want to
 		 * include that when passing upstream as it messes up
@@ -689,6 +734,7 @@ static int fec_enet_mii_probe(struct net_device *dev)
 	char mdio_bus_id[MII_BUS_ID_SIZE];
 	char phy_name[MII_BUS_ID_SIZE + 3];
 	int phy_id;
+	int dev_id = fep->pdev->id;
 
 	fep->phy_dev = NULL;
 
@@ -700,6 +746,8 @@ static int fec_enet_mii_probe(struct net_device *dev)
 			continue;
 		if (fep->mii_bus->phy_map[phy_id]->phy_id == 0)
 			continue;
+		if (dev_id--)
+			continue;
 		strncpy(mdio_bus_id, fep->mii_bus->id, MII_BUS_ID_SIZE);
 		break;
 	}
@@ -737,10 +785,35 @@ static int fec_enet_mii_probe(struct net_device *dev)
 
 static int fec_enet_mii_init(struct platform_device *pdev)
 {
+	static struct mii_bus *fec0_mii_bus;
 	struct net_device *dev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(dev);
+	const struct platform_device_id *id_entry =
+				platform_get_device_id(fep->pdev);
 	int err = -ENXIO, i;
 
+	/*
+	 * The dual fec interfaces are not equivalent with enet-mac.
+	 * Here are the differences:
+	 *
+	 *  - fec0 supports MII & RMII modes while fec1 only supports RMII
+	 *  - fec0 acts as the 1588 time master while fec1 is slave
+	 *  - external phys can only be configured by fec0
+	 *
+	 * That is to say fec1 can not work independently. It only works
+	 * when fec0 is working. The reason behind this design is that the
+	 * second interface is added primarily for Switch mode.
+	 *
+	 * Because of the last point above, both phys are attached on fec0
+	 * mdio interface in board design, and need to be configured by
+	 * fec0 mii_bus.
+	 */
+	if ((id_entry->driver_data & FEC_QUIRK_ENET_MAC) && pdev->id) {
+		/* fec1 uses fec0 mii_bus */
+		fep->mii_bus = fec0_mii_bus;
+		return 0;
+	}
+
 	fep->mii_timeout = 0;
 
 	/*
@@ -777,6 +850,10 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 	if (mdiobus_register(fep->mii_bus))
 		goto err_out_free_mdio_irq;
 
+	/* save fec0 mii_bus */
+	if (id_entry->driver_data & FEC_QUIRK_ENET_MAC)
+		fec0_mii_bus = fep->mii_bus;
+
 	return 0;
 
 err_out_free_mdio_irq:
@@ -1148,12 +1225,25 @@ static void
 fec_restart(struct net_device *dev, int duplex)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
+	const struct platform_device_id *id_entry =
+				platform_get_device_id(fep->pdev);
 	int i;
+	u32 val, temp_mac[2];
 
 	/* Whack a reset.  We should wait for this. */
 	writel(1, fep->hwp + FEC_ECNTRL);
 	udelay(10);
 
+	/*
+	 * enet-mac reset will reset mac address registers too,
+	 * so need to reconfigure it.
+	 */
+	if (id_entry->driver_data & FEC_QUIRK_ENET_MAC) {
+		memcpy(&temp_mac, dev->dev_addr, ETH_ALEN);
+		writel(cpu_to_be32(temp_mac[0]), fep->hwp + FEC_ADDR_LOW);
+		writel(cpu_to_be32(temp_mac[1]), fep->hwp + FEC_ADDR_HIGH);
+	}
+
 	/* Clear any outstanding interrupt. */
 	writel(0xffc00000, fep->hwp + FEC_IEVENT);
 
@@ -1200,20 +1290,45 @@ fec_restart(struct net_device *dev, int duplex)
 	/* Set MII speed */
 	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 
-#ifdef FEC_MIIGSK_ENR
-	if (fep->phy_interface == PHY_INTERFACE_MODE_RMII) {
-		/* disable the gasket and wait */
-		writel(0, fep->hwp + FEC_MIIGSK_ENR);
-		while (readl(fep->hwp + FEC_MIIGSK_ENR) & 4)
-			udelay(1);
+	/*
+	 * The phy interface and speed need to get configured
+	 * differently on enet-mac.
+	 */
+	if (id_entry->driver_data & FEC_QUIRK_ENET_MAC) {
+		val = readl(fep->hwp + FEC_R_CNTRL);
 
-		/* configure the gasket: RMII, 50 MHz, no loopback, no echo */
-		writel(1, fep->hwp + FEC_MIIGSK_CFGR);
+		/* MII or RMII */
+		if (fep->phy_interface == PHY_INTERFACE_MODE_RMII)
+			val |= (1 << 8);
+		else
+			val &= ~(1 << 8);
 
-		/* re-enable the gasket */
-		writel(2, fep->hwp + FEC_MIIGSK_ENR);
-	}
+		/* 10M or 100M */
+		if (fep->phy_dev && fep->phy_dev->speed == SPEED_100)
+			val &= ~(1 << 9);
+		else
+			val |= (1 << 9);
+
+		writel(val, fep->hwp + FEC_R_CNTRL);
+	} else {
+#ifdef FEC_MIIGSK_ENR
+		if (fep->phy_interface == PHY_INTERFACE_MODE_RMII) {
+			/* disable the gasket and wait */
+			writel(0, fep->hwp + FEC_MIIGSK_ENR);
+			while (readl(fep->hwp + FEC_MIIGSK_ENR) & 4)
+				udelay(1);
+
+			/*
+			 * configure the gasket:
+			 *   RMII, 50 MHz, no loopback, no echo
+			 */
+			writel(1, fep->hwp + FEC_MIIGSK_CFGR);
+
+			/* re-enable the gasket */
+			writel(2, fep->hwp + FEC_MIIGSK_ENR);
+		}
 #endif
+	}
 
 	/* And last, enable the transmit and receive processing */
 	writel(2, fep->hwp + FEC_ECNTRL);
@@ -1410,12 +1525,13 @@ static const struct dev_pm_ops fec_pm_ops = {
 
 static struct platform_driver fec_driver = {
 	.driver	= {
-		.name	= "fec",
+		.name	= DRIVER_NAME,
 		.owner	= THIS_MODULE,
 #ifdef CONFIG_PM
 		.pm	= &fec_pm_ops,
 #endif
 	},
+	.id_table = fec_devtype,
 	.probe	= fec_probe,
 	.remove	= __devexit_p(fec_drv_remove),
 };
diff --git a/drivers/net/fec.h b/drivers/net/fec.h
index 2c48b25..ace318d 100644
--- a/drivers/net/fec.h
+++ b/drivers/net/fec.h
@@ -14,7 +14,8 @@
 /****************************************************************************/
 
 #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
-    defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARCH_MXC)
+    defined(CONFIG_M520x) || defined(CONFIG_M532x) || \
+    defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
 /*
  *	Just figures, Motorola would have to change the offsets for
  *	registers in the same peripheral device on different models
@@ -78,7 +79,7 @@
 /*
  *	Define the buffer descriptor structure.
  */
-#ifdef CONFIG_ARCH_MXC
+#if defined(CONFIG_ARCH_MXC) || defined(CONFIG_SOC_IMX28)
 struct bufdesc {
 	unsigned short cbd_datlen;	/* Data length */
 	unsigned short cbd_sc;	/* Control and status info */
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 04/10] net/fec: improve pm for better suspend/resume
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

The following commit made a fix to use fec_enet_open/fec_enet_close
over fec_enet_init/fec_stop for suspend/resume, because fec_enet_init
does not allow to have a working network interface at resume.

  e3fe8558c7fc182972c3d947d88744482111f304
  net/fec: fix pm to survive to suspend/resume

This fix works for i.mx/mxc fec controller, but fails on mx28 fec
which gets a different interrupt logic design. On i.mx fec, interrupt
can be triggered even bit ETHER_EN of ECR register is not set. But
on mx28 fec, ETHER_EN must be set to get interrupt work. Meanwhile,
MII interrupt is mandatory to resume the driver, because MDIO
read/write changed to interrupt mode by commit below.

  97b72e4320a9aaa4a7f1592ee7d2da7e2c9bd349
  fec: use interrupt for MDIO completion indication

fec_restart/fec_stop comes out as the solution working for both
cases.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
 drivers/net/fec.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 47a3c7b..8a1c51f 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -1372,8 +1372,10 @@ fec_suspend(struct device *dev)
 
 	if (ndev) {
 		fep = netdev_priv(ndev);
-		if (netif_running(ndev))
-			fec_enet_close(ndev);
+		if (netif_running(ndev)) {
+			fec_stop(ndev);
+			netif_device_detach(ndev);
+		}
 		clk_disable(fep->clk);
 	}
 	return 0;
@@ -1388,8 +1390,10 @@ fec_resume(struct device *dev)
 	if (ndev) {
 		fep = netdev_priv(ndev);
 		clk_enable(fep->clk);
-		if (netif_running(ndev))
-			fec_enet_open(ndev);
+		if (netif_running(ndev)) {
+			fec_restart(ndev, fep->full_duplex);
+			netif_device_attach(ndev);
+		}
 	}
 	return 0;
 }
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 03/10] net/fec: add mac field into platform data and consolidate fec_get_mac
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

Add mac field into fec_platform_data and consolidate function
fec_get_mac to get mac address in following order.

 1) module parameter via kernel command line fec.macaddr=0x00,0x04,...
 2) from flash in case of CONFIG_M5272 or fec_platform_data mac
    field for others, which typically have mac stored in fuse
 3) fec mac address registers set by bootloader

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v3:
 - Use module parameter than new kernel command line to pass
   mac address
 - Change variable name and comment to remove confusing word
   "default"
 - Fix copyright breakage in fec.h

 drivers/net/fec.c   |   81 ++++++++++++++++++++++++---------------------------
 include/linux/fec.h |    3 ++
 2 files changed, 41 insertions(+), 43 deletions(-)

diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 47f6b3b..47a3c7b 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -59,15 +59,11 @@
 #define FEC_ALIGNMENT	0x3
 #endif
 
-/*
- * Define the fixed address of the FEC hardware.
- */
-#if defined(CONFIG_M5272)
-
-static unsigned char	fec_mac_default[] = {
-	0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-};
+static unsigned char macaddr[ETH_ALEN];
+module_param_array(macaddr, byte, NULL, 0);
+MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 
+#if defined(CONFIG_M5272)
 /*
  * Some hardware gets it MAC address out of local flash memory.
  * if this is non-zero then assume it is the address to get MAC from.
@@ -537,37 +533,50 @@ rx_processing_done:
 }
 
 /* ------------------------------------------------------------------------- */
-#ifdef CONFIG_M5272
 static void __inline__ fec_get_mac(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
+	struct fec_platform_data *pdata = fep->pdev->dev.platform_data;
 	unsigned char *iap, tmpaddr[ETH_ALEN];
 
-	if (FEC_FLASHMAC) {
-		/*
-		 * Get MAC address from FLASH.
-		 * If it is all 1's or 0's, use the default.
-		 */
-		iap = (unsigned char *)FEC_FLASHMAC;
-		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
-		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
-			iap = fec_mac_default;
-		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
-		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
-			iap = fec_mac_default;
-	} else {
-		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
-		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
+	/*
+	 * try to get mac address in following order:
+	 *
+	 * 1) module parameter via kernel command line in form
+	 *    fec.macaddr=0x00,0x04,0x9f,0x01,0x30,0xe0
+	 */
+	iap = macaddr;
+
+	/*
+	 * 2) from flash or fuse (via platform data)
+	 */
+	if (!is_valid_ether_addr(iap)) {
+#ifdef CONFIG_M5272
+		if (FEC_FLASHMAC)
+			iap = (unsigned char *)FEC_FLASHMAC;
+#else
+		if (pdata)
+			memcpy(iap, pdata->mac, ETH_ALEN);
+#endif
+	}
+
+	/*
+	 * 3) FEC mac registers set by bootloader
+	 */
+	if (!is_valid_ether_addr(iap)) {
+		*((unsigned long *) &tmpaddr[0]) =
+			be32_to_cpu(readl(fep->hwp + FEC_ADDR_LOW));
+		*((unsigned short *) &tmpaddr[4]) =
+			be16_to_cpu(readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
 		iap = &tmpaddr[0];
 	}
 
 	memcpy(dev->dev_addr, iap, ETH_ALEN);
 
-	/* Adjust MAC if using default MAC address */
-	if (iap == fec_mac_default)
-		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->pdev->id;
+	/* Adjust MAC if using macaddr */
+	if (iap == macaddr)
+		 dev->dev_addr[ETH_ALEN-1] = macaddr[ETH_ALEN-1] + fep->pdev->id;
 }
-#endif
 
 /* ------------------------------------------------------------------------- */
 
@@ -1087,22 +1096,8 @@ static int fec_enet_init(struct net_device *dev)
 	fep->hwp = (void __iomem *)dev->base_addr;
 	fep->netdev = dev;
 
-	/* Set the Ethernet address */
-#ifdef CONFIG_M5272
+	/* Get the Ethernet address */
 	fec_get_mac(dev);
-#else
-	{
-		unsigned long l;
-		l = readl(fep->hwp + FEC_ADDR_LOW);
-		dev->dev_addr[0] = (unsigned char)((l & 0xFF000000) >> 24);
-		dev->dev_addr[1] = (unsigned char)((l & 0x00FF0000) >> 16);
-		dev->dev_addr[2] = (unsigned char)((l & 0x0000FF00) >> 8);
-		dev->dev_addr[3] = (unsigned char)((l & 0x000000FF) >> 0);
-		l = readl(fep->hwp + FEC_ADDR_HIGH);
-		dev->dev_addr[4] = (unsigned char)((l & 0xFF000000) >> 24);
-		dev->dev_addr[5] = (unsigned char)((l & 0x00FF0000) >> 16);
-	}
-#endif
 
 	/* Set receive and transmit descriptor base. */
 	fep->rx_bd_base = cbd_base;
diff --git a/include/linux/fec.h b/include/linux/fec.h
index 5d3523d..bcff455 100644
--- a/include/linux/fec.h
+++ b/include/linux/fec.h
@@ -3,6 +3,8 @@
  * Copyright (c) 2009 Orex Computed Radiography
  *   Baruch Siach <baruch@tkos.co.il>
  *
+ * Copyright (C) 2010 Freescale Semiconductor, Inc.
+ *
  * Header file for the FEC platform data
  *
  * This program is free software; you can redistribute it and/or modify
@@ -16,6 +18,7 @@
 
 struct fec_platform_data {
 	phy_interface_t phy;
+	unsigned char mac[ETH_ALEN];
 };
 
 #endif
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 02/10] net/fec: remove the use of "index" which is legacy
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

The "index" becomes legacy since fep->pdev->id starts working
to identify the instance.

Moreover, the call of fec_enet_init(ndev, 0) always passes 0
to fep->index. This makes the following code in fec_get_mac buggy.

	/* Adjust MAC if using default MAC address */
	if (iap == fec_mac_default)
		dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;

It may be the time to remove "index" and use fep->pdev->id instead.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
 drivers/net/fec.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 52e9ca8..47f6b3b 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -186,7 +186,6 @@ struct fec_enet_private {
 	int     mii_timeout;
 	uint    phy_speed;
 	phy_interface_t	phy_interface;
-	int	index;
 	int	link;
 	int	full_duplex;
 	struct	completion mdio_done;
@@ -566,7 +565,7 @@ static void __inline__ fec_get_mac(struct net_device *dev)
 
 	/* Adjust MAC if using default MAC address */
 	if (iap == fec_mac_default)
-		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
+		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->pdev->id;
 }
 #endif
 
@@ -1067,9 +1066,8 @@ static const struct net_device_ops fec_netdev_ops = {
  /*
   * XXX:  We need to clean up on failure exits here.
   *
-  * index is only used in legacy code
   */
-static int fec_enet_init(struct net_device *dev, int index)
+static int fec_enet_init(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
 	struct bufdesc *cbd_base;
@@ -1086,7 +1084,6 @@ static int fec_enet_init(struct net_device *dev, int index)
 
 	spin_lock_init(&fep->hw_lock);
 
-	fep->index = index;
 	fep->hwp = (void __iomem *)dev->base_addr;
 	fep->netdev = dev;
 
@@ -1316,7 +1313,7 @@ fec_probe(struct platform_device *pdev)
 	}
 	clk_enable(fep->clk);
 
-	ret = fec_enet_init(ndev, 0);
+	ret = fec_enet_init(ndev);
 	if (ret)
 		goto failed_init;
 
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 01/10] net/fec: fix MMFR_OP type in fec_enet_mdio_write
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig
In-Reply-To: <1294297998-26930-1-git-send-email-shawn.guo@freescale.com>

FEC_MMFR_OP_WRITE should be used than FEC_MMFR_OP_READ in
a mdio write operation.

It's probably a typo introduced by commit:

e6b043d512fa8d9a3801bf5d72bfa3b8fc3b3cc8
netdev/fec.c: add phylib supporting to enable carrier detection (v2)

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
 drivers/net/fec.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index cce32d4..52e9ca8 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -651,8 +651,8 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	fep->mii_timeout = 0;
 	init_completion(&fep->mdio_done);
 
-	/* start a read op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
+	/* start a write op */
+	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
 		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
-- 
1.7.1



^ permalink raw reply related

* [PATCH v4 00/10] net/fec: add dual fec support for i.MX28
From: Shawn Guo @ 2011-01-06  7:13 UTC (permalink / raw)
  To: davem, gerg, baruch, eric, bryan.wu, r64343, B32542,
	u.kleine-koenig

This patch series is to add dual fec support for mx28, which is
a mxs-based soc. Some code changes related to the following commits
are also made in this patch set for some reasons.

 e6b043d512fa8d9a3801bf5d72bfa3b8fc3b3cc8
 netdev/fec.c: add phylib supporting to enable carrier detection (v2)

 e3fe8558c7fc182972c3d947d88744482111f304
 net/fec: fix pm to survive to suspend/resume

It's been tested on mx28 evk and mx51 babbage. For mx28, it has
to work against the tree

 git://git.pengutronix.de/git/imx/linux-2.6.git imx-for-2.6.38

plus patch

 [PATCH v4] ARM: mxs: Change duart device to use amba-pl011

The 3 patches below preceding with * have changes since v3, and
the detailed change log can be found in individual patch.

 [PATCH v4 01/10] net/fec: fix MMFR_OP type in fec_enet_mdio_write
 [PATCH v4 02/10] net/fec: remove the use of "index" which is legacy
 [PATCH v4 03/10] net/fec: add mac field into platform data and consolidate fec_get_mac
 [PATCH v4 04/10] net/fec: improve pm for better suspend/resume
*[PATCH v4 05/10] net/fec: add dual fec support for mx28
*[PATCH v4 06/10] ARM: mx28: update clock and device name for dual fec support
 [PATCH v4 07/10] ARM: mx28: add the second fec device registration
*[PATCH v4 08/10] ARM: mxs: add ocotp read function
 [PATCH v4 09/10] ARM: mx28: read fec mac address from ocotp
 [PATCH v4 10/10] ARM: mxs: add initial pm support

Thanks for the review.

Regards,
Shawn


^ permalink raw reply

* Re: [PATCH v3 05/10] net/fec: add dual fec support for mx28
From: Uwe Kleine-König @ 2011-01-06  7:10 UTC (permalink / raw)
  To: Shawn Guo
  Cc: davem, gerg, baruch, eric, bryan.wu, r64343, B32542, lw, w.sang,
	s.hauer, netdev, linux-arm-kernel
In-Reply-To: <20110106041457.GC17891@freescale.com>

Hello Shawn,

On Thu, Jan 06, 2011 at 12:14:59PM +0800, Shawn Guo wrote:
> On Wed, Jan 05, 2011 at 05:34:49PM +0100, Uwe Kleine-König wrote:
> > On Wed, Jan 05, 2011 at 10:07:32PM +0800, Shawn Guo wrote:
> > > +#define DRIVER_NAME  "fec"
> > > +#define ENET_MAC_NAME        "enet-mac"
> > > +
> > > +static struct platform_device_id fec_devtype[] = {
> > > +     {
> > > +             .name = DRIVER_NAME,
> > > +     }, {
> > > +             .name = ENET_MAC_NAME,
> > > +     }
> > I'd done it differently:
> > 
> >         {
> >                 .name = "fec",
> >                 .driver_data = 0,
> >         }, {
> >                 .name = "imx28-fec",
> >                 .driver_data = HAS_ENET_MAC | ...,
> >         }
> > 
> > and then test the bits in driver_data (which you get using
> > platform_get_device_id() when you need to distinguish.
> > Comparing names doesn't scale, assume there are three further features
> > to distinguish, then you need to use strtok or index and get device
> > names like enet-mac-with-feature1-but-without-feature2-and-feature3.
> > 
> Makes sense.  The frame endian issue will be fixed in future revision,
> so I would define bit SWAP_FRAME for testing.
sounds sane
 
> > > +     /*
> > > +      * enet-mac design made an improper assumption that it's running
> > > +      * on a big endian system. As the result, driver has to swap
> > if he was really aware that he limits the performant use of the fec to
> > big endian systems, can you please make him stop designing hardware!?
> > 
> You over looked my power :)  BTW, he had left Freescale.
so everything seems OK with your power :-)

> > > +      * every frame going to and coming from the controller.
> > > +      */
> > > +     if (fec_is_enetmac)
> > > +             swap_buffer(bufaddr, skb->len);
> > > +
> > >       /* Save skb pointer */
> > >       fep->tx_skbuff[fep->skb_cur] = skb;
> > >
> > > @@ -487,6 +522,9 @@ fec_enet_rx(struct net_device *dev)
> > >               dma_unmap_single(NULL, bdp->cbd_bufaddr, bdp->cbd_datlen,
> > >                               DMA_FROM_DEVICE);
> > >
> > > +             if (fec_is_enetmac)
> > > +                     swap_buffer(data, pkt_len);
> > > +
> > >               /* This does 16 byte alignment, exactly what we need.
> > >                * The packet length includes FCS, but we don't want to
> > >                * include that when passing upstream as it messes up
> > > @@ -689,6 +727,7 @@ static int fec_enet_mii_probe(struct net_device *dev)
> > >       char mdio_bus_id[MII_BUS_ID_SIZE];
> > >       char phy_name[MII_BUS_ID_SIZE + 3];
> > >       int phy_id;
> > > +     int dev_id = fep->pdev->id;
> > >
> > >       fep->phy_dev = NULL;
> > >
> > > @@ -700,6 +739,8 @@ static int fec_enet_mii_probe(struct net_device *dev)
> > >                       continue;
> > >               if (fep->mii_bus->phy_map[phy_id]->phy_id == 0)
> > >                       continue;
> > > +             if (fec_is_enetmac && dev_id--)
> > > +                     continue;
> > >               strncpy(mdio_bus_id, fep->mii_bus->id, MII_BUS_ID_SIZE);
> > >               break;
> > >       }
> > > @@ -741,6 +782,28 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> > >       struct fec_enet_private *fep = netdev_priv(dev);
> > >       int err = -ENXIO, i;
> > >
> > > +     /*
> > > +      * The dual fec interfaces are not equivalent with enet-mac.
> > > +      * Here are the differences:
> > > +      *
> > > +      *  - fec0 supports MII & RMII modes while fec1 only supports RMII
> > > +      *  - fec0 acts as the 1588 time master while fec1 is slave
> > > +      *  - external phys can only be configured by fec0
> > > +      *
> > > +      * That is to say fec1 can not work independently. It only works
> > > +      * when fec0 is working. The reason behind this design is that the
> > > +      * second interface is added primarily for Switch mode.
> > > +      *
> > > +      * Because of the last point above, both phys are attached on fec0
> > > +      * mdio interface in board design, and need to be configured by
> > > +      * fec0 mii_bus.
> > > +      */
> > > +     if (fec_is_enetmac && pdev->id) {
> > > +             /* fec1 uses fec0 mii_bus */
> > > +             fep->mii_bus = fec_mii_bus;
> > > +             return 0;
> > > +     }
> > > +
> > >       fep->mii_timeout = 0;
> > >
> > >       /*
> > > @@ -777,6 +840,10 @@ static int fec_enet_mii_init(struct platform_device *pdev)
> > >       if (mdiobus_register(fep->mii_bus))
> > >               goto err_out_free_mdio_irq;
> > >
> > > +     /* save fec0 mii_bus */
> > > +     if (fec_is_enetmac)
> > > +             fec_mii_bus = fep->mii_bus;
> > > +
> > >       return 0;
> > >
> > >  err_out_free_mdio_irq:
> > > @@ -1149,11 +1216,22 @@ fec_restart(struct net_device *dev, int duplex)
> > >  {
> > >       struct fec_enet_private *fep = netdev_priv(dev);
> > >       int i;
> > > +     u32 val, temp_mac[2];
> > >
> > >       /* Whack a reset.  We should wait for this. */
> > >       writel(1, fep->hwp + FEC_ECNTRL);
> > >       udelay(10);
> > >
> > > +     /*
> > > +      * enet-mac reset will reset mac address registers too,
> > > +      * so need to reconfigure it.
> > > +      */
> > > +     if (fec_is_enetmac) {
> > > +             memcpy(&temp_mac, dev->dev_addr, ETH_ALEN);
> 		  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > +             writel(cpu_to_be32(temp_mac[0]), fep->hwp + FEC_ADDR_LOW);
> > > +             writel(cpu_to_be32(temp_mac[1]), fep->hwp + FEC_ADDR_HIGH);
> > where is the value saved to temp_mac[]?  For me it looks you write
> > uninitialized data into the mac registers.
> 
> memcpy above.
oops.  right.  I looked for something like

	temp_mac[0] = dev->dev_addr[0] << $shiftfor0 | ...

which AFAIK is what you want here.  memcpy is sensible to (at least)
endian issues.  If you ask me factor out the setting of the mac address
in probe to a function and use that here, too.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* Bad TCP timestamps on non-PC platforms
From: Alex Dubov @ 2011-01-06  6:55 UTC (permalink / raw)
  To: netdev

Greetings.
I'm dealing with 2.6.37-rc7 kernel on MPC8548 platform.

It so appears, that recent kernels have sysctl_tcp_timestamps set to "1"
by default.

On embedded platforms, where real time clock is initialized lately or
absent outright, this causes TSval field of outgoing TCP packets to be
set to some garbage value, in my case in the vicinity of 0xffffffff. As a
result, other Linux machines silently drop such packets, preventing normal
completion of network boot or any other TCP dependent operation.

Therefore, I feel, two changes are necessary (I can send in a patch):
1. Make sysctl_tcp_timestamps value config-time selectable (it must be
disabled by default on machines without RTC).
2. When re-enabling tcp_timestamps through sysctl, reset the timestamp
counter to the current system clock value.

And an optional, tricky one:
3. Postpone TCP timestamp counter initialization until RTC is actually
available (if RTC is connected to i2c bus, TCP is initialized well ahead
of it).



      

^ permalink raw reply

* Re: [RFC] sched: CHOKe packet scheduler (v0.2)
From: Stephen Hemminger @ 2011-01-06  6:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1294286850.2723.65.camel@edumazet-laptop>

On Thu, 06 Jan 2011 05:07:30 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> - CHOKe should be able to use an external flow classifier (like say...
> SFQ) to compute a token and compare two skbs by this token instead of
> custom rxhash or whatever.

I don't think you can apply a filter to a non-classful qdisc?



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox