Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:39 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042356.26743.30.camel@bigi>



On Wed, 2010-03-31 at 09:32 -0400, jamal wrote:

> I did not touch pfkey. That behavior remains there. 

Sorry - I lied. I did touch pfkey. Here was my reasoning.

RFC 2367 says flushing behavior should be:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush event to ALL listeners

This is not realistic today in the presence of selinux policies
which may reject the flush etc. So we make the sequence become:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush response to originater from #1
4) if there were no errors then:
kernel -> user space: flush event to ALL listeners

This was in the logs.

cheers,
jamal


^ permalink raw reply

* Re: [r8169] WARNING: at net/sched/sch_generic.c
From: Neil Horman @ 2010-03-31 13:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sergey Senozhatsky, netdev, Francois Romieu, David S. Miller,
	linux-kernel
In-Reply-To: <1270038569.2103.24.camel@edumazet-laptop>

On Wed, Mar 31, 2010 at 02:29:29PM +0200, Eric Dumazet wrote:
> Le mercredi 31 mars 2010 à 15:14 +0300, Sergey Senozhatsky a écrit :
> 
> 
> > PKT_SIZE="pkt_size 2048"
> > 
> 
> If you use 1024 bytes pktgen messages, do you still have the problem ?
> 
+1 I wouldn't be suprised if using something over the nominal 1522 byte frame
length on r8169 caused tx errors.  The driver doesn't seem to support Jumbo
frames, so my guess is you have to keep the packet size below 1522 bytes.
Neil

> 

^ permalink raw reply

* Re: [PATCH] can: Add driver for SJA1000 based PCI CAN interface cards by esd
From: Wolfgang Grandegger @ 2010-03-31 13:35 UTC (permalink / raw)
  To: Matthias Fuchs
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201003311428.05749.matthias.fuchs-iOnpLzIbIdM@public.gmane.org>

Hi Matthias,

Matthias Fuchs wrote:
> This patch adds support for SJA1000 based PCI CAN interface cards
> from electronic system design gmbh.
> 
> These boards are supported:
> 
>         CAN-PCI/200 (PCI)
>         CAN-PCI/266 (PCI)
>         CAN-PMC266 (PMC module)
>         CAN-PCIe/2000 (PCI Express)
>         CAN-CPCI/200 (Compact PCI, 3U)
>         CAN-PCI104 (PCI104)
> 
> This driver is part of the SocketCAN SVN repository since
> April 2009.
> 
> Signed-off-by: Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>

Since a while we have a generic PCI PLX driver in the mainline kernel,
which could support the esd cards as well, I believe.

Wolfgang.

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:32 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331132622.GA13908@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:26 +0800, Herbert Xu wrote:


> In fact the previous behaviour is also consistent with RFC2367:

I did not touch pfkey. That behavior remains there. 

cheers,
jamal



^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:28 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331131131.GA13793@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:11 +0800, Herbert Xu wrote:

> I disagree.  A flush event is a signal that someone has sent a
> flush command.  

Sorry - I respectfully disagree.

> In any case we've had this semantics for years
> and I haven't heard a good reason why this should be changed.

It generates unnecessary noise and it is a deviation like i mentioned.

> > This is a consistent definition of the semantics everywhere tables
> > are flushed (not just in Linux)..
> 
> Please give specific examples in the kernel.

Something i can do safely right now without messing my connection; 
Issue iproute commands in one window, observe events in another

-sudo ip route add 192.168.11.100 dev eth0 table 15
generates an event
-sudo ip route flush table 15
generates an event
-sudo ip route flush table 15
No event

But pick anything else in the other netlink knowledgeable subsystem
and youd see similar behavior.

If there was an app depending on this behavior - thats a separate reason
(but thats not the arguement you are making).

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 3/7] flow: allocate hash table for online cpus only
From: Timo Teräs @ 2010-03-31 13:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Herbert Xu, netdev
In-Reply-To: <201003312302.51683.rusty@rustcorp.com.au>

Rusty Russell wrote:
> On Tue, 30 Mar 2010 10:42:55 pm Herbert Xu wrote:
>> On Mon, Mar 29, 2010 at 05:12:40PM +0300, Timo Teras wrote:
>>> Instead of unconditionally allocating hash table for all possible
>>> cpu's, allocate it only for online cpu's and release related
>>> memory if cpu goes down.
>>>
>>> Signed-off-by: Timo Teras <timo.teras@iki.fi>
>> Hmm that's where we started but then Rusty changed it back in 2004:
>>
>> So I'd like to hear his opinion on changing it back again.
> 
> It was pretty unique at the time, it no longer is, so the arguments are less
> compelling IMHO.
> 
> However, we can now use a dynamic percpu variable and get it as a real
> per-cpu thing (which currently means it *will* be for every available cpu,
> not just online ones).  Haven't thought about it, but that change might be
> worth considering instead?

I did convert most of the static percpu variables to a struct which
is allocated dynamically using alloc_percpu. See:
 http://marc.info/?l=linux-netdev&m=127003066905912&w=2

This patch is on top of that, to avoid allocating the larger hash
table unconditionally as amount of possible cpu's can be large.
If you take a look at the actual patch to add back the hash allocation
for only 'online' cpu's, it's not that complicated IMHO:
 http://marc.info/?l=linux-netdev&m=126987200927472&w=2


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:26 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331131131.GA13793@gondor.apana.org.au>

On Wed, Mar 31, 2010 at 09:11:31PM +0800, Herbert Xu wrote:
> 
> > This is a consistent definition of the semantics everywhere tables
> > are flushed (not just in Linux)..
> 
> Please give specific examples in the kernel.

In fact the previous behaviour is also consistent with RFC2367:

3.1.9 SADB_FLUSH

   The SADB_FLUSH message causes the kernel to delete all entries in its
   key table for a certain sadb_msg_satype.  Only the base header is
   required for a flush message.  If sadb_msg_satype is filled in with a
   specific value, only associations of that type are deleted.  If it is
   filled in with SADB_SATYPE_UNSPEC, ALL associations are deleted.

     The messaging behavior for SADB_FLUSH is:

           Send an SADB_FLUSH message from a user process to the kernel.

           <base>

           The kernel will return an SADB_FLUSH message to all listening
           sockets.

           <base>

           The reply message happens only after the actual flushing
           of security associations has been attempted.

There is no special treatment for an empty DB.

Please send a revert.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:11 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270040773.26743.12.camel@bigi>

On Wed, Mar 31, 2010 at 09:06:13AM -0400, jamal wrote:
> On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote:
> 
> > This seems to be bogus to me.  Just because the DB was empty
> > before the flush doesn't mean that the flush didn't happen.
> 
> Herbert, If a tree falls in a forest and no one is around to hear it,
> does it make a sound? ;->
> A flush event is meant to be a signal to user space that what
> was once a non-empty table is now empty. 

I disagree.  A flush event is a signal that someone has sent a
flush command.  In any case we've had this semantics for years
and I haven't heard a good reason why this should be changed.

> This is a consistent definition of the semantics everywhere tables
> are flushed (not just in Linux)..

Please give specific examples in the kernel.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331110345.GC12845@gondor.apana.org.au>

On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote:

> This seems to be bogus to me.  Just because the DB was empty
> before the flush doesn't mean that the flush didn't happen.

Herbert, If a tree falls in a forest and no one is around to hear it,
does it make a sound? ;->
A flush event is meant to be a signal to user space that what
was once a non-empty table is now empty. 
This is a consistent definition of the semantics everywhere tables
are flushed (not just in Linux)..

What makes the SPD and SAD speacial?

cheers,
jamal




^ permalink raw reply

* Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
From: Stuart Shelton @ 2010-03-31 12:55 UTC (permalink / raw)
  To: mchan, netdev


Hi all,

The Broadcom NetXtreme II driver appears to have a severe regression in 
all kernels post 2.6.30 - I've observed problems with 2.6.31, 2.6.32. 
and 2.6.33.

The hardware impacted is an IBM Bladecenter LS21 Blade, model 7971.  We 
have a large number of these, and all are affected.

We use generic channel-bonding, with the following options in modprobe.conf:

alias bond0 bonding
options bond0 mode=0 miimon=100

With any kernel prior to 2.6.31, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
...
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bonding: bond0: enslaving eg0 as an active interface with a down link.
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bonding: bond0: enslaving eg1 as an active interface with a down link.
bonding: bond0: link status definitely up for interface eg0.
bonding: bond0: link status definitely up for interface eg1.
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON


... however, with kernels from 2.6.31 and later, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg0 NIC SerDes Link is Down
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
NET: Registered protocol family 17
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg0
udev: renamed network interface eth1 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
udev: renamed network interface eth0 to eg0
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete

... (this later ouput showing the initial attempt to raise the 
interfaces at boot, and then me manually removing and re-inserting the 
bnx2 driver).  Alongside this, the console outputs "SIOCSIFFLAGS: Device 
or resource busy".

On these more recent kernels, the SIOCSIFFLAGS line is always output, 
but about 50% of the time the network interface is raised.  When this 
fails, then sometimes removing and re-inserting the bnx2 driver can 
result in usable non-bonded interfaces - but as often as not the NICs 
won't be usable even in a standard non-bonded configuration.

With a simple reboot back to a 2.6.30 or earlier kernel, the problem 
goes away (even though the firmware file on disk is the same as that 
used with the later kernels).  Ever blade we have is affected, so this 
is not a hardware problem (or at least, if it is, then it's a very 
common one!).  I thought that the problem might only occur when bonding 
is used - but I can't now recall what made me think this, and I've not 
been able to get the server down-time to extensively test the issue further.

Any advice/guidance greatly appreciated,

Stuart

^ permalink raw reply

* [net-next PATCH] be2net: Adding PCI SRIOV support
From: Sarveshwar Bandi @ 2010-03-31 12:56 UTC (permalink / raw)
  To: davem; +Cc: netdev

- Patch adds support to enable PCI SRIOV in the driver and changes to handle initialization of PCI virtual functions.
- Function handler to change mac addresses for VF from its corresponding PF.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be.h      |    9 ++
 drivers/net/benet/be_cmds.c |    4 +
 drivers/net/benet/be_cmds.h |    2 
 drivers/net/benet/be_hw.h   |    3 +
 drivers/net/benet/be_main.c |  222 ++++++++++++++++++++++++++++++++++---------
 5 files changed, 191 insertions(+), 49 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 8f07525..20842c5 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -83,6 +83,8 @@ #define RX_FRAGS_REFILL_WM	(RX_Q_LEN - M
 
 #define FW_VER_LEN		32
 
+#define BE_MAX_VF		32
+
 struct be_dma_mem {
 	void *va;
 	dma_addr_t dma;
@@ -280,8 +282,15 @@ struct be_adapter {
 	u8 port_type;
 	u8 transceiver;
 	u8 generation;		/* BladeEngine ASIC generation */
+
+	bool sriov_enabled;
+	u32 vf_if_handle[BE_MAX_VF];
+	u32 vf_pmac_id[BE_MAX_VF];
+	u8 base_eq_id;
 };
 
+#define be_physfn(adapter) (!adapter->pdev->is_virtfn)
+
 /* BladeEngine Generation numbers */
 #define BE_GEN2 2
 #define BE_GEN3 3
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 50e6259..9f53d9e 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -843,7 +843,8 @@ int be_cmd_q_destroy(struct be_adapter *
  * Uses mbox
  */
 int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags, u32 en_flags,
-		u8 *mac, bool pmac_invalid, u32 *if_handle, u32 *pmac_id)
+		u8 *mac, bool pmac_invalid, u32 *if_handle, u32 *pmac_id,
+		u32 domain)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_if_create *req;
@@ -860,6 +861,7 @@ int be_cmd_if_create(struct be_adapter *
 	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
 		OPCODE_COMMON_NTWK_INTERFACE_CREATE, sizeof(*req));
 
+	req->hdr.domain = domain;
 	req->capability_flags = cpu_to_le32(cap_flags);
 	req->enable_flags = cpu_to_le32(en_flags);
 	req->pmac_invalid = pmac_invalid;
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index cce61f9..763dc19 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -878,7 +878,7 @@ extern int be_cmd_pmac_add(struct be_ada
 extern int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, u32 pmac_id);
 extern int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags,
 			u32 en_flags, u8 *mac, bool pmac_invalid,
-			u32 *if_handle, u32 *pmac_id);
+			u32 *if_handle, u32 *pmac_id, u32 domain);
 extern int be_cmd_if_destroy(struct be_adapter *adapter, u32 if_handle);
 extern int be_cmd_eq_create(struct be_adapter *adapter,
 			struct be_queue_info *eq, int eq_delay);
diff --git a/drivers/net/benet/be_hw.h b/drivers/net/benet/be_hw.h
index 2d4a4b8..063026d 100644
--- a/drivers/net/benet/be_hw.h
+++ b/drivers/net/benet/be_hw.h
@@ -99,6 +99,9 @@ #define DB_MCCQ_RING_ID_MASK		0x7FF	/* b
 /* Number of entries posted */
 #define DB_MCCQ_NUM_POSTED_SHIFT	(16)	/* bits 16 - 29 */
 
+/********** SRIOV VF PCICFG OFFSET ********/
+#define SRIOV_VF_PCICFG_OFFSET		(4096)
+
 /* Flashrom related descriptors */
 #define IMAGE_TYPE_FIRMWARE		160
 #define IMAGE_TYPE_BOOTCODE		224
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 50ea056..881c87d 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -26,8 +26,11 @@ MODULE_AUTHOR("ServerEngines Corporation
 MODULE_LICENSE("GPL");
 
 static unsigned int rx_frag_size = 2048;
+static unsigned int num_vfs;
 module_param(rx_frag_size, uint, S_IRUGO);
+module_param(num_vfs, uint, S_IRUGO);
 MODULE_PARM_DESC(rx_frag_size, "Size of a fragment that holds rcvd data.");
+MODULE_PARM_DESC(num_vfs, "Number of PCI VFs to initialize");
 
 static DEFINE_PCI_DEVICE_TABLE(be_dev_ids) = {
 	{ PCI_DEVICE(BE_VENDOR_ID, BE_DEVICE_ID1) },
@@ -138,12 +141,19 @@ static int be_mac_addr_set(struct net_de
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
 
+	/* MAC addr configuration will be done in hardware for VFs
+	 * by their corresponding PFs. Just copy to netdev addr here
+	 */
+	if (!be_physfn(adapter))
+		goto netdev_addr;
+
 	status = be_cmd_pmac_del(adapter, adapter->if_handle, adapter->pmac_id);
 	if (status)
 		return status;
 
 	status = be_cmd_pmac_add(adapter, (u8 *)addr->sa_data,
 			adapter->if_handle, &adapter->pmac_id);
+netdev_addr:
 	if (!status)
 		memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
 
@@ -576,6 +586,9 @@ static void be_vlan_add_vid(struct net_d
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
+	if (!be_physfn(adapter))
+		return;
+
 	adapter->vlan_tag[vid] = 1;
 	adapter->vlans_added++;
 	if (adapter->vlans_added <= (adapter->max_vlans + 1))
@@ -586,6 +599,9 @@ static void be_vlan_rem_vid(struct net_d
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
+	if (!be_physfn(adapter))
+		return;
+
 	adapter->vlan_tag[vid] = 0;
 	vlan_group_set_device(adapter->vlan_grp, vid, NULL);
 	adapter->vlans_added--;
@@ -623,6 +639,28 @@ done:
 	return;
 }
 
+static int be_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+	int status;
+
+	if (!adapter->sriov_enabled)
+		return -EPERM;
+
+	if (!is_valid_ether_addr(mac) || (vf >= num_vfs))
+		return -EINVAL;
+
+	status = be_cmd_pmac_del(adapter, adapter->vf_if_handle[vf],
+				adapter->vf_pmac_id[vf]);
+
+	status = be_cmd_pmac_add(adapter, mac, adapter->vf_if_handle[vf],
+				&adapter->vf_pmac_id[vf]);
+	if (!status)
+		dev_err(&adapter->pdev->dev, "MAC %pM set on VF %d Failed\n",
+				mac, vf);
+	return status;
+}
+
 static void be_rx_rate_update(struct be_adapter *adapter)
 {
 	struct be_drvr_stats *stats = drvr_stats(adapter);
@@ -1281,6 +1319,8 @@ static int be_tx_queues_create(struct be
 	/* Ask BE to create Tx Event queue */
 	if (be_cmd_eq_create(adapter, eq, adapter->tx_eq.cur_eqd))
 		goto tx_eq_free;
+	adapter->base_eq_id = adapter->tx_eq.q.id;
+
 	/* Alloc TX eth compl queue */
 	cq = &adapter->tx_obj.cq;
 	if (be_queue_alloc(adapter, cq, TX_CQ_LEN,
@@ -1408,7 +1448,7 @@ rx_eq_free:
 /* There are 8 evt ids per func. Retruns the evt id's bit number */
 static inline int be_evt_bit_get(struct be_adapter *adapter, u32 eq_id)
 {
-	return eq_id % 8;
+	return eq_id - adapter->base_eq_id;
 }
 
 static irqreturn_t be_intx(int irq, void *dev)
@@ -1586,6 +1626,28 @@ static void be_msix_enable(struct be_ada
 	return;
 }
 
+static void be_sriov_enable(struct be_adapter *adapter)
+{
+#ifdef CONFIG_PCI_IOV
+	int status;
+	if (be_physfn(adapter) && num_vfs) {
+		status = pci_enable_sriov(adapter->pdev, num_vfs);
+		adapter->sriov_enabled = status ? false : true;
+	}
+#endif
+	return;
+}
+
+static void be_sriov_disable(struct be_adapter *adapter)
+{
+#ifdef CONFIG_PCI_IOV
+	if (adapter->sriov_enabled) {
+		pci_disable_sriov(adapter->pdev);
+		adapter->sriov_enabled = false;
+	}
+#endif
+}
+
 static inline int be_msix_vec_get(struct be_adapter *adapter, u32 eq_id)
 {
 	return adapter->msix_entries[
@@ -1643,6 +1705,9 @@ static int be_irq_register(struct be_ada
 		status = be_msix_register(adapter);
 		if (status == 0)
 			goto done;
+		/* INTx is not supported for VF */
+		if (!be_physfn(adapter))
+			return status;
 	}
 
 	/* INTx */
@@ -1716,14 +1781,17 @@ static int be_open(struct net_device *ne
 		goto ret_sts;
 	be_link_status_update(adapter, link_up);
 
-	status = be_vid_config(adapter);
+	if (be_physfn(adapter))
+		status = be_vid_config(adapter);
 	if (status)
 		goto ret_sts;
 
-	status = be_cmd_set_flow_control(adapter,
-					adapter->tx_fc, adapter->rx_fc);
-	if (status)
-		goto ret_sts;
+	if (be_physfn(adapter)) {
+		status = be_cmd_set_flow_control(adapter,
+				adapter->tx_fc, adapter->rx_fc);
+		if (status)
+			goto ret_sts;
+	}
 
 	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
 ret_sts:
@@ -1771,22 +1839,48 @@ static int be_setup_wol(struct be_adapte
 static int be_setup(struct be_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	u32 cap_flags, en_flags;
+	u32 cap_flags, en_flags, vf = 0;
 	int status;
+	u8 mac[ETH_ALEN];
+
+	cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST;
 
-	cap_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST |
-			BE_IF_FLAGS_MCAST_PROMISCUOUS |
-			BE_IF_FLAGS_PROMISCUOUS |
-			BE_IF_FLAGS_PASS_L3L4_ERRORS;
-	en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST |
-			BE_IF_FLAGS_PASS_L3L4_ERRORS;
+	if (be_physfn(adapter)) {
+		cap_flags |= BE_IF_FLAGS_MCAST_PROMISCUOUS |
+				BE_IF_FLAGS_PROMISCUOUS |
+				BE_IF_FLAGS_PASS_L3L4_ERRORS;
+		en_flags |= BE_IF_FLAGS_PASS_L3L4_ERRORS;
+	}
 
 	status = be_cmd_if_create(adapter, cap_flags, en_flags,
 			netdev->dev_addr, false/* pmac_invalid */,
-			&adapter->if_handle, &adapter->pmac_id);
+			&adapter->if_handle, &adapter->pmac_id, 0);
 	if (status != 0)
 		goto do_none;
 
+	if (be_physfn(adapter)) {
+		while (vf < num_vfs) {
+			cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED
+					| BE_IF_FLAGS_BROADCAST;
+			status = be_cmd_if_create(adapter, cap_flags, en_flags,
+					mac, true, &adapter->vf_if_handle[vf],
+					NULL, vf+1);
+			if (status) {
+				dev_err(&adapter->pdev->dev,
+				"Interface Create failed for VF %d\n", vf);
+				goto if_destroy;
+			}
+			vf++;
+		} while (vf < num_vfs);
+	} else if (!be_physfn(adapter)) {
+		status = be_cmd_mac_addr_query(adapter, mac,
+			MAC_ADDRESS_TYPE_NETWORK, false, adapter->if_handle);
+		if (!status) {
+			memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
+			memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+		}
+	}
+
 	status = be_tx_queues_create(adapter);
 	if (status != 0)
 		goto if_destroy;
@@ -1808,6 +1902,9 @@ rx_qs_destroy:
 tx_qs_destroy:
 	be_tx_queues_destroy(adapter);
 if_destroy:
+	for (vf = 0; vf < num_vfs; vf++)
+		if (adapter->vf_if_handle[vf])
+			be_cmd_if_destroy(adapter, adapter->vf_if_handle[vf]);
 	be_cmd_if_destroy(adapter, adapter->if_handle);
 do_none:
 	return status;
@@ -2088,6 +2185,7 @@ static struct net_device_ops be_netdev_o
 	.ndo_vlan_rx_register	= be_vlan_register,
 	.ndo_vlan_rx_add_vid	= be_vlan_add_vid,
 	.ndo_vlan_rx_kill_vid	= be_vlan_rem_vid,
+	.ndo_set_vf_mac		= be_set_vf_mac
 };
 
 static void be_netdev_init(struct net_device *netdev)
@@ -2129,37 +2227,48 @@ static void be_unmap_pci_bars(struct be_
 		iounmap(adapter->csr);
 	if (adapter->db)
 		iounmap(adapter->db);
-	if (adapter->pcicfg)
+	if (adapter->pcicfg && be_physfn(adapter))
 		iounmap(adapter->pcicfg);
 }
 
 static int be_map_pci_bars(struct be_adapter *adapter)
 {
 	u8 __iomem *addr;
-	int pcicfg_reg;
+	int pcicfg_reg, db_reg;
 
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, 2),
-			pci_resource_len(adapter->pdev, 2));
-	if (addr == NULL)
-		return -ENOMEM;
-	adapter->csr = addr;
-
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, 4),
-			128 * 1024);
-	if (addr == NULL)
-		goto pci_map_err;
-	adapter->db = addr;
+	if (be_physfn(adapter)) {
+		addr = ioremap_nocache(pci_resource_start(adapter->pdev, 2),
+				pci_resource_len(adapter->pdev, 2));
+		if (addr == NULL)
+			return -ENOMEM;
+		adapter->csr = addr;
+	}
 
-	if (adapter->generation == BE_GEN2)
+	if (adapter->generation == BE_GEN2) {
 		pcicfg_reg = 1;
-	else
+		db_reg = 4;
+	} else {
 		pcicfg_reg = 0;
-
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, pcicfg_reg),
-			pci_resource_len(adapter->pdev, pcicfg_reg));
+		if (be_physfn(adapter))
+			db_reg = 4;
+		else
+			db_reg = 0;
+	}
+	addr = ioremap_nocache(pci_resource_start(adapter->pdev, db_reg),
+				pci_resource_len(adapter->pdev, db_reg));
 	if (addr == NULL)
 		goto pci_map_err;
-	adapter->pcicfg = addr;
+	adapter->db = addr;
+
+	if (be_physfn(adapter)) {
+		addr = ioremap_nocache(
+				pci_resource_start(adapter->pdev, pcicfg_reg),
+				pci_resource_len(adapter->pdev, pcicfg_reg));
+		if (addr == NULL)
+			goto pci_map_err;
+		adapter->pcicfg = addr;
+	} else
+		adapter->pcicfg = adapter->db + SRIOV_VF_PCICFG_OFFSET;
 
 	return 0;
 pci_map_err:
@@ -2273,6 +2382,8 @@ static void __devexit be_remove(struct p
 
 	be_ctrl_cleanup(adapter);
 
+	be_sriov_disable(adapter);
+
 	be_msix_disable(adapter);
 
 	pci_set_drvdata(pdev, NULL);
@@ -2297,16 +2408,20 @@ static int be_get_config(struct be_adapt
 		return status;
 
 	memset(mac, 0, ETH_ALEN);
-	status = be_cmd_mac_addr_query(adapter, mac,
+
+	if (be_physfn(adapter)) {
+		status = be_cmd_mac_addr_query(adapter, mac,
 			MAC_ADDRESS_TYPE_NETWORK, true /*permanent */, 0);
-	if (status)
-		return status;
 
-	if (!is_valid_ether_addr(mac))
-		return -EADDRNOTAVAIL;
+		if (status)
+			return status;
 
-	memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
-	memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+		if (!is_valid_ether_addr(mac))
+			return -EADDRNOTAVAIL;
+
+		memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
+		memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+	}
 
 	if (adapter->cap & 0x400)
 		adapter->max_vlans = BE_NUM_VLANS_SUPPORTED/4;
@@ -2323,6 +2438,7 @@ static int __devinit be_probe(struct pci
 	struct be_adapter *adapter;
 	struct net_device *netdev;
 
+
 	status = pci_enable_device(pdev);
 	if (status)
 		goto do_none;
@@ -2371,24 +2487,28 @@ static int __devinit be_probe(struct pci
 		}
 	}
 
+	be_sriov_enable(adapter);
+
 	status = be_ctrl_init(adapter);
 	if (status)
 		goto free_netdev;
 
 	/* sync up with fw's ready state */
-	status = be_cmd_POST(adapter);
-	if (status)
-		goto ctrl_clean;
+	if (be_physfn(adapter)) {
+		status = be_cmd_POST(adapter);
+		if (status)
+			goto ctrl_clean;
+
+		status = be_cmd_reset_function(adapter);
+		if (status)
+			goto ctrl_clean;
+	}
 
 	/* tell fw we're ready to fire cmds */
 	status = be_cmd_fw_init(adapter);
 	if (status)
 		goto ctrl_clean;
 
-	status = be_cmd_reset_function(adapter);
-	if (status)
-		goto ctrl_clean;
-
 	status = be_stats_init(adapter);
 	if (status)
 		goto ctrl_clean;
@@ -2418,6 +2538,7 @@ ctrl_clean:
 	be_ctrl_cleanup(adapter);
 free_netdev:
 	be_msix_disable(adapter);
+	be_sriov_disable(adapter);
 	free_netdev(adapter->netdev);
 	pci_set_drvdata(pdev, NULL);
 rel_reg:
@@ -2614,6 +2735,13 @@ static int __init be_init_module(void)
 		rx_frag_size = 2048;
 	}
 
+	if (num_vfs > 32) {
+		printk(KERN_WARNING DRV_NAME
+			" : Module param num_vfs must not be greater than 32."
+			"Using 32\n");
+		num_vfs = 32;
+	}
+
 	return pci_register_driver(&be_driver);
 }
 module_init(be_init_module);
-- 
1.4.0


^ permalink raw reply related

* [PATCH] virtio-net: move sg off stack
From: Michael S. Tsirkin @ 2010-03-31 12:41 UTC (permalink / raw)
  To: David S. Miller, Rusty Russell, Jiri Pirko, Michael S. Tsirkin,
	Shirley Ma, ne
In-Reply-To: <20100331.031636.04083770.davem@davemloft.net>

Move sg structure off stack and into virtnet_info structure.
This helps remove extra sg_init_table calls as well as reduce
stack usage.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Compile-tested only for now.
Shirley, Rusty Russell, does this fix CONFIG_DEBUG_SG=y for you?

 drivers/net/virtio_net.c |   52 ++++++++++++++++++++++-----------------------
 1 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3f5be35..93dcde2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -39,8 +39,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 
-struct virtnet_info
-{
+struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *rvq, *svq, *cvq;
 	struct net_device *dev;
@@ -61,6 +60,10 @@ struct virtnet_info
 
 	/* Chain pages by the private ptr. */
 	struct page *pages;
+
+	/* fragments + linear part + virtio header */
+	struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
+	struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -323,10 +326,8 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	struct skb_vnet_hdr *hdr;
-	struct scatterlist sg[2];
 	int err;
 
-	sg_init_table(sg, 2);
 	skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
 	if (unlikely(!skb))
 		return -ENOMEM;
@@ -334,11 +335,11 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 	skb_put(skb, MAX_PACKET_LEN);
 
 	hdr = skb_vnet_hdr(skb);
-	sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
+	sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-	skb_to_sgvec(skb, sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
 
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, vi->rx_sg, 0, 2, skb);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
@@ -347,13 +348,11 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 
 static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 {
-	struct scatterlist sg[MAX_SKB_FRAGS + 2];
 	struct page *first, *list = NULL;
 	char *p;
 	int i, err, offset;
 
-	sg_init_table(sg, MAX_SKB_FRAGS + 2);
-	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
+	/* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
 	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
 		first = get_a_page(vi, gfp);
 		if (!first) {
@@ -361,7 +360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 				give_pages(vi, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&sg[i], page_address(first), PAGE_SIZE);
+		sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		first->private = (unsigned long)list;
@@ -375,17 +374,17 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 	}
 	p = page_address(first);
 
-	/* sg[0], sg[1] share the same page */
-	/* a separated sg[0] for  virtio_net_hdr only during to QEMU bug*/
-	sg_set_buf(&sg[0], p, sizeof(struct virtio_net_hdr));
+	/* vi->rx_sg[0], vi->rx_sg[1] share the same page */
+	/* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
+	sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
 
-	/* sg[1] for data packet, from offset */
+	/* vi->rx_sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&sg[1], p + offset, PAGE_SIZE - offset);
+	sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, MAX_SKB_FRAGS + 2,
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
 				       first);
 	if (err < 0)
 		give_pages(vi, first);
@@ -396,16 +395,15 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
 {
 	struct page *page;
-	struct scatterlist sg;
 	int err;
 
 	page = get_a_page(vi, gfp);
 	if (!page)
 		return -ENOMEM;
 
-	sg_init_one(&sg, page_address(page), PAGE_SIZE);
+	sg_init_one(&vi->rx_sg, page_address(page), PAGE_SIZE);
 
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, &sg, 0, 1, page);
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, &vi->rx_sg, 0, 1, page);
 	if (err < 0)
 		give_pages(vi, page);
 
@@ -514,12 +512,9 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 {
-	struct scatterlist sg[2+MAX_SKB_FRAGS];
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 
-	sg_init_table(sg, 2+MAX_SKB_FRAGS);
-
 	pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -553,12 +548,13 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 
 	/* Encode metadata header at front. */
 	if (vi->mergeable_rx_bufs)
-		sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
+		sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
 	else
-		sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
+		sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-	hdr->num_sg = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
-	return vi->svq->vq_ops->add_buf(vi->svq, sg, hdr->num_sg, 0, skb);
+	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+	return vi->svq->vq_ops->add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+					0, skb);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -941,6 +937,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	vdev->priv = vi;
 	vi->pages = NULL;
 	INIT_DELAYED_WORK(&vi->refill, refill_work);
+	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
+	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
 
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-- 
1.7.0.2.280.gc6f05

^ permalink raw reply related

* [PATCH] net/pcmcia/3c589_cs: using netdev_info and friends where appropriate
From: linux @ 2010-03-31 12:42 UTC (permalink / raw)
  To: Joe Perches
  Cc: David S. Miller, Ken Kawasaki, Dominik Brodowski, Magnus Damm,
	Ben Hutchings, netdev, linux-kernel, linux
In-Reply-To: <1269969733.4558.171.camel@Joe-Laptop.home>

From: Alexander Kurz <linux@kbdbabel.org>

Signed-off-by: Alexander Kurz <linux@kbdbabel.org>
---
 drivers/net/pcmcia/3c589_cs.c |  286 +++++++++++++++++++++--------------------
 1 files changed, 147 insertions(+), 139 deletions(-)

diff --git a/drivers/net/pcmcia/3c589_cs.c b/drivers/net/pcmcia/3c589_cs.c
index 091e0b0..580977f 100644
--- a/drivers/net/pcmcia/3c589_cs.c
+++ b/drivers/net/pcmcia/3c589_cs.c
@@ -1,20 +1,20 @@
 /*======================================================================
 
     A PCMCIA ethernet driver for the 3com 3c589 card.
-    
+
     Copyright (C) 1999 David A. Hinds -- dahinds@users.sourceforge.net
 
     3c589_cs.c 1.162 2001/10/13 00:08:50
 
     The network driver code is based on Donald Becker's 3c589 code:
-    
+
     Written 1994 by Donald Becker.
     Copyright 1993 United States Government as represented by the
     Director, National Security Agency.  This software may be used and
     distributed according to the terms of the GNU General Public License,
     incorporated herein by reference.
     Donald Becker may be reached at becker@scyld.com
-    
+
     Updated for 2.5.x by Alan Cox <alan@lxorguk.ukuu.org.uk>
 
 ======================================================================*/
@@ -69,31 +69,54 @@
 /* The top five bits written to EL3_CMD are a command, the lower
    11 bits are the parameter, if applicable. */
 enum c509cmd {
-    TotalReset = 0<<11, SelectWindow = 1<<11, StartCoax = 2<<11,
-    RxDisable = 3<<11, RxEnable = 4<<11, RxReset = 5<<11, RxDiscard = 8<<11,
-    TxEnable = 9<<11, TxDisable = 10<<11, TxReset = 11<<11,
-    FakeIntr = 12<<11, AckIntr = 13<<11, SetIntrEnb = 14<<11,
-    SetStatusEnb = 15<<11, SetRxFilter = 16<<11, SetRxThreshold = 17<<11,
-    SetTxThreshold = 18<<11, SetTxStart = 19<<11, StatsEnable = 21<<11,
-    StatsDisable = 22<<11, StopCoax = 23<<11,
+	TotalReset	= 0<<11,
+	SelectWindow	= 1<<11,
+	StartCoax	= 2<<11,
+	RxDisable	= 3<<11,
+	RxEnable	= 4<<11,
+	RxReset		= 5<<11,
+	RxDiscard	= 8<<11,
+	TxEnable	= 9<<11,
+	TxDisable	= 10<<11,
+	TxReset		= 11<<11,
+	FakeIntr	= 12<<11,
+	AckIntr		= 13<<11,
+	SetIntrEnb	= 14<<11,
+	SetStatusEnb	= 15<<11,
+	SetRxFilter	= 16<<11,
+	SetRxThreshold	= 17<<11,
+	SetTxThreshold	= 18<<11,
+	SetTxStart	= 19<<11,
+	StatsEnable	= 21<<11,
+	StatsDisable	= 22<<11,
+	StopCoax	= 23<<11
 };
 
 enum c509status {
-    IntLatch = 0x0001, AdapterFailure = 0x0002, TxComplete = 0x0004,
-    TxAvailable = 0x0008, RxComplete = 0x0010, RxEarly = 0x0020,
-    IntReq = 0x0040, StatsFull = 0x0080, CmdBusy = 0x1000
+	IntLatch	= 0x0001,
+	AdapterFailure	= 0x0002,
+	TxComplete	= 0x0004,
+	TxAvailable	= 0x0008,
+	RxComplete	= 0x0010,
+	RxEarly		= 0x0020,
+	IntReq		= 0x0040,
+	StatsFull	= 0x0080,
+	CmdBusy		= 0x1000
 };
 
 /* The SetRxFilter command accepts the following classes: */
 enum RxFilter {
-    RxStation = 1, RxMulticast = 2, RxBroadcast = 4, RxProm = 8
+	RxStation	= 1,
+	RxMulticast	= 2,
+	RxBroadcast	= 4,
+	RxProm		= 8
 };
 
 /* Register window 1 offsets, the window used in normal operation. */
 #define TX_FIFO		0x00
 #define RX_FIFO		0x00
-#define RX_STATUS 	0x08
-#define TX_STATUS 	0x0B
+#define RX_STATUS	0x08
+#define TX_STATUS	0x0B
 #define TX_FREE		0x0C	/* Remaining free bytes in Tx buffer. */
 
 #define WN0_IRQ		0x08	/* Window 0: Set IRQ line in bits 12-15. */
@@ -106,13 +129,13 @@ enum RxFilter {
 
 struct el3_private {
 	struct pcmcia_device	*p_dev;
-    dev_node_t 		node;
-    /* For transceiver monitoring */
-    struct timer_list	media;
-    u16			media_status;
-    u16			fast_poll;
-    unsigned long	last_irq;
-    spinlock_t		lock;
+	dev_node_t		node;
+	/* For transceiver monitoring */
+	struct timer_list	media;
+	u16			media_status;
+	u16			fast_poll;
+	unsigned long		last_irq;
+	spinlock_t		lock;
 };
 
 static const char *if_names[] = { "auto", "10baseT", "10base2", "AUI" };
@@ -164,15 +187,15 @@ static void tc589_detach(struct pcmcia_device *p_dev);
 ======================================================================*/
 
 static const struct net_device_ops el3_netdev_ops = {
-	.ndo_open 		= el3_open,
-	.ndo_stop 		= el3_close,
+	.ndo_open		= el3_open,
+	.ndo_stop		= el3_close,
 	.ndo_start_xmit		= el3_start_xmit,
-	.ndo_tx_timeout 	= el3_tx_timeout,
+	.ndo_tx_timeout		= el3_tx_timeout,
 	.ndo_set_config		= el3_config,
 	.ndo_get_stats		= el3_get_stats,
 	.ndo_set_multicast_list = set_multicast_list,
 	.ndo_change_mtu		= eth_change_mtu,
-	.ndo_set_mac_address 	= eth_mac_addr,
+	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 };
 
@@ -236,7 +259,7 @@ static void tc589_detach(struct pcmcia_device *link)
     tc589_config() is scheduled to run after a CARD_INSERTION event
     is received, to configure the PCMCIA socket, and to make the
     ethernet device available to the system.
-    
+
 ======================================================================*/
 
 static int tc589_config(struct pcmcia_device *link)
@@ -249,7 +272,7 @@ static int tc589_config(struct pcmcia_device *link)
     char *ram_split[] = {"5:3", "3:1", "1:1", "3:5"};
     u8 *buf;
     size_t len;
-    
+
     dev_dbg(&link->dev, "3c589_config\n");
 
     phys_addr = (__be16 *)dev->dev_addr;
@@ -278,7 +301,7 @@ static int tc589_config(struct pcmcia_device *link)
     ret = pcmcia_request_configuration(link, &link->conf);
     if (ret)
 	    goto failed;
-	
+
     dev->irq = link->irq.AssignedIRQ;
     dev->base_addr = link->io.BasePort1;
     ioaddr = dev->base_addr;
@@ -312,7 +335,7 @@ static int tc589_config(struct pcmcia_device *link)
 	dev->if_port = if_port;
     else
 	printk(KERN_ERR "3c589_cs: invalid if_port requested\n");
-    
+
     link->dev_node = &lp->node;
     SET_NETDEV_DEV(dev, &link->dev);
 
@@ -324,13 +347,12 @@ static int tc589_config(struct pcmcia_device *link)
 
     strcpy(lp->node.dev_name, dev->name);
 
-    printk(KERN_INFO "%s: 3Com 3c%s, io %#3lx, irq %d, "
-	   "hw_addr %pM\n",
-	   dev->name, (multi ? "562" : "589"), dev->base_addr, dev->irq,
-	   dev->dev_addr);
-    printk(KERN_INFO "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
-	   (fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
-	   if_names[dev->if_port]);
+    netdev_info(dev, "3Com 3c%s, io %#3lx, irq %d, hw_addr %pM\n",
+		(multi ? "562" : "589"), dev->base_addr, dev->irq,
+		dev->dev_addr);
+    netdev_info(dev, "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
+		(fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
+		if_names[dev->if_port]);
     return 0;
 
 failed:
@@ -343,7 +365,7 @@ failed:
     After a card is removed, tc589_release() will unregister the net
     device, and release the PCMCIA configuration.  If the device is
     still open, this will be postponed until it is closed.
-    
+
 ======================================================================*/
 
 static void tc589_release(struct pcmcia_device *link)
@@ -365,7 +387,7 @@ static int tc589_resume(struct pcmcia_device *link)
 {
 	struct net_device *dev = link->priv;
 
- 	if (link->open) {
+	if (link->open) {
 		tc589_reset(dev);
 		netif_device_attach(dev);
 	}
@@ -385,8 +407,7 @@ static void tc589_wait_for_completion(struct net_device *dev, int cmd)
     while (--i > 0)
 	if (!(inw(dev->base_addr + EL3_STATUS) & 0x1000)) break;
     if (i == 0)
-	printk(KERN_WARNING "%s: command 0x%04x did not complete!\n",
-	       dev->name, cmd);
+	netdev_warn(dev, "command 0x%04x did not complete!\n", cmd);
 }
 
 /*
@@ -412,7 +433,7 @@ static void tc589_set_xcvr(struct net_device *dev, int if_port)
 {
     struct el3_private *lp = netdev_priv(dev);
     unsigned int ioaddr = dev->base_addr;
-    
+
     EL3WINDOW(0);
     switch (if_port) {
     case 0: case 1: outw(0, ioaddr + 6); break;
@@ -435,14 +456,13 @@ static void dump_status(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     EL3WINDOW(1);
-    printk(KERN_INFO "  irq status %04x, rx status %04x, tx status "
-	   "%02x  tx free %04x\n", inw(ioaddr+EL3_STATUS),
-	   inw(ioaddr+RX_STATUS), inb(ioaddr+TX_STATUS),
-	   inw(ioaddr+TX_FREE));
+    netdev_info(dev, "  irq status %04x, rx status %04x, tx status %02x  tx free %04x\n",
+		inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS),
+		inb(ioaddr+TX_STATUS), inw(ioaddr+TX_FREE));
     EL3WINDOW(4);
-    printk(KERN_INFO "  diagnostics: fifo %04x net %04x ethernet %04x"
-	   " media %04x\n", inw(ioaddr+0x04), inw(ioaddr+0x06),
-	   inw(ioaddr+0x08), inw(ioaddr+0x0a));
+    netdev_info(dev, "  diagnostics: fifo %04x net %04x ethernet %04x media %04x\n",
+		inw(ioaddr+0x04), inw(ioaddr+0x06), inw(ioaddr+0x08),
+		inw(ioaddr+0x0a));
     EL3WINDOW(1);
 }
 
@@ -451,18 +471,18 @@ static void tc589_reset(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     int i;
-    
+
     EL3WINDOW(0);
-    outw(0x0001, ioaddr + 4);			/* Activate board. */ 
+    outw(0x0001, ioaddr + 4);			/* Activate board. */
     outw(0x3f00, ioaddr + 8);			/* Set the IRQ line. */
-    
+
     /* Set the station address in window 2. */
     EL3WINDOW(2);
     for (i = 0; i < 6; i++)
 	outb(dev->dev_addr[i], ioaddr + i);
 
     tc589_set_xcvr(dev, dev->if_port);
-    
+
     /* Switch to the stats window, and clear all stats by reading. */
     outw(StatsDisable, ioaddr + EL3_CMD);
     EL3WINDOW(6);
@@ -470,7 +490,7 @@ static void tc589_reset(struct net_device *dev)
 	inb(ioaddr+i);
     inw(ioaddr + 10);
     inw(ioaddr + 12);
-    
+
     /* Switch to register set 1 for normal use. */
     EL3WINDOW(1);
 
@@ -504,8 +524,7 @@ static int el3_config(struct net_device *dev, struct ifmap *map)
     if ((map->port != (u_char)(-1)) && (map->port != dev->if_port)) {
 	if (map->port <= 3) {
 	    dev->if_port = map->port;
-	    printk(KERN_INFO "%s: switched to %s port\n",
-		   dev->name, if_names[dev->if_port]);
+	    netdev_info(dev, "switched to %s port\n", if_names[dev->if_port]);
 	    tc589_set_xcvr(dev, dev->if_port);
 	} else
 	    return -EINVAL;
@@ -517,13 +536,13 @@ static int el3_open(struct net_device *dev)
 {
     struct el3_private *lp = netdev_priv(dev);
     struct pcmcia_device *link = lp->p_dev;
-    
+
     if (!pcmcia_dev_present(link))
 	return -ENODEV;
 
     link->open++;
     netif_start_queue(dev);
-    
+
     tc589_reset(dev);
     init_timer(&lp->media);
     lp->media.function = &media_check;
@@ -533,15 +552,15 @@ static int el3_open(struct net_device *dev)
 
     dev_dbg(&link->dev, "%s: opened, status %4.4x.\n",
 	  dev->name, inw(dev->base_addr + EL3_STATUS));
-    
+
     return 0;
 }
 
 static void el3_tx_timeout(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
-    
-    printk(KERN_WARNING "%s: Transmit timed out!\n", dev->name);
+
+    netdev_warn(dev, "Transmit timed out!\n");
     dump_status(dev);
     dev->stats.tx_errors++;
     dev->trans_start = jiffies;
@@ -555,19 +574,18 @@ static void pop_tx_status(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     int i;
-    
+
     /* Clear the Tx status stack. */
     for (i = 32; i > 0; i--) {
 	u_char tx_status = inb(ioaddr + TX_STATUS);
 	if (!(tx_status & 0x84)) break;
 	/* reset transmitter on jabber error or underrun */
 	if (tx_status & 0x30)
-	    tc589_wait_for_completion(dev, TxReset);
+		tc589_wait_for_completion(dev, TxReset);
 	if (tx_status & 0x38) {
-	    pr_debug("%s: transmit error: status 0x%02x\n",
-		  dev->name, tx_status);
-	    outw(TxEnable, ioaddr + EL3_CMD);
-	    dev->stats.tx_aborted_errors++;
+		netdev_dbg(dev, "transmit error: status 0x%02x\n", tx_status);
+		outw(TxEnable, ioaddr + EL3_CMD);
+		dev->stats.tx_aborted_errors++;
 	}
 	outb(0x00, ioaddr + TX_STATUS); /* Pop the status stack. */
     }
@@ -580,11 +598,10 @@ static netdev_tx_t el3_start_xmit(struct sk_buff *skb,
     struct el3_private *priv = netdev_priv(dev);
     unsigned long flags;
 
-    pr_debug("%s: el3_start_xmit(length = %ld) called, "
-	  "status %4.4x.\n", dev->name, (long)skb->len,
-	  inw(ioaddr + EL3_STATUS));
+    netdev_dbg(dev, "el3_start_xmit(length = %ld) called, status %4.4x.\n",
+	       (long)skb->len, inw(ioaddr + EL3_STATUS));
 
-    spin_lock_irqsave(&priv->lock, flags);    
+    spin_lock_irqsave(&priv->lock, flags);
 
     dev->stats.tx_bytes += skb->len;
 
@@ -602,9 +619,9 @@ static netdev_tx_t el3_start_xmit(struct sk_buff *skb,
     }
 
     pop_tx_status(dev);
-    spin_unlock_irqrestore(&priv->lock, flags);    
+    spin_unlock_irqrestore(&priv->lock, flags);
     dev_kfree_skb(skb);
-    
+
     return NETDEV_TX_OK;
 }
 
@@ -616,37 +633,32 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
     unsigned int ioaddr;
     __u16 status;
     int i = 0, handled = 1;
-    
+
     if (!netif_device_present(dev))
 	return IRQ_NONE;
 
     ioaddr = dev->base_addr;
 
-    pr_debug("%s: interrupt, status %4.4x.\n",
-	  dev->name, inw(ioaddr + EL3_STATUS));
+    netdev_dbg(dev, "interrupt, status %4.4x.\n", inw(ioaddr + EL3_STATUS));
 
-    spin_lock(&lp->lock);    
+    spin_lock(&lp->lock);
     while ((status = inw(ioaddr + EL3_STATUS)) &
 	(IntLatch | RxComplete | StatsFull)) {
 	if ((status & 0xe000) != 0x2000) {
-	    pr_debug("%s: interrupt from dead card\n", dev->name);
-	    handled = 0;
-	    break;
+		netdev_dbg(dev, "interrupt from dead card\n");
+		handled = 0;
+		break;
 	}
-	
 	if (status & RxComplete)
-	    el3_rx(dev);
-	
+		el3_rx(dev);
 	if (status & TxAvailable) {
-	    pr_debug("    TX room bit was handled.\n");
-	    /* There's room in the FIFO for a full-sized packet. */
-	    outw(AckIntr | TxAvailable, ioaddr + EL3_CMD);
-	    netif_wake_queue(dev);
+		netdev_dbg(dev, "    TX room bit was handled.\n");
+		/* There's room in the FIFO for a full-sized packet. */
+		outw(AckIntr | TxAvailable, ioaddr + EL3_CMD);
+		netif_wake_queue(dev);
 	}
-	
 	if (status & TxComplete)
-	    pop_tx_status(dev);
-
+		pop_tx_status(dev);
 	if (status & (AdapterFailure | RxEarly | StatsFull)) {
 	    /* Handle all uncommon interrupts. */
 	    if (status & StatsFull)		/* Empty statistics. */
@@ -660,8 +672,8 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
 		EL3WINDOW(4);
 		fifo_diag = inw(ioaddr + 4);
 		EL3WINDOW(1);
-		printk(KERN_WARNING "%s: adapter failure, FIFO diagnostic"
-		       " register %04x.\n", dev->name, fifo_diag);
+		netdev_warn(dev, "adapter failure, FIFO diagnostic register %04x.\n",
+			    fifo_diag);
 		if (fifo_diag & 0x0400) {
 		    /* Tx overrun */
 		    tc589_wait_for_completion(dev, TxReset);
@@ -676,22 +688,20 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
 		outw(AckIntr | AdapterFailure, ioaddr + EL3_CMD);
 	    }
 	}
-	
 	if (++i > 10) {
-	    printk(KERN_ERR "%s: infinite loop in interrupt, "
-		   "status %4.4x.\n", dev->name, status);
-	    /* Clear all interrupts */
-	    outw(AckIntr | 0xFF, ioaddr + EL3_CMD);
-	    break;
+		netdev_err(dev, "infinite loop in interrupt, status %4.4x.\n",
+			   status);
+		/* Clear all interrupts */
+		outw(AckIntr | 0xFF, ioaddr + EL3_CMD);
+		break;
 	}
 	/* Acknowledge the IRQ. */
 	outw(AckIntr | IntReq | IntLatch, ioaddr + EL3_CMD);
     }
-
     lp->last_irq = jiffies;
-    spin_unlock(&lp->lock);    
-    pr_debug("%s: exiting interrupt, status %4.4x.\n",
-	  dev->name, inw(ioaddr + EL3_STATUS));
+    spin_unlock(&lp->lock);
+    netdev_dbg(dev, "exiting interrupt, status %4.4x.\n",
+	       inw(ioaddr + EL3_STATUS));
     return IRQ_RETVAL(handled);
 }
 
@@ -710,7 +720,7 @@ static void media_check(unsigned long arg)
     if ((inw(ioaddr + EL3_STATUS) & IntLatch) &&
 	(inb(ioaddr + EL3_TIMER) == 0xff)) {
 	if (!lp->fast_poll)
-	    printk(KERN_WARNING "%s: interrupt(s) dropped!\n", dev->name);
+		netdev_warn(dev, "interrupt(s) dropped!\n");
 
 	local_irq_save(flags);
 	el3_interrupt(dev->irq, dev);
@@ -727,7 +737,7 @@ static void media_check(unsigned long arg)
 
     /* lp->lock guards the EL3 window. Window should always be 1 except
        when the lock is held */
-    spin_lock_irqsave(&lp->lock, flags);    
+    spin_lock_irqsave(&lp->lock, flags);
     EL3WINDOW(4);
     media = inw(ioaddr+WN4_MEDIA) & 0xc810;
 
@@ -747,32 +757,30 @@ static void media_check(unsigned long arg)
     if (media != lp->media_status) {
 	if ((media & lp->media_status & 0x8000) &&
 	    ((lp->media_status ^ media) & 0x0800))
-	    printk(KERN_INFO "%s: %s link beat\n", dev->name,
-		   (lp->media_status & 0x0800 ? "lost" : "found"));
+		netdev_info(dev, "%s link beat\n",
+			    (lp->media_status & 0x0800 ? "lost" : "found"));
 	else if ((media & lp->media_status & 0x4000) &&
 		 ((lp->media_status ^ media) & 0x0010))
-	    printk(KERN_INFO "%s: coax cable %s\n", dev->name,
-		   (lp->media_status & 0x0010 ? "ok" : "problem"));
+		netdev_info(dev, "coax cable %s\n",
+			    (lp->media_status & 0x0010 ? "ok" : "problem"));
 	if (dev->if_port == 0) {
 	    if (media & 0x8000) {
 		if (media & 0x0800)
-		    printk(KERN_INFO "%s: flipped to 10baseT\n",
-			   dev->name);
+			netdev_info(dev, "flipped to 10baseT\n");
 		else
-		    tc589_set_xcvr(dev, 2);
+			tc589_set_xcvr(dev, 2);
 	    } else if (media & 0x4000) {
 		if (media & 0x0010)
 		    tc589_set_xcvr(dev, 1);
 		else
-		    printk(KERN_INFO "%s: flipped to 10base2\n",
-			   dev->name);
+		    netdev_info(dev, "flipped to 10base2\n");
 	    }
 	}
 	lp->media_status = media;
     }
-    
+
     EL3WINDOW(1);
-    spin_unlock_irqrestore(&lp->lock, flags);    
+    spin_unlock_irqrestore(&lp->lock, flags);
 
 reschedule:
     lp->media.expires = jiffies + HZ;
@@ -786,7 +794,7 @@ static struct net_device_stats *el3_get_stats(struct net_device *dev)
     struct pcmcia_device *link = lp->p_dev;
 
     if (pcmcia_dev_present(link)) {
-    	spin_lock_irqsave(&lp->lock, flags);
+	spin_lock_irqsave(&lp->lock, flags);
 	update_stats(dev);
 	spin_unlock_irqrestore(&lp->lock, flags);
     }
@@ -798,21 +806,21 @@ static struct net_device_stats *el3_get_stats(struct net_device *dev)
   single-threaded if the device is active. This is expected to be a rare
   operation, and it's simpler for the rest of the driver to assume that
   window 1 is always valid rather than use a special window-state variable.
-  
+
   Caller must hold the lock for this
 */
 static void update_stats(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
 
-    pr_debug("%s: updating the statistics.\n", dev->name);
+    netdev_dbg(dev, "updating the statistics.\n");
     /* Turn off statistics updates while reading. */
     outw(StatsDisable, ioaddr + EL3_CMD);
     /* Switch to the stats window, and read everything. */
     EL3WINDOW(6);
-    dev->stats.tx_carrier_errors 	+= inb(ioaddr + 0);
+    dev->stats.tx_carrier_errors	+= inb(ioaddr + 0);
     dev->stats.tx_heartbeat_errors	+= inb(ioaddr + 1);
-    /* Multiple collisions. */	   	inb(ioaddr + 2);
+    /* Multiple collisions. */		inb(ioaddr + 2);
     dev->stats.collisions		+= inb(ioaddr + 3);
     dev->stats.tx_window_errors		+= inb(ioaddr + 4);
     dev->stats.rx_fifo_errors		+= inb(ioaddr + 5);
@@ -821,7 +829,7 @@ static void update_stats(struct net_device *dev)
     /* Tx deferrals */			inb(ioaddr + 8);
     /* Rx octets */			inw(ioaddr + 10);
     /* Tx octets */			inw(ioaddr + 12);
-    
+
     /* Back to window 1, and turn statistics back on. */
     EL3WINDOW(1);
     outw(StatsEnable, ioaddr + EL3_CMD);
@@ -832,9 +840,9 @@ static int el3_rx(struct net_device *dev)
     unsigned int ioaddr = dev->base_addr;
     int worklimit = 32;
     short rx_status;
-    
-    pr_debug("%s: in rx_packet(), status %4.4x, rx_status %4.4x.\n",
-	  dev->name, inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS));
+
+    netdev_dbg(dev, "in rx_packet(), status %4.4x, rx_status %4.4x.\n",
+	       inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS));
     while (!((rx_status = inw(ioaddr + RX_STATUS)) & 0x8000) &&
 		    worklimit > 0) {
 	worklimit--;
@@ -852,11 +860,11 @@ static int el3_rx(struct net_device *dev)
 	} else {
 	    short pkt_len = rx_status & 0x7ff;
 	    struct sk_buff *skb;
-	    
+
 	    skb = dev_alloc_skb(pkt_len+5);
-	    
-	    pr_debug("    Receiving packet size %d status %4.4x.\n",
-		  pkt_len, rx_status);
+
+	    netdev_dbg(dev, "    Receiving packet size %d status %4.4x.\n",
+		       pkt_len, rx_status);
 	    if (skb != NULL) {
 		skb_reserve(skb, 2);
 		insl(ioaddr+RX_FIFO, skb_put(skb, pkt_len),
@@ -866,8 +874,8 @@ static int el3_rx(struct net_device *dev)
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += pkt_len;
 	    } else {
-		pr_debug("%s: couldn't allocate a sk_buff of"
-		      " size %d.\n", dev->name, pkt_len);
+		netdev_dbg(dev, "couldn't allocate a sk_buff of size %d.\n",
+			   pkt_len);
 		dev->stats.rx_dropped++;
 	    }
 	}
@@ -875,7 +883,7 @@ static int el3_rx(struct net_device *dev)
 	tc589_wait_for_completion(dev, RxDiscard);
     }
     if (worklimit == 0)
-	printk(KERN_WARNING "%s: too much work in el3_rx!\n", dev->name);
+	netdev_warn(dev, "too much work in el3_rx!\n");
     return 0;
 }
 
@@ -906,17 +914,17 @@ static int el3_close(struct net_device *dev)
     struct el3_private *lp = netdev_priv(dev);
     struct pcmcia_device *link = lp->p_dev;
     unsigned int ioaddr = dev->base_addr;
-    
+
     dev_dbg(&link->dev, "%s: shutting down ethercard.\n", dev->name);
 
     if (pcmcia_dev_present(link)) {
 	/* Turn off statistics ASAP.  We update dev->stats below. */
 	outw(StatsDisable, ioaddr + EL3_CMD);
-	
+
 	/* Disable the receiver and transmitter. */
 	outw(RxDisable, ioaddr + EL3_CMD);
 	outw(TxDisable, ioaddr + EL3_CMD);
-	
+
 	if (dev->if_port == 2)
 	    /* Turn off thinnet power.  Green! */
 	    outw(StopCoax, ioaddr + EL3_CMD);
@@ -925,12 +933,12 @@ static int el3_close(struct net_device *dev)
 	    EL3WINDOW(4);
 	    outw(0, ioaddr + WN4_MEDIA);
 	}
-	
+
 	/* Switching back to window 0 disables the IRQ. */
 	EL3WINDOW(0);
 	/* But we explicitly zero the IRQ line select anyway. */
 	outw(0x0f00, ioaddr + WN0_IRQ);
-        
+
 	/* Check if the card still exists */
 	if ((inw(ioaddr+EL3_STATUS) & 0xe000) == 0x2000)
 	    update_stats(dev);
@@ -939,7 +947,7 @@ static int el3_close(struct net_device *dev)
     link->open--;
     netif_stop_queue(dev);
     del_timer_sync(&lp->media);
-    
+
     return 0;
 }
 
@@ -961,7 +969,7 @@ static struct pcmcia_driver tc589_driver = {
 	},
 	.probe		= tc589_probe,
 	.remove		= tc589_detach,
-        .id_table       = tc589_ids,
+	.id_table	= tc589_ids,
 	.suspend	= tc589_suspend,
 	.resume		= tc589_resume,
 };
-- 
1.7.0

^ permalink raw reply related

* Re: [PATCH 3/7] flow: allocate hash table for online cpus only
From: Rusty Russell @ 2010-03-31 12:32 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev
In-Reply-To: <20100330121255.GC5731@gondor.apana.org.au>

On Tue, 30 Mar 2010 10:42:55 pm Herbert Xu wrote:
> On Mon, Mar 29, 2010 at 05:12:40PM +0300, Timo Teras wrote:
> > Instead of unconditionally allocating hash table for all possible
> > cpu's, allocate it only for online cpu's and release related
> > memory if cpu goes down.
> > 
> > Signed-off-by: Timo Teras <timo.teras@iki.fi>
> 
> Hmm that's where we started but then Rusty changed it back in 2004:
> 
> commit 0a32dc4d8e83c48f7535d66731eb35d1916b39a8
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Wed Jan 21 18:14:37 2004 -0800
> 
>     [NET]: Simplify net/flow.c per-cpu handling.
> 
>     The cpu handling in net/core/flow.c is complex: it tries to allocate
>     flow cache as each CPU comes up.  It might as well allocate them for
>     each possible CPU at boot.
> 
> So I'd like to hear his opinion on changing it back again.

It was pretty unique at the time, it no longer is, so the arguments are less
compelling IMHO.

However, we can now use a dynamic percpu variable and get it as a real
per-cpu thing (which currently means it *will* be for every available cpu,
not just online ones).  Haven't thought about it, but that change might be
worth considering instead?

Thanks!
Rusty.

^ permalink raw reply

* [PATCH] can: Add driver for SJA1000 based PCI CAN interface cards by esd
From: Matthias Fuchs @ 2010-03-31 12:28 UTC (permalink / raw)
  To: netdev; +Cc: Socketcan-core

This patch adds support for SJA1000 based PCI CAN interface cards
from electronic system design gmbh.

These boards are supported:

        CAN-PCI/200 (PCI)
        CAN-PCI/266 (PCI)
        CAN-PMC266 (PMC module)
        CAN-PCIe/2000 (PCI Express)
        CAN-CPCI/200 (Compact PCI, 3U)
        CAN-PCI104 (PCI104)

This driver is part of the SocketCAN SVN repository since
April 2009.

Signed-off-by: Matthias Fuchs <matthias.fuchs@esd.eu>
---
 drivers/net/can/sja1000/Kconfig   |    9 +
 drivers/net/can/sja1000/Makefile  |    1 +
 drivers/net/can/sja1000/esd_pci.c |  315 +++++++++++++++++++++++++++++++++++++
 3 files changed, 325 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/can/sja1000/esd_pci.c

diff --git a/drivers/net/can/sja1000/Kconfig b/drivers/net/can/sja1000/Kconfig
index 9e277d6..c01eb4e 100644
--- a/drivers/net/can/sja1000/Kconfig
+++ b/drivers/net/can/sja1000/Kconfig
@@ -37,6 +37,15 @@ config CAN_EMS_PCI
 	  CPC-PCIe and CPC-104P cards from EMS Dr. Thomas Wuensche
 	  (http://www.ems-wuensche.de).
 
+config CAN_ESD_PCI
+	tristate "ESD PCI Cards"
+	depends on PCI
+	---help---
+	  This driver supports the esd PCI CAN cards CAN-PCI/200,
+	  CAN-PCI/266, CAN-PMC/266 (PMC), CAN-CPCI/200 (CompactPCI),
+	  CAN-PCIe2000 (PCI Express) and CAN-PCI104/200 (PCI104)
+	  from the esd electronic system design gmbh (http://www.esd.eu).
+
 config CAN_KVASER_PCI
 	tristate "Kvaser PCIcanx and Kvaser PCIcan PCI Cards"
 	depends on PCI
diff --git a/drivers/net/can/sja1000/Makefile b/drivers/net/can/sja1000/Makefile
index ce92455..a8dcbe0 100644
--- a/drivers/net/can/sja1000/Makefile
+++ b/drivers/net/can/sja1000/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_CAN_SJA1000_ISA) += sja1000_isa.o
 obj-$(CONFIG_CAN_SJA1000_PLATFORM) += sja1000_platform.o
 obj-$(CONFIG_CAN_SJA1000_OF_PLATFORM) += sja1000_of_platform.o
 obj-$(CONFIG_CAN_EMS_PCI) += ems_pci.o
+obj-$(CONFIG_CAN_ESD_PCI) += esd_pci.o
 obj-$(CONFIG_CAN_KVASER_PCI) += kvaser_pci.o
 obj-$(CONFIG_CAN_PLX_PCI) += plx_pci.o
 
diff --git a/drivers/net/can/sja1000/esd_pci.c b/drivers/net/can/sja1000/esd_pci.c
new file mode 100644
index 0000000..98389f2
--- /dev/null
+++ b/drivers/net/can/sja1000/esd_pci.c
@@ -0,0 +1,315 @@
+/*
+ * Copyright (C) 2007 Wolfgang Grandegger <wg@grandegger.com>
+ * Copyright (C) 2008 Sascha Hauer <s.hauer@pengutronix.de>, Pengutronix
+ * Copyright (C) 2009 Matthias Fuchs <matthias.fuchs@esd.eu>, esd gmbh
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/netdevice.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/can.h>
+#include <linux/can/dev.h>
+#include <linux/io.h>
+
+#include "sja1000.h"
+
+#define DRV_NAME  "esd_pci"
+
+MODULE_AUTHOR("Matthias Fuchs <matthias.fuchs@esd.eu");
+MODULE_DESCRIPTION("Socket-CAN driver for esd PCI/PMC/CPCI/PCIe/PCI104 " \
+		   "CAN cards");
+MODULE_SUPPORTED_DEVICE("esd CAN-PCI/200, CAN-PCI/266, CAN-PMC266, " \
+			"CAN-PCIe/2000, CAN-CPCI/200, CAN-PCI104");
+MODULE_LICENSE("GPL v2");
+
+/* Maximum number of interfaces supported on one card. */
+#define ESD_PCI_MAX_CAN 2
+
+struct esd_pci {
+	struct pci_dev *pci_dev;
+	struct net_device *dev[ESD_PCI_MAX_CAN];
+	void __iomem *conf_addr;
+	void __iomem *base_addr;
+};
+
+#define ESD_PCI_CAN_CLOCK	(16000000 / 2)
+
+#define ESD_PCI_OCR		(OCR_TX0_PUSHPULL | OCR_TX1_PUSHPULL)
+#define ESD_PCI_CDR		0
+
+#define CHANNEL_OFFSET		0x100
+
+#define INTCSR_OFFSET		0x4c /* Offset in PLX9050 conf registers */
+#define INTCSR_LINTI1		(1 << 0)
+#define INTCSR_PCI		(1 << 6)
+
+#define INTCSR9056_OFFSET	0x68 /* Offset in PLX9056 conf registers */
+#define INTCSR9056_LINTI	(1 << 11)
+#define INTCSR9056_PCI		(1 << 8)
+
+/* PCI subsystem IDs of esd's SJA1000 based CAN cards */
+
+/* CAN-PCI/200: PCI, 33MHz only, bridge: PLX9050 */
+#define ESD_PCI_SUB_SYS_ID_PCI200	0x0004
+
+/* CAN-PCI/266: PCI, 33/66MHz, bridge: PLX9056 */
+#define ESD_PCI_SUB_SYS_ID_PCI266	0x0009
+
+/* CAN-PMC/266: PMC module, 33/66MHz, bridge: PLX9056 */
+#define ESD_PCI_SUB_SYS_ID_PMC266	0x000e
+
+/* CAN-CPCI/200: Compact PCI, 33MHz only, bridge: PLX9030 */
+#define ESD_PCI_SUB_SYS_ID_CPCI200	0x010b
+
+/* CAN-PCIE/2000: PCI Express 1x, bridge: PEX8311 = PEX8111 + PLX9056 */
+#define ESD_PCI_SUB_SYS_ID_PCIE2000	0x0200
+
+/* CAN-PCI/104: PCI104 module, 33MHz only, bridge: PLX9030 */
+#define ESD_PCI_SUB_SYS_ID_PCI104200	0x0501
+
+static DEFINE_PCI_DEVICE_TABLE(esd_pci_tbl) = {
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9050,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_PCI200},
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9056,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_PCI266},
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9056,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_PMC266},
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9030,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_CPCI200},
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9056,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_PCIE2000},
+	{PCI_VENDOR_ID_PLX, PCI_DEVICE_ID_PLX_9030,
+	 PCI_VENDOR_ID_ESDGMBH, ESD_PCI_SUB_SYS_ID_PCI104200},
+	{0,}
+};
+
+#define ESD_PCI_BASE_SIZE  0x200
+
+MODULE_DEVICE_TABLE(pci, esd_pci_tbl);
+
+static u8 esd_pci_read_reg(const struct sja1000_priv *priv, int port)
+{
+	return readb(priv->reg_base + port);
+}
+
+static void esd_pci_write_reg(const struct sja1000_priv *priv, int port, u8 val)
+{
+	writeb(val, priv->reg_base + port);
+}
+
+static void esd_pci_del_chan(struct pci_dev *pdev, struct net_device *ndev)
+{
+	dev_info(&pdev->dev, "Removing device %s\n", ndev->name);
+
+	unregister_sja1000dev(ndev);
+
+	free_sja1000dev(ndev);
+}
+
+static struct net_device * __devinit esd_pci_add_chan(struct pci_dev *pdev,
+						      void __iomem *base_addr)
+{
+	struct net_device *ndev;
+	struct sja1000_priv *priv;
+	int err;
+
+	ndev = alloc_sja1000dev(0);
+	if (ndev == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	priv = netdev_priv(ndev);
+
+	priv->reg_base = base_addr;
+
+	priv->read_reg = esd_pci_read_reg;
+	priv->write_reg = esd_pci_write_reg;
+
+	priv->can.clock.freq = ESD_PCI_CAN_CLOCK;
+
+	priv->ocr = ESD_PCI_OCR;
+	priv->cdr = ESD_PCI_CDR;
+
+	/* Set and enable PCI interrupts */
+	priv->irq_flags = IRQF_SHARED;
+	ndev->irq = pdev->irq;
+
+	dev_dbg(&pdev->dev, "reg_base=0x%p irq=%d\n",
+			priv->reg_base, ndev->irq);
+
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	err = register_sja1000dev(ndev);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to register (err=%d)\n", err);
+		goto failure;
+	}
+
+	return ndev;
+
+failure:
+	free_sja1000dev(ndev);
+	return ERR_PTR(err);
+}
+
+static int __devinit esd_pci_init_one(struct pci_dev *pdev,
+				       const struct pci_device_id *ent)
+{
+	struct esd_pci *board;
+	int err;
+	void __iomem *base_addr;
+	void __iomem *conf_addr;
+
+	dev_info(&pdev->dev,
+		 "Initializing device %04x:%04x %04x:%04x\n",
+		 pdev->vendor, pdev->device,
+		 pdev->subsystem_vendor, pdev->subsystem_device);
+
+	board = kzalloc(sizeof(*board), GFP_KERNEL);
+	if (!board)
+		return -ENOMEM;
+
+	err = pci_enable_device(pdev);
+	if (err)
+		goto failure;
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err)
+		goto failure;
+
+	conf_addr = pci_iomap(pdev, 0, ESD_PCI_BASE_SIZE);
+	if (conf_addr == NULL) {
+		err = -ENODEV;
+		goto failure_release_pci;
+	}
+
+	board->conf_addr = conf_addr;
+
+	base_addr = pci_iomap(pdev, 2, ESD_PCI_BASE_SIZE);
+	if (base_addr == NULL) {
+		err = -ENODEV;
+		goto failure_iounmap_conf;
+	}
+
+	board->base_addr = base_addr;
+
+	board->dev[0] = esd_pci_add_chan(pdev, base_addr);
+	if (IS_ERR(board->dev[0]))
+		goto failure_iounmap_base;
+
+	/* Check if second channel is available */
+	writeb(MOD_RM, base_addr + CHANNEL_OFFSET + REG_MOD);
+	writeb(CDR_CBP, base_addr + CHANNEL_OFFSET + REG_CDR);
+	writeb(MOD_RM, base_addr + CHANNEL_OFFSET + REG_MOD);
+	if (readb(base_addr + CHANNEL_OFFSET + REG_MOD) == 0x21) {
+		writeb(MOD_SM | MOD_AFM | MOD_STM | MOD_LOM | MOD_RM,
+		       base_addr + CHANNEL_OFFSET + REG_MOD);
+		if (readb(base_addr + CHANNEL_OFFSET + REG_MOD) == 0x3f) {
+			writeb(MOD_RM, base_addr + CHANNEL_OFFSET + REG_MOD);
+			board->dev[1] =
+				esd_pci_add_chan(pdev,
+						 base_addr + CHANNEL_OFFSET);
+			if (IS_ERR(board->dev[1]))
+				goto failure_unreg_dev0;
+		} else
+			writeb(MOD_RM, base_addr + CHANNEL_OFFSET + REG_MOD);
+	} else
+		writeb(MOD_RM, base_addr + CHANNEL_OFFSET + REG_MOD);
+
+	if ((pdev->device == PCI_DEVICE_ID_PLX_9050) ||
+	    (pdev->device == PCI_DEVICE_ID_PLX_9030)) {
+		/* Enable interrupts in PLX9050 */
+		writel(INTCSR_LINTI1 | INTCSR_PCI,
+		       board->conf_addr + INTCSR_OFFSET);
+	} else {
+		/* Enable interrupts in PLX9056*/
+		writel(INTCSR9056_LINTI | INTCSR9056_PCI,
+		       board->conf_addr + INTCSR9056_OFFSET);
+	}
+
+	pci_set_drvdata(pdev, board);
+
+	return 0;
+
+failure_unreg_dev0:
+	esd_pci_del_chan(pdev, board->dev[0]);
+
+failure_iounmap_base:
+	pci_iounmap(pdev, board->base_addr);
+
+failure_iounmap_conf:
+	pci_iounmap(pdev, board->conf_addr);
+
+failure_release_pci:
+	pci_release_regions(pdev);
+
+failure:
+	kfree(board);
+
+	return err;
+}
+
+static void esd_pci_remove_one(struct pci_dev *pdev)
+{
+	struct esd_pci *board = pci_get_drvdata(pdev);
+	int i;
+
+	if ((pdev->device == PCI_DEVICE_ID_PLX_9050) ||
+	    (pdev->device == PCI_DEVICE_ID_PLX_9030)) {
+		/* Disable interrupts in PLX9050*/
+		writel(0, board->conf_addr + INTCSR_OFFSET);
+	} else {
+		/* Disable interrupts in PLX9056*/
+		writel(0, board->conf_addr + INTCSR9056_OFFSET);
+	}
+
+	for (i = 0; i < ESD_PCI_MAX_CAN; i++) {
+		if (!board->dev[i])
+			break;
+		esd_pci_del_chan(pdev, board->dev[i]);
+	}
+
+	pci_iounmap(pdev, board->base_addr);
+	pci_iounmap(pdev, board->conf_addr);
+
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+
+	kfree(board);
+}
+
+static struct pci_driver esd_pci_driver = {
+	.name = DRV_NAME,
+	.id_table = esd_pci_tbl,
+	.probe = esd_pci_init_one,
+	.remove = esd_pci_remove_one,
+};
+
+static int __init esd_pci_init(void)
+{
+	return pci_register_driver(&esd_pci_driver);
+}
+
+static void __exit esd_pci_exit(void)
+{
+	pci_unregister_driver(&esd_pci_driver);
+}
+
+module_init(esd_pci_init);
+module_exit(esd_pci_exit);
-- 
1.6.1


^ permalink raw reply related

* Re: [r8169] WARNING: at net/sched/sch_generic.c
From: Eric Dumazet @ 2010-03-31 12:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Neil Horman, netdev, Francois Romieu, David S. Miller,
	linux-kernel
In-Reply-To: <20100331121426.GB3294@swordfish.minsk.epam.com>

Le mercredi 31 mars 2010 à 15:14 +0300, Sergey Senozhatsky a écrit :


> PKT_SIZE="pkt_size 2048"
> 

If you use 1024 bytes pktgen messages, do you still have the problem ?



^ permalink raw reply

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection (v2)
From: Sascha Hauer @ 2010-03-31 12:26 UTC (permalink / raw)
  To: Bryan Wu
  Cc: gerg, amit.kucheria, w.sang, kernel-team, netdev,
	linux-arm-kernel, linux-kernel
In-Reply-To: <1270037444-8470-1-git-send-email-bryan.wu@canonical.com>

On Wed, Mar 31, 2010 at 08:10:44PM +0800, Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/457878
> 
> v2:
>  - remove duplicated phy_speed caculation
>  - fix the phy_speed caculation according to the DataSheet
> 
> v1:
>  - removed old MII phy control code
>  - add phylib supporting
>  - add ethtool interface to make user space NetworkManager works
> 
> Tested on Freescale i.MX51 Babbage board.
> 
> This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>
> 
> Cc: Frederic Rodo <fred.rodo@gmail.com>
> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>

Acked-by: Sascha Hauer <s.hauer@pengutronix.de>

Sascha

> ---
>  drivers/net/Kconfig |    1 +
>  drivers/net/fec.c   | 1128 ++++++++++++---------------------------------------
>  2 files changed, 252 insertions(+), 877 deletions(-)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0ba5b8e..41f6a70 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -1916,6 +1916,7 @@ config FEC
>  	bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
>  	depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
>  		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
> +	select PHYLIB
>  	help
>  	  Say Y here if you want to use the built-in 10/100 Fast ethernet
>  	  controller on some Motorola ColdFire and Freescale i.MX processors.
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 9f98c1c..848eb19 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -40,6 +40,7 @@
>  #include <linux/irq.h>
>  #include <linux/clk.h>
>  #include <linux/platform_device.h>
> +#include <linux/phy.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -61,7 +62,6 @@
>   * Define the fixed address of the FEC hardware.
>   */
>  #if defined(CONFIG_M5272)
> -#define HAVE_mii_link_interrupt
>  
>  static unsigned char	fec_mac_default[] = {
>  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -86,23 +86,6 @@ static unsigned char	fec_mac_default[] = {
>  #endif
>  #endif /* CONFIG_M5272 */
>  
> -/* Forward declarations of some structures to support different PHYs */
> -
> -typedef struct {
> -	uint mii_data;
> -	void (*funct)(uint mii_reg, struct net_device *dev);
> -} phy_cmd_t;
> -
> -typedef struct {
> -	uint id;
> -	char *name;
> -
> -	const phy_cmd_t *config;
> -	const phy_cmd_t *startup;
> -	const phy_cmd_t *ack_int;
> -	const phy_cmd_t *shutdown;
> -} phy_info_t;
> -
>  /* The number of Tx and Rx buffers.  These are allocated from the page
>   * pool.  The code may assume these are power of two, so it it best
>   * to keep them that size.
> @@ -189,29 +172,21 @@ struct fec_enet_private {
>  	uint	tx_full;
>  	/* hold while accessing the HW like ringbuffer for tx/rx but not MAC */
>  	spinlock_t hw_lock;
> -	/* hold while accessing the mii_list_t() elements */
> -	spinlock_t mii_lock;
> -
> -	uint	phy_id;
> -	uint	phy_id_done;
> -	uint	phy_status;
> -	uint	phy_speed;
> -	phy_info_t const	*phy;
> -	struct work_struct phy_task;
>  
> -	uint	sequence_done;
> -	uint	mii_phy_task_queued;
> +	struct  platform_device *pdev;
>  
> -	uint	phy_addr;
> +	int	opened;
>  
> +	/* Phylib and MDIO interface */
> +	struct  mii_bus *mii_bus;
> +	struct  phy_device *phy_dev;
> +	int     mii_timeout;
> +	uint    phy_speed;
>  	int	index;
> -	int	opened;
>  	int	link;
> -	int	old_link;
>  	int	full_duplex;
>  };
>  
> -static void fec_enet_mii(struct net_device *dev);
>  static irqreturn_t fec_enet_interrupt(int irq, void * dev_id);
>  static void fec_enet_tx(struct net_device *dev);
>  static void fec_enet_rx(struct net_device *dev);
> @@ -219,67 +194,20 @@ static int fec_enet_close(struct net_device *dev);
>  static void fec_restart(struct net_device *dev, int duplex);
>  static void fec_stop(struct net_device *dev);
>  
> +/* FEC MII MMFR bits definition */
> +#define FEC_MMFR_ST		(1 << 30)
> +#define FEC_MMFR_OP_READ	(2 << 28)
> +#define FEC_MMFR_OP_WRITE	(1 << 28)
> +#define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
> +#define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
> +#define FEC_MMFR_TA		(2 << 16)
> +#define FEC_MMFR_DATA(v)	(v & 0xffff)
>  
> -/* MII processing.  We keep this as simple as possible.  Requests are
> - * placed on the list (if there is room).  When the request is finished
> - * by the MII, an optional function may be called.
> - */
> -typedef struct mii_list {
> -	uint	mii_regval;
> -	void	(*mii_func)(uint val, struct net_device *dev);
> -	struct	mii_list *mii_next;
> -} mii_list_t;
> -
> -#define		NMII	20
> -static mii_list_t	mii_cmds[NMII];
> -static mii_list_t	*mii_free;
> -static mii_list_t	*mii_head;
> -static mii_list_t	*mii_tail;
> -
> -static int	mii_queue(struct net_device *dev, int request,
> -				void (*func)(uint, struct net_device *));
> -
> -/* Make MII read/write commands for the FEC */
> -#define mk_mii_read(REG)	(0x60020000 | ((REG & 0x1f) << 18))
> -#define mk_mii_write(REG, VAL)	(0x50020000 | ((REG & 0x1f) << 18) | \
> -						(VAL & 0xffff))
> -#define mk_mii_end	0
> +#define FEC_MII_TIMEOUT		10000
>  
>  /* Transmitter timeout */
>  #define TX_TIMEOUT (2 * HZ)
>  
> -/* Register definitions for the PHY */
> -
> -#define MII_REG_CR          0  /* Control Register                         */
> -#define MII_REG_SR          1  /* Status Register                          */
> -#define MII_REG_PHYIR1      2  /* PHY Identification Register 1            */
> -#define MII_REG_PHYIR2      3  /* PHY Identification Register 2            */
> -#define MII_REG_ANAR        4  /* A-N Advertisement Register               */
> -#define MII_REG_ANLPAR      5  /* A-N Link Partner Ability Register        */
> -#define MII_REG_ANER        6  /* A-N Expansion Register                   */
> -#define MII_REG_ANNPTR      7  /* A-N Next Page Transmit Register          */
> -#define MII_REG_ANLPRNPR    8  /* A-N Link Partner Received Next Page Reg. */
> -
> -/* values for phy_status */
> -
> -#define PHY_CONF_ANE	0x0001  /* 1 auto-negotiation enabled */
> -#define PHY_CONF_LOOP	0x0002  /* 1 loopback mode enabled */
> -#define PHY_CONF_SPMASK	0x00f0  /* mask for speed */
> -#define PHY_CONF_10HDX	0x0010  /* 10 Mbit half duplex supported */
> -#define PHY_CONF_10FDX	0x0020  /* 10 Mbit full duplex supported */
> -#define PHY_CONF_100HDX	0x0040  /* 100 Mbit half duplex supported */
> -#define PHY_CONF_100FDX	0x0080  /* 100 Mbit full duplex supported */
> -
> -#define PHY_STAT_LINK	0x0100  /* 1 up - 0 down */
> -#define PHY_STAT_FAULT	0x0200  /* 1 remote fault */
> -#define PHY_STAT_ANC	0x0400  /* 1 auto-negotiation complete	*/
> -#define PHY_STAT_SPMASK	0xf000  /* mask for speed */
> -#define PHY_STAT_10HDX	0x1000  /* 10 Mbit half duplex selected	*/
> -#define PHY_STAT_10FDX	0x2000  /* 10 Mbit full duplex selected	*/
> -#define PHY_STAT_100HDX	0x4000  /* 100 Mbit half duplex selected */
> -#define PHY_STAT_100FDX	0x8000  /* 100 Mbit full duplex selected */
> -
> -
>  static int
>  fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
> @@ -406,12 +334,6 @@ fec_enet_interrupt(int irq, void * dev_id)
>  			ret = IRQ_HANDLED;
>  			fec_enet_tx(dev);
>  		}
> -
> -		if (int_events & FEC_ENET_MII) {
> -			ret = IRQ_HANDLED;
> -			fec_enet_mii(dev);
> -		}
> -
>  	} while (int_events);
>  
>  	return ret;
> @@ -607,827 +529,311 @@ rx_processing_done:
>  	spin_unlock(&fep->hw_lock);
>  }
>  
> -/* called from interrupt context */
> -static void
> -fec_enet_mii(struct net_device *dev)
> -{
> -	struct	fec_enet_private *fep;
> -	mii_list_t	*mip;
> -
> -	fep = netdev_priv(dev);
> -	spin_lock(&fep->mii_lock);
> -
> -	if ((mip = mii_head) == NULL) {
> -		printk("MII and no head!\n");
> -		goto unlock;
> -	}
> -
> -	if (mip->mii_func != NULL)
> -		(*(mip->mii_func))(readl(fep->hwp + FEC_MII_DATA), dev);
> -
> -	mii_head = mip->mii_next;
> -	mip->mii_next = mii_free;
> -	mii_free = mip;
> -
> -	if ((mip = mii_head) != NULL)
> -		writel(mip->mii_regval, fep->hwp + FEC_MII_DATA);
> -
> -unlock:
> -	spin_unlock(&fep->mii_lock);
> -}
> -
> -static int
> -mii_queue_unlocked(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> +/* ------------------------------------------------------------------------- */
> +#ifdef CONFIG_M5272
> +static void __inline__ fec_get_mac(struct net_device *dev)
>  {
> -	struct fec_enet_private *fep;
> -	mii_list_t	*mip;
> -	int		retval;
> -
> -	/* Add PHY address to register command */
> -	fep = netdev_priv(dev);
> +	struct fec_enet_private *fep = netdev_priv(dev);
> +	unsigned char *iap, tmpaddr[ETH_ALEN];
>  
> -	regval |= fep->phy_addr << 23;
> -	retval = 0;
> -
> -	if ((mip = mii_free) != NULL) {
> -		mii_free = mip->mii_next;
> -		mip->mii_regval = regval;
> -		mip->mii_func = func;
> -		mip->mii_next = NULL;
> -		if (mii_head) {
> -			mii_tail->mii_next = mip;
> -			mii_tail = mip;
> -		} else {
> -			mii_head = mii_tail = mip;
> -			writel(regval, fep->hwp + FEC_MII_DATA);
> -		}
> +	if (FEC_FLASHMAC) {
> +		/*
> +		 * Get MAC address from FLASH.
> +		 * If it is all 1's or 0's, use the default.
> +		 */
> +		iap = (unsigned char *)FEC_FLASHMAC;
> +		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> +		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> +			iap = fec_mac_default;
> +		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> +		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> +			iap = fec_mac_default;
>  	} else {
> -		retval = 1;
> +		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> +		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> +		iap = &tmpaddr[0];
>  	}
>  
> -	return retval;
> -}
> -
> -static int
> -mii_queue(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> -{
> -	struct fec_enet_private *fep;
> -	unsigned long   flags;
> -	int             retval;
> -	fep = netdev_priv(dev);
> -	spin_lock_irqsave(&fep->mii_lock, flags);
> -	retval = mii_queue_unlocked(dev, regval, func);
> -	spin_unlock_irqrestore(&fep->mii_lock, flags);
> -	return retval;
> -}
> -
> -static void mii_do_cmd(struct net_device *dev, const phy_cmd_t *c)
> -{
> -	if(!c)
> -		return;
> +	memcpy(dev->dev_addr, iap, ETH_ALEN);
>  
> -	for (; c->mii_data != mk_mii_end; c++)
> -		mii_queue(dev, c->mii_data, c->funct);
> +	/* Adjust MAC if using default MAC address */
> +	if (iap == fec_mac_default)
> +		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
>  }
> +#endif
>  
> -static void mii_parse_sr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_LINK | PHY_STAT_FAULT | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0004)
> -		status |= PHY_STAT_LINK;
> -	if (mii_reg & 0x0010)
> -		status |= PHY_STAT_FAULT;
> -	if (mii_reg & 0x0020)
> -		status |= PHY_STAT_ANC;
> -	*s = status;
> -}
> +/* ------------------------------------------------------------------------- */
>  
> -static void mii_parse_cr(uint mii_reg, struct net_device *dev)
> +/*
> + * Phy section
> + */
> +static void fec_enet_adjust_link(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_ANE | PHY_CONF_LOOP);
> -
> -	if (mii_reg & 0x1000)
> -		status |= PHY_CONF_ANE;
> -	if (mii_reg & 0x4000)
> -		status |= PHY_CONF_LOOP;
> -	*s = status;
> -}
> +	struct phy_device *phy_dev = fep->phy_dev;
> +	unsigned long flags;
>  
> -static void mii_parse_anar(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_SPMASK);
> -
> -	if (mii_reg & 0x0020)
> -		status |= PHY_CONF_10HDX;
> -	if (mii_reg & 0x0040)
> -		status |= PHY_CONF_10FDX;
> -	if (mii_reg & 0x0080)
> -		status |= PHY_CONF_100HDX;
> -	if (mii_reg & 0x00100)
> -		status |= PHY_CONF_100FDX;
> -	*s = status;
> -}
> +	int status_change = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT970 is used by many boards				     */
> +	spin_lock_irqsave(&fep->hw_lock, flags);
>  
> -#define MII_LXT970_MIRROR    16  /* Mirror register           */
> -#define MII_LXT970_IER       17  /* Interrupt Enable Register */
> -#define MII_LXT970_ISR       18  /* Interrupt Status Register */
> -#define MII_LXT970_CONFIG    19  /* Configuration Register    */
> -#define MII_LXT970_CSR       20  /* Chip Status Register      */
> +	/* Prevent a state halted on mii error */
> +	if (fep->mii_timeout && phy_dev->state == PHY_HALTED) {
> +		phy_dev->state = PHY_RESUMING;
> +		goto spin_unlock;
> +	}
>  
> -static void mii_parse_lxt970_csr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	/* Duplex link change */
> +	if (phy_dev->link) {
> +		if (fep->full_duplex != phy_dev->duplex) {
> +			fec_restart(dev, phy_dev->duplex);
> +			status_change = 1;
> +		}
> +	}
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> -	if (mii_reg & 0x0800) {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_100FDX;
> +	/* Link on or off change */
> +	if (phy_dev->link != fep->link) {
> +		fep->link = phy_dev->link;
> +		if (phy_dev->link)
> +			fec_restart(dev, phy_dev->duplex);
>  		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +			fec_stop(dev);
> +		status_change = 1;
>  	}
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_lxt970_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_startup[] = { /* enable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0002), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_ack_int[] = {
> -		/* read SR and ISR to acknowledge */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT970_ISR), NULL },
> -
> -		/* find out the current status */
> -		{ mk_mii_read(MII_LXT970_CSR), mii_parse_lxt970_csr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt970 = {
> -	.id = 0x07810000,
> -	.name = "LXT970",
> -	.config = phy_cmd_lxt970_config,
> -	.startup = phy_cmd_lxt970_startup,
> -	.ack_int = phy_cmd_lxt970_ack_int,
> -	.shutdown = phy_cmd_lxt970_shutdown
> -};
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT971 is used on some of my custom boards                  */
> -
> -/* register definitions for the 971 */
> +spin_unlock:
> +	spin_unlock_irqrestore(&fep->hw_lock, flags);
>  
> -#define MII_LXT971_PCR       16  /* Port Control Register     */
> -#define MII_LXT971_SR2       17  /* Status Register 2         */
> -#define MII_LXT971_IER       18  /* Interrupt Enable Register */
> -#define MII_LXT971_ISR       19  /* Interrupt Status Register */
> -#define MII_LXT971_LCR       20  /* LED Control Register      */
> -#define MII_LXT971_TCR       30  /* Transmit Control Register */
> +	if (status_change)
> +		phy_print_status(phy_dev);
> +}
>  
>  /*
> - * I had some nice ideas of running the MDIO faster...
> - * The 971 should support 8MHz and I tried it, but things acted really
> - * weird, so 2.5 MHz ought to be enough for anyone...
> + * NOTE: a MII transaction is during around 25 us, so polling it...
>   */
> -
> -static void mii_parse_lxt971_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> +	fep->mii_timeout = 0;
>  
> -	if (mii_reg & 0x0400) {
> -		fep->link = 1;
> -		status |= PHY_STAT_LINK;
> -	} else {
> -		fep->link = 0;
> -	}
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x4000) {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_100FDX;
> -		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
> +
> +	/* start a read op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO read timeout\n");
> +			return -ETIMEDOUT;
> +		}
>  	}
> -	if (mii_reg & 0x0008)
> -		status |= PHY_STAT_FAULT;
>  
> -	*s = status;
> +	/* return value */
> +	return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
>  }
>  
> -static phy_cmd_t const phy_cmd_lxt971_config[] = {
> -		/* limit to 10MBit because my prototype board
> -		 * doesn't work with 100. */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x00f2), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_write(MII_LXT971_LCR, 0xd422), NULL }, /* LED config */
> -		/* Somehow does the 971 tell me that the link is down
> -		 * the first read after power-up.
> -		 * read here to get a valid value in ack_int */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_ack_int[] = {
> -		/* acknowledge the int before reading status ! */
> -		{ mk_mii_read(MII_LXT971_ISR), NULL },
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt971 = {
> -	.id = 0x0001378e,
> -	.name = "LXT971",
> -	.config = phy_cmd_lxt971_config,
> -	.startup = phy_cmd_lxt971_startup,
> -	.ack_int = phy_cmd_lxt971_ack_int,
> -	.shutdown = phy_cmd_lxt971_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* The Quality Semiconductor QS6612 is used on the RPX CLLF                  */
> -
> -/* register definitions */
> -
> -#define MII_QS6612_MCR       17  /* Mode Control Register      */
> -#define MII_QS6612_FTR       27  /* Factory Test Register      */
> -#define MII_QS6612_MCO       28  /* Misc. Control Register     */
> -#define MII_QS6612_ISR       29  /* Interrupt Source Register  */
> -#define MII_QS6612_IMR       30  /* Interrupt Mask Register    */
> -#define MII_QS6612_PCR       31  /* 100BaseTx PHY Control Reg. */
> -
> -static void mii_parse_qs6612_pcr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> +			   u16 value)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> +	fep->mii_timeout = 0;
>  
> -	switch((mii_reg >> 2) & 7) {
> -	case 1: status |= PHY_STAT_10HDX; break;
> -	case 2: status |= PHY_STAT_100HDX; break;
> -	case 5: status |= PHY_STAT_10FDX; break;
> -	case 6: status |= PHY_STAT_100FDX; break;
> -}
> -
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_qs6612_config[] = {
> -		/* The PHY powers up isolated on the RPX,
> -		 * so send a command to allow operation.
> -		 */
> -		{ mk_mii_write(MII_QS6612_PCR, 0x0dc0), NULL },
> -
> -		/* parse cr and anar to get some info */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x003a), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_ack_int[] = {
> -		/* we need to read ISR, SR and ANER to acknowledge */
> -		{ mk_mii_read(MII_QS6612_ISR), NULL },
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_REG_ANER), NULL },
> -
> -		/* read pcr to get info */
> -		{ mk_mii_read(MII_QS6612_PCR), mii_parse_qs6612_pcr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_qs6612 = {
> -	.id = 0x00181440,
> -	.name = "QS6612",
> -	.config = phy_cmd_qs6612_config,
> -	.startup = phy_cmd_qs6612_startup,
> -	.ack_int = phy_cmd_qs6612_ack_int,
> -	.shutdown = phy_cmd_qs6612_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* AMD AM79C874 phy                                                          */
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -/* register definitions for the 874 */
> +	/* start a read op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA | FEC_MMFR_DATA(value),
> +		fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO write timeout\n");
> +			return -ETIMEDOUT;
> +		}
> +	}
>  
> -#define MII_AM79C874_MFR       16  /* Miscellaneous Feature Register */
> -#define MII_AM79C874_ICSR      17  /* Interrupt/Status Register      */
> -#define MII_AM79C874_DR        18  /* Diagnostic Register            */
> -#define MII_AM79C874_PMLR      19  /* Power and Loopback Register    */
> -#define MII_AM79C874_MCR       21  /* ModeControl Register           */
> -#define MII_AM79C874_DC        23  /* Disconnect Counter             */
> -#define MII_AM79C874_REC       24  /* Recieve Error Counter          */
> +	return 0;
> +}
>  
> -static void mii_parse_am79c874_dr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_reset(struct mii_bus *bus)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0400)
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_100FDX : PHY_STAT_100HDX);
> -	else
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_10FDX : PHY_STAT_10HDX);
> -
> -	*s = status;
> +	return 0;
>  }
>  
> -static phy_cmd_t const phy_cmd_am79c874_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_AM79C874_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_am79c874 = {
> -	.id = 0x00022561,
> -	.name = "AM79C874",
> -	.config = phy_cmd_am79c874_config,
> -	.startup = phy_cmd_am79c874_startup,
> -	.ack_int = phy_cmd_am79c874_ack_int,
> -	.shutdown = phy_cmd_am79c874_shutdown
> -};
> -
> -
> -/* ------------------------------------------------------------------------- */
> -/* Kendin KS8721BL phy                                                       */
> -
> -/* register definitions for the 8721 */
> -
> -#define MII_KS8721BL_RXERCR	21
> -#define MII_KS8721BL_ICSR	27
> -#define	MII_KS8721BL_PHYCR	31
> -
> -static phy_cmd_t const phy_cmd_ks8721bl_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_KS8721BL_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_ks8721bl = {
> -	.id = 0x00022161,
> -	.name = "KS8721BL",
> -	.config = phy_cmd_ks8721bl_config,
> -	.startup = phy_cmd_ks8721bl_startup,
> -	.ack_int = phy_cmd_ks8721bl_ack_int,
> -	.shutdown = phy_cmd_ks8721bl_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* register definitions for the DP83848 */
> -
> -#define MII_DP8384X_PHYSTST    16  /* PHY Status Register */
> -
> -static void mii_parse_dp8384x_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mii_probe(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -
> -	*s &= ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> -
> -	/* Link up */
> -	if (mii_reg & 0x0001) {
> -		fep->link = 1;
> -		*s |= PHY_STAT_LINK;
> -	} else
> -		fep->link = 0;
> -	/* Status of link */
> -	if (mii_reg & 0x0010)   /* Autonegotioation complete */
> -		*s |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0002) {   /* 10MBps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_10FDX;
> -		else
> -			*s |= PHY_STAT_10HDX;
> -	} else {                  /* 100 Mbps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_100FDX;
> -		else
> -			*s |= PHY_STAT_100HDX;
> -	}
> -	if (mii_reg & 0x0008)
> -		*s |= PHY_STAT_FAULT;
> -}
> -
> -static phy_info_t phy_info_dp83848= {
> -	0x020005c9,
> -	"DP83848",
> +	struct phy_device *phy_dev = NULL;
> +	int phy_addr;
>  
> -	(const phy_cmd_t []) {  /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_DP8384X_PHYSTST), mii_parse_dp8384x_sr2 },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* startup - enable interrupts */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* ack_int - never happens, no interrupt */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> +	/* find the first phy */
> +	for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
> +		if (fep->mii_bus->phy_map[phy_addr]) {
> +			phy_dev = fep->mii_bus->phy_map[phy_addr];
> +			break;
> +		}
> +	}
>  
> -static phy_info_t phy_info_lan8700 = {
> -	0x0007C0C,
> -	"LAN8700",
> -	(const phy_cmd_t []) { /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* startup */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* act_int */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> -/* ------------------------------------------------------------------------- */
> +	if (!phy_dev) {
> +		printk(KERN_ERR "%s: no PHY found\n", dev->name);
> +		return -ENODEV;
> +	}
>  
> -static phy_info_t const * const phy_info[] = {
> -	&phy_info_lxt970,
> -	&phy_info_lxt971,
> -	&phy_info_qs6612,
> -	&phy_info_am79c874,
> -	&phy_info_ks8721bl,
> -	&phy_info_dp83848,
> -	&phy_info_lan8700,
> -	NULL
> -};
> +	/* attach the mac to the phy */
> +	phy_dev = phy_connect(dev, dev_name(&phy_dev->dev),
> +			     &fec_enet_adjust_link, 0,
> +			     PHY_INTERFACE_MODE_MII);
> +	if (IS_ERR(phy_dev)) {
> +		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
> +		return PTR_ERR(phy_dev);
> +	}
>  
> -/* ------------------------------------------------------------------------- */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id);
> +	/* mask with MAC supported features */
> +	phy_dev->supported &= PHY_BASIC_FEATURES;
> +	phy_dev->advertising = phy_dev->supported;
>  
> -/*
> - *	This is specific to the MII interrupt setup of the M5272EVB.
> - */
> -static void __inline__ fec_request_mii_intr(struct net_device *dev)
> -{
> -	if (request_irq(66, mii_link_interrupt, IRQF_DISABLED, "fec(MII)", dev) != 0)
> -		printk("FEC: Could not allocate fec(MII) IRQ(66)!\n");
> -}
> +	fep->phy_dev = phy_dev;
> +	fep->link = 0;
> +	fep->full_duplex = 0;
>  
> -static void __inline__ fec_disable_phy_intr(struct net_device *dev)
> -{
> -	free_irq(66, dev);
> +	return 0;
>  }
> -#endif
>  
> -#ifdef CONFIG_M5272
> -static void __inline__ fec_get_mac(struct net_device *dev)
> +static int fec_enet_mii_init(struct platform_device *pdev)
>  {
> +	struct net_device *dev = platform_get_drvdata(pdev);
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	unsigned char *iap, tmpaddr[ETH_ALEN];
> +	int err = -ENXIO, i;
>  
> -	if (FEC_FLASHMAC) {
> -		/*
> -		 * Get MAC address from FLASH.
> -		 * If it is all 1's or 0's, use the default.
> -		 */
> -		iap = (unsigned char *)FEC_FLASHMAC;
> -		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> -		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> -			iap = fec_mac_default;
> -		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> -		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> -			iap = fec_mac_default;
> -	} else {
> -		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> -		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> -		iap = &tmpaddr[0];
> -	}
> -
> -	memcpy(dev->dev_addr, iap, ETH_ALEN);
> -
> -	/* Adjust MAC if using default MAC address */
> -	if (iap == fec_mac_default)
> -		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
> -}
> -#endif
> +	fep->mii_timeout = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -
> -static void mii_display_status(struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> +	/*
> +	 * Set MII speed to 2.5 MHz (= clk_get_rate() / 2 * phy_speed)
> +	 */
> +	fep->phy_speed = DIV_ROUND_UP(clk_get_rate(fep->clk), 5000000) << 1;
> +	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  
> -	if (!fep->link && !fep->old_link) {
> -		/* Link is still down - don't print anything */
> -		return;
> +	fep->mii_bus = mdiobus_alloc();
> +	if (fep->mii_bus == NULL) {
> +		err = -ENOMEM;
> +		goto err_out;
>  	}
>  
> -	printk("%s: status: ", dev->name);
> -
> -	if (!fep->link) {
> -		printk("link down");
> -	} else {
> -		printk("link up");
> -
> -		switch(*s & PHY_STAT_SPMASK) {
> -		case PHY_STAT_100FDX: printk(", 100MBit Full Duplex"); break;
> -		case PHY_STAT_100HDX: printk(", 100MBit Half Duplex"); break;
> -		case PHY_STAT_10FDX: printk(", 10MBit Full Duplex"); break;
> -		case PHY_STAT_10HDX: printk(", 10MBit Half Duplex"); break;
> -		default:
> -			printk(", Unknown speed/duplex");
> -		}
> -
> -		if (*s & PHY_STAT_ANC)
> -			printk(", auto-negotiation complete");
> +	fep->mii_bus->name = "fec_enet_mii_bus";
> +	fep->mii_bus->read = fec_enet_mdio_read;
> +	fep->mii_bus->write = fec_enet_mdio_write;
> +	fep->mii_bus->reset = fec_enet_mdio_reset;
> +	snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
> +	fep->mii_bus->priv = fep;
> +	fep->mii_bus->parent = &pdev->dev;
> +
> +	fep->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
> +	if (!fep->mii_bus->irq) {
> +		err = -ENOMEM;
> +		goto err_out_free_mdiobus;
>  	}
>  
> -	if (*s & PHY_STAT_FAULT)
> -		printk(", remote fault");
> -
> -	printk(".\n");
> -}
> -
> -static void mii_display_config(struct work_struct *work)
> -{
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	uint status = fep->phy_status;
> +	for (i = 0; i < PHY_MAX_ADDR; i++)
> +		fep->mii_bus->irq[i] = PHY_POLL;
>  
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	printk("%s: config: auto-negotiation ", dev->name);
> -
> -	if (status & PHY_CONF_ANE)
> -		printk("on");
> -	else
> -		printk("off");
> +	platform_set_drvdata(dev, fep->mii_bus);
>  
> -	if (status & PHY_CONF_100FDX)
> -		printk(", 100FDX");
> -	if (status & PHY_CONF_100HDX)
> -		printk(", 100HDX");
> -	if (status & PHY_CONF_10FDX)
> -		printk(", 10FDX");
> -	if (status & PHY_CONF_10HDX)
> -		printk(", 10HDX");
> -	if (!(status & PHY_CONF_SPMASK))
> -		printk(", No speed/duplex selected?");
> +	if (mdiobus_register(fep->mii_bus))
> +		goto err_out_free_mdio_irq;
>  
> -	if (status & PHY_CONF_LOOP)
> -		printk(", loopback enabled");
> +	if (fec_enet_mii_probe(dev) != 0)
> +		goto err_out_unregister_bus;
>  
> -	printk(".\n");
> +	return 0;
>  
> -	fep->sequence_done = 1;
> +err_out_unregister_bus:
> +	mdiobus_unregister(fep->mii_bus);
> +err_out_free_mdio_irq:
> +	kfree(fep->mii_bus->irq);
> +err_out_free_mdiobus:
> +	mdiobus_free(fep->mii_bus);
> +err_out:
> +	return err;
>  }
>  
> -static void mii_relink(struct work_struct *work)
> +static void fec_enet_mii_remove(struct fec_enet_private *fep)
>  {
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	int duplex;
> -
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	fep->link = (fep->phy_status & PHY_STAT_LINK) ? 1 : 0;
> -	mii_display_status(dev);
> -	fep->old_link = fep->link;
> -
> -	if (fep->link) {
> -		duplex = 0;
> -		if (fep->phy_status
> -		    & (PHY_STAT_100FDX | PHY_STAT_10FDX))
> -			duplex = 1;
> -		fec_restart(dev, duplex);
> -	} else
> -		fec_stop(dev);
> +	if (fep->phy_dev)
> +		phy_disconnect(fep->phy_dev);
> +	mdiobus_unregister(fep->mii_bus);
> +	kfree(fep->mii_bus->irq);
> +	mdiobus_free(fep->mii_bus);
>  }
>  
> -/* mii_queue_relink is called in interrupt context from mii_link_interrupt */
> -static void mii_queue_relink(uint mii_reg, struct net_device *dev)
> +static int fec_enet_get_settings(struct net_device *dev,
> +				  struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	/*
> -	 * We cannot queue phy_task twice in the workqueue.  It
> -	 * would cause an endless loop in the workqueue.
> -	 * Fortunately, if the last mii_relink entry has not yet been
> -	 * executed now, it will do the job for the current interrupt,
> -	 * which is just what we want.
> -	 */
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_relink);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_gset(phydev, cmd);
>  }
>  
> -/* mii_queue_config is called in interrupt context from fec_enet_mii */
> -static void mii_queue_config(uint mii_reg, struct net_device *dev)
> +static int fec_enet_set_settings(struct net_device *dev,
> +				 struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_display_config);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_sset(phydev, cmd);
>  }
>  
> -phy_cmd_t const phy_cmd_relink[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_relink },
> -	{ mk_mii_end, }
> -	};
> -phy_cmd_t const phy_cmd_config[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_config },
> -	{ mk_mii_end, }
> -	};
> -
> -/* Read remainder of PHY ID. */
> -static void
> -mii_discover_phy3(uint mii_reg, struct net_device *dev)
> +static void fec_enet_get_drvinfo(struct net_device *dev,
> +				 struct ethtool_drvinfo *info)
>  {
> -	struct fec_enet_private *fep;
> -	int i;
> -
> -	fep = netdev_priv(dev);
> -	fep->phy_id |= (mii_reg & 0xffff);
> -	printk("fec: PHY @ 0x%x, ID 0x%08x", fep->phy_addr, fep->phy_id);
> -
> -	for(i = 0; phy_info[i]; i++) {
> -		if(phy_info[i]->id == (fep->phy_id >> 4))
> -			break;
> -	}
> -
> -	if (phy_info[i])
> -		printk(" -- %s\n", phy_info[i]->name);
> -	else
> -		printk(" -- unknown PHY!\n");
> +	struct fec_enet_private *fep = netdev_priv(dev);
>  
> -	fep->phy = phy_info[i];
> -	fep->phy_id_done = 1;
> +	strcpy(info->driver, fep->pdev->dev.driver->name);
> +	strcpy(info->version, "Revision: 1.0");
> +	strcpy(info->bus_info, dev_name(&dev->dev));
>  }
>  
> -/* Scan all of the MII PHY addresses looking for someone to respond
> - * with a valid ID.  This usually happens quickly.
> - */
> -static void
> -mii_discover_phy(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep;
> -	uint phytype;
> -
> -	fep = netdev_priv(dev);
> -
> -	if (fep->phy_addr < 32) {
> -		if ((phytype = (mii_reg & 0xffff)) != 0xffff && phytype != 0) {
> -
> -			/* Got first part of ID, now get remainder */
> -			fep->phy_id = phytype << 16;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR2),
> -							mii_discover_phy3);
> -		} else {
> -			fep->phy_addr++;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR1),
> -							mii_discover_phy);
> -		}
> -	} else {
> -		printk("FEC: No PHY device found.\n");
> -		/* Disable external MII interface */
> -		writel(0, fep->hwp + FEC_MII_SPEED);
> -		fep->phy_speed = 0;
> -#ifdef HAVE_mii_link_interrupt
> -		fec_disable_phy_intr(dev);
> -#endif
> -	}
> -}
> +static struct ethtool_ops fec_enet_ethtool_ops = {
> +	.get_settings		= fec_enet_get_settings,
> +	.set_settings		= fec_enet_set_settings,
> +	.get_drvinfo		= fec_enet_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +};
>  
> -/* This interrupt occurs when the PHY detects a link change */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id)
> +static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>  {
> -	struct	net_device *dev = dev_id;
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
> +
> +	if (!netif_running(dev))
> +		return -EINVAL;
>  
> -	mii_do_cmd(dev, fep->phy->ack_int);
> -	mii_do_cmd(dev, phy_cmd_relink);  /* restart and display status */
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	return IRQ_HANDLED;
> +	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
>  }
> -#endif
>  
>  static void fec_enet_free_buffers(struct net_device *dev)
>  {
> @@ -1509,35 +915,8 @@ fec_enet_open(struct net_device *dev)
>  	if (ret)
>  		return ret;
>  
> -	fep->sequence_done = 0;
> -	fep->link = 0;
> -
> -	fec_restart(dev, 1);
> -
> -	if (fep->phy) {
> -		mii_do_cmd(dev, fep->phy->ack_int);
> -		mii_do_cmd(dev, fep->phy->config);
> -		mii_do_cmd(dev, phy_cmd_config);  /* display configuration */
> -
> -		/* Poll until the PHY tells us its configuration
> -		 * (not link state).
> -		 * Request is initiated by mii_do_cmd above, but answer
> -		 * comes by interrupt.
> -		 * This should take about 25 usec per register at 2.5 MHz,
> -		 * and we read approximately 5 registers.
> -		 */
> -		while(!fep->sequence_done)
> -			schedule();
> -
> -		mii_do_cmd(dev, fep->phy->startup);
> -	}
> -
> -	/* Set the initial link state to true. A lot of hardware
> -	 * based on this device does not implement a PHY interrupt,
> -	 * so we are never notified of link change.
> -	 */
> -	fep->link = 1;
> -
> +	/* schedule a link state check */
> +	phy_start(fep->phy_dev);
>  	netif_start_queue(dev);
>  	fep->opened = 1;
>  	return 0;
> @@ -1550,6 +929,7 @@ fec_enet_close(struct net_device *dev)
>  
>  	/* Don't know what to do yet. */
>  	fep->opened = 0;
> +	phy_stop(fep->phy_dev);
>  	netif_stop_queue(dev);
>  	fec_stop(dev);
>  
> @@ -1666,6 +1046,7 @@ static const struct net_device_ops fec_netdev_ops = {
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_tx_timeout		= fec_timeout,
>  	.ndo_set_mac_address	= fec_set_mac_address,
> +	.ndo_do_ioctl           = fec_enet_ioctl,
>  };
>  
>   /*
> @@ -1689,7 +1070,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	}
>  
>  	spin_lock_init(&fep->hw_lock);
> -	spin_lock_init(&fep->mii_lock);
>  
>  	fep->index = index;
>  	fep->hwp = (void __iomem *)dev->base_addr;
> @@ -1716,20 +1096,10 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	fep->rx_bd_base = cbd_base;
>  	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
>  
> -#ifdef HAVE_mii_link_interrupt
> -	fec_request_mii_intr(dev);
> -#endif
>  	/* The FEC Ethernet specific entries in the device structure */
>  	dev->watchdog_timeo = TX_TIMEOUT;
>  	dev->netdev_ops = &fec_netdev_ops;
> -
> -	for (i=0; i<NMII-1; i++)
> -		mii_cmds[i].mii_next = &mii_cmds[i+1];
> -	mii_free = mii_cmds;
> -
> -	/* Set MII speed to 2.5 MHz */
> -	fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> -					/ 2500000) / 2) & 0x3F) << 1;
> +	dev->ethtool_ops = &fec_enet_ethtool_ops;
>  
>  	/* Initialize the receive buffer descriptors. */
>  	bdp = fep->rx_bd_base;
> @@ -1760,13 +1130,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  
>  	fec_restart(dev, 0);
>  
> -	/* Queue up command to detect the PHY and initialize the
> -	 * remainder of the interface.
> -	 */
> -	fep->phy_id_done = 0;
> -	fep->phy_addr = 0;
> -	mii_queue(dev, mk_mii_read(MII_REG_PHYIR1), mii_discover_phy);
> -
>  	return 0;
>  }
>  
> @@ -1835,8 +1198,7 @@ fec_restart(struct net_device *dev, int duplex)
>  	writel(0, fep->hwp + FEC_R_DES_ACTIVE);
>  
>  	/* Enable interrupts we wish to service */
> -	writel(FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII,
> -			fep->hwp + FEC_IMASK);
> +	writel(FEC_ENET_TXF | FEC_ENET_RXF, fep->hwp + FEC_IMASK);
>  }
>  
>  static void
> @@ -1859,7 +1221,6 @@ fec_stop(struct net_device *dev)
>  	/* Clear outstanding MII command interrupts. */
>  	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -	writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
>  	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  }
>  
> @@ -1891,6 +1252,7 @@ fec_probe(struct platform_device *pdev)
>  	memset(fep, 0, sizeof(*fep));
>  
>  	ndev->base_addr = (unsigned long)ioremap(r->start, resource_size(r));
> +	fep->pdev = pdev;
>  
>  	if (!ndev->base_addr) {
>  		ret = -ENOMEM;
> @@ -1926,13 +1288,24 @@ fec_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto failed_init;
>  
> +	ret = fec_enet_mii_init(pdev);
> +	if (ret)
> +		goto failed_mii_init;
> +
>  	ret = register_netdev(ndev);
>  	if (ret)
>  		goto failed_register;
>  
> +	printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> +		"(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
> +		fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> +		fep->phy_dev->irq);
> +
>  	return 0;
>  
>  failed_register:
> +	fec_enet_mii_remove(fep);
> +failed_mii_init:
>  failed_init:
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
> @@ -1959,6 +1332,7 @@ fec_drv_remove(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, NULL);
>  
>  	fec_stop(ndev);
> +	fec_enet_mii_remove(fep);
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
>  	iounmap((void __iomem *)ndev->base_addr);
> -- 
> 1.7.0.1
> 
> 

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* [PATCH nf-next-2.6] xt_hashlimit: RCU conversion
From: Eric Dumazet @ 2010-03-31 12:23 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jorrit Kronjee, netfilter-devel, netdev
In-Reply-To: <1269514928.3626.13.camel@edumazet-laptop>

xt_hashlimit uses a central lock per hash table and suffers from
contention on some workloads. (Multiqueue NIC or if RPS is enabled)

After RCU conversion, central lock is only used when a writer wants to
add or delete an entry.

For 'readers', updating an existing entry, they use an individual lock
per entry.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/netfilter/xt_hashlimit.c |  115 ++++++++++++++++++---------------
 1 file changed, 66 insertions(+), 49 deletions(-)
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 5470bb0..f245de6 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -81,12 +81,14 @@ struct dsthash_ent {
 	struct dsthash_dst dst;
 
 	/* modified structure members in the end */
+	spinlock_t lock;
 	unsigned long expires;		/* precalculated expiry time */
 	struct {
 		unsigned long prev;	/* last modification */
 		u_int32_t credit;
 		u_int32_t credit_cap, cost;
 	} rateinfo;
+	struct rcu_head rcu;
 };
 
 struct xt_hashlimit_htable {
@@ -143,54 +145,30 @@ dsthash_find(const struct xt_hashlimit_htable *ht,
 	u_int32_t hash = hash_dst(ht, dst);
 
 	if (!hlist_empty(&ht->hash[hash])) {
-		hlist_for_each_entry(ent, pos, &ht->hash[hash], node)
-			if (dst_cmp(ent, dst))
+		hlist_for_each_entry_rcu(ent, pos, &ht->hash[hash], node)
+			if (dst_cmp(ent, dst)) {
+				spin_lock(&ent->lock);
 				return ent;
+			}
 	}
 	return NULL;
 }
 
-/* allocate dsthash_ent, initialize dst, put in htable and lock it */
-static struct dsthash_ent *
-dsthash_alloc_init(struct xt_hashlimit_htable *ht,
-		   const struct dsthash_dst *dst)
+static void dsthash_free_rcu(struct rcu_head *head)
 {
-	struct dsthash_ent *ent;
+	struct dsthash_ent *ent = container_of(head, struct dsthash_ent, rcu);
 
-	/* initialize hash with random val at the time we allocate
-	 * the first hashtable entry */
-	if (!ht->rnd_initialized) {
-		get_random_bytes(&ht->rnd, sizeof(ht->rnd));
-		ht->rnd_initialized = true;
-	}
-
-	if (ht->cfg.max && ht->count >= ht->cfg.max) {
-		/* FIXME: do something. question is what.. */
-		if (net_ratelimit())
-			pr_err("max count of %u reached\n", ht->cfg.max);
-		return NULL;
-	}
-
-	ent = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC);
-	if (!ent) {
-		if (net_ratelimit())
-			pr_err("cannot allocate dsthash_ent\n");
-		return NULL;
-	}
-	memcpy(&ent->dst, dst, sizeof(ent->dst));
-
-	hlist_add_head(&ent->node, &ht->hash[hash_dst(ht, dst)]);
-	ht->count++;
-	return ent;
+	kmem_cache_free(hashlimit_cachep, ent);
 }
 
-static inline void
-dsthash_free(struct xt_hashlimit_htable *ht, struct dsthash_ent *ent)
+static void dsthash_free(struct xt_hashlimit_htable *ht, struct dsthash_ent *ent)
 {
-	hlist_del(&ent->node);
-	kmem_cache_free(hashlimit_cachep, ent);
+	hlist_del_rcu(&ent->node);
+	call_rcu_bh(&ent->rcu, dsthash_free_rcu);
 	ht->count--;
 }
+
+
 static void htable_gc(unsigned long htlong);
 
 static int htable_create(struct net *net, struct xt_hashlimit_mtinfo1 *minfo,
@@ -500,6 +478,49 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
 	return 0;
 }
 
+/* allocate dsthash_ent, initialize dst, put in htable and lock it */
+static struct dsthash_ent *
+dsthash_alloc_init(struct xt_hashlimit_htable *ht,
+		   const struct dsthash_dst *dst)
+{
+	struct dsthash_ent *ent;
+
+	spin_lock(&ht->lock);
+	/* initialize hash with random val at the time we allocate
+	 * the first hashtable entry */
+	if (unlikely(!ht->rnd_initialized)) {
+		get_random_bytes(&ht->rnd, sizeof(ht->rnd));
+		ht->rnd_initialized = true;
+	}
+
+	if (ht->cfg.max && ht->count >= ht->cfg.max) {
+		/* FIXME: do something. question is what.. */
+		if (net_ratelimit())
+			pr_err("max count of %u reached\n", ht->cfg.max);
+		ent = NULL;
+	} else
+		ent = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC);
+	if (!ent) {
+		if (net_ratelimit())
+			pr_err("cannot allocate dsthash_ent\n");
+	} else {
+		memcpy(&ent->dst, dst, sizeof(ent->dst));
+		spin_lock_init(&ent->lock);
+
+		ent->expires = jiffies + msecs_to_jiffies(ht->cfg.expire);
+		ent->rateinfo.prev = jiffies;
+		ent->rateinfo.credit = user2credits(ht->cfg.avg *
+		                      ht->cfg.burst);
+		ent->rateinfo.credit_cap = user2credits(ht->cfg.avg *
+		                          ht->cfg.burst);
+		ent->rateinfo.cost = user2credits(ht->cfg.avg);
+		spin_lock(&ent->lock);
+		hlist_add_head_rcu(&ent->node, &ht->hash[hash_dst(ht, dst)]);
+		ht->count++;
+	}
+	spin_unlock(&ht->lock);
+	return ent;
+}
 static bool
 hashlimit_mt(const struct sk_buff *skb, const struct xt_match_param *par)
 {
@@ -512,22 +533,14 @@ hashlimit_mt(const struct sk_buff *skb, const struct xt_match_param *par)
 	if (hashlimit_init_dst(hinfo, &dst, skb, par->thoff) < 0)
 		goto hotdrop;
 
-	spin_lock_bh(&hinfo->lock);
+	rcu_read_lock_bh();
 	dh = dsthash_find(hinfo, &dst);
 	if (dh == NULL) {
 		dh = dsthash_alloc_init(hinfo, &dst);
 		if (dh == NULL) {
-			spin_unlock_bh(&hinfo->lock);
+			rcu_read_unlock_bh();
 			goto hotdrop;
 		}
-
-		dh->expires = jiffies + msecs_to_jiffies(hinfo->cfg.expire);
-		dh->rateinfo.prev = jiffies;
-		dh->rateinfo.credit = user2credits(hinfo->cfg.avg *
-		                      hinfo->cfg.burst);
-		dh->rateinfo.credit_cap = user2credits(hinfo->cfg.avg *
-		                          hinfo->cfg.burst);
-		dh->rateinfo.cost = user2credits(hinfo->cfg.avg);
 	} else {
 		/* update expiration timeout */
 		dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire);
@@ -537,11 +550,13 @@ hashlimit_mt(const struct sk_buff *skb, const struct xt_match_param *par)
 	if (dh->rateinfo.credit >= dh->rateinfo.cost) {
 		/* below the limit */
 		dh->rateinfo.credit -= dh->rateinfo.cost;
-		spin_unlock_bh(&hinfo->lock);
+		spin_unlock(&dh->lock);
+		rcu_read_unlock_bh();
 		return !(info->cfg.mode & XT_HASHLIMIT_INVERT);
 	}
 
-	spin_unlock_bh(&hinfo->lock);
+	spin_unlock(&dh->lock);
+	rcu_read_unlock_bh();
 	/* default match is underlimit - so over the limit, we need to invert */
 	return info->cfg.mode & XT_HASHLIMIT_INVERT;
 
@@ -817,9 +832,11 @@ err1:
 
 static void __exit hashlimit_mt_exit(void)
 {
-	kmem_cache_destroy(hashlimit_cachep);
 	xt_unregister_matches(hashlimit_mt_reg, ARRAY_SIZE(hashlimit_mt_reg));
 	unregister_pernet_subsys(&hashlimit_net_ops);
+
+	rcu_barrier_bh();
+	kmem_cache_destroy(hashlimit_cachep);
 }
 
 module_init(hashlimit_mt_init);



^ permalink raw reply related

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection (v2)
From: Amit Kucheria @ 2010-03-31 12:21 UTC (permalink / raw)
  To: Bryan Wu
  Cc: netdev, s.hauer, w.sang, linux-kernel, kernel-team, gerg,
	linux-arm-kernel
In-Reply-To: <1270037444-8470-1-git-send-email-bryan.wu@canonical.com>

On 10 Mar 31, Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/457878
> 
> v2:
>  - remove duplicated phy_speed caculation
>  - fix the phy_speed caculation according to the DataSheet
> 
> v1:
>  - removed old MII phy control code
>  - add phylib supporting
>  - add ethtool interface to make user space NetworkManager works
> 
> Tested on Freescale i.MX51 Babbage board.
> 
> This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>
> 
> Cc: Frederic Rodo <fred.rodo@gmail.com>
> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
Acked-by: Amit Kucheria <amit.kucheria@canonical.com>

> ---
>  drivers/net/Kconfig |    1 +
>  drivers/net/fec.c   | 1128 ++++++++++++---------------------------------------
>  2 files changed, 252 insertions(+), 877 deletions(-)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0ba5b8e..41f6a70 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -1916,6 +1916,7 @@ config FEC
>  	bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
>  	depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
>  		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
> +	select PHYLIB
>  	help
>  	  Say Y here if you want to use the built-in 10/100 Fast ethernet
>  	  controller on some Motorola ColdFire and Freescale i.MX processors.
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 9f98c1c..848eb19 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -40,6 +40,7 @@
>  #include <linux/irq.h>
>  #include <linux/clk.h>
>  #include <linux/platform_device.h>
> +#include <linux/phy.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -61,7 +62,6 @@
>   * Define the fixed address of the FEC hardware.
>   */
>  #if defined(CONFIG_M5272)
> -#define HAVE_mii_link_interrupt
>  
>  static unsigned char	fec_mac_default[] = {
>  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -86,23 +86,6 @@ static unsigned char	fec_mac_default[] = {
>  #endif
>  #endif /* CONFIG_M5272 */
>  
> -/* Forward declarations of some structures to support different PHYs */
> -
> -typedef struct {
> -	uint mii_data;
> -	void (*funct)(uint mii_reg, struct net_device *dev);
> -} phy_cmd_t;
> -
> -typedef struct {
> -	uint id;
> -	char *name;
> -
> -	const phy_cmd_t *config;
> -	const phy_cmd_t *startup;
> -	const phy_cmd_t *ack_int;
> -	const phy_cmd_t *shutdown;
> -} phy_info_t;
> -
>  /* The number of Tx and Rx buffers.  These are allocated from the page
>   * pool.  The code may assume these are power of two, so it it best
>   * to keep them that size.
> @@ -189,29 +172,21 @@ struct fec_enet_private {
>  	uint	tx_full;
>  	/* hold while accessing the HW like ringbuffer for tx/rx but not MAC */
>  	spinlock_t hw_lock;
> -	/* hold while accessing the mii_list_t() elements */
> -	spinlock_t mii_lock;
> -
> -	uint	phy_id;
> -	uint	phy_id_done;
> -	uint	phy_status;
> -	uint	phy_speed;
> -	phy_info_t const	*phy;
> -	struct work_struct phy_task;
>  
> -	uint	sequence_done;
> -	uint	mii_phy_task_queued;
> +	struct  platform_device *pdev;
>  
> -	uint	phy_addr;
> +	int	opened;
>  
> +	/* Phylib and MDIO interface */
> +	struct  mii_bus *mii_bus;
> +	struct  phy_device *phy_dev;
> +	int     mii_timeout;
> +	uint    phy_speed;
>  	int	index;
> -	int	opened;
>  	int	link;
> -	int	old_link;
>  	int	full_duplex;
>  };
>  
> -static void fec_enet_mii(struct net_device *dev);
>  static irqreturn_t fec_enet_interrupt(int irq, void * dev_id);
>  static void fec_enet_tx(struct net_device *dev);
>  static void fec_enet_rx(struct net_device *dev);
> @@ -219,67 +194,20 @@ static int fec_enet_close(struct net_device *dev);
>  static void fec_restart(struct net_device *dev, int duplex);
>  static void fec_stop(struct net_device *dev);
>  
> +/* FEC MII MMFR bits definition */
> +#define FEC_MMFR_ST		(1 << 30)
> +#define FEC_MMFR_OP_READ	(2 << 28)
> +#define FEC_MMFR_OP_WRITE	(1 << 28)
> +#define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
> +#define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
> +#define FEC_MMFR_TA		(2 << 16)
> +#define FEC_MMFR_DATA(v)	(v & 0xffff)
>  
> -/* MII processing.  We keep this as simple as possible.  Requests are
> - * placed on the list (if there is room).  When the request is finished
> - * by the MII, an optional function may be called.
> - */
> -typedef struct mii_list {
> -	uint	mii_regval;
> -	void	(*mii_func)(uint val, struct net_device *dev);
> -	struct	mii_list *mii_next;
> -} mii_list_t;
> -
> -#define		NMII	20
> -static mii_list_t	mii_cmds[NMII];
> -static mii_list_t	*mii_free;
> -static mii_list_t	*mii_head;
> -static mii_list_t	*mii_tail;
> -
> -static int	mii_queue(struct net_device *dev, int request,
> -				void (*func)(uint, struct net_device *));
> -
> -/* Make MII read/write commands for the FEC */
> -#define mk_mii_read(REG)	(0x60020000 | ((REG & 0x1f) << 18))
> -#define mk_mii_write(REG, VAL)	(0x50020000 | ((REG & 0x1f) << 18) | \
> -						(VAL & 0xffff))
> -#define mk_mii_end	0
> +#define FEC_MII_TIMEOUT		10000
>  
>  /* Transmitter timeout */
>  #define TX_TIMEOUT (2 * HZ)
>  
> -/* Register definitions for the PHY */
> -
> -#define MII_REG_CR          0  /* Control Register                         */
> -#define MII_REG_SR          1  /* Status Register                          */
> -#define MII_REG_PHYIR1      2  /* PHY Identification Register 1            */
> -#define MII_REG_PHYIR2      3  /* PHY Identification Register 2            */
> -#define MII_REG_ANAR        4  /* A-N Advertisement Register               */
> -#define MII_REG_ANLPAR      5  /* A-N Link Partner Ability Register        */
> -#define MII_REG_ANER        6  /* A-N Expansion Register                   */
> -#define MII_REG_ANNPTR      7  /* A-N Next Page Transmit Register          */
> -#define MII_REG_ANLPRNPR    8  /* A-N Link Partner Received Next Page Reg. */
> -
> -/* values for phy_status */
> -
> -#define PHY_CONF_ANE	0x0001  /* 1 auto-negotiation enabled */
> -#define PHY_CONF_LOOP	0x0002  /* 1 loopback mode enabled */
> -#define PHY_CONF_SPMASK	0x00f0  /* mask for speed */
> -#define PHY_CONF_10HDX	0x0010  /* 10 Mbit half duplex supported */
> -#define PHY_CONF_10FDX	0x0020  /* 10 Mbit full duplex supported */
> -#define PHY_CONF_100HDX	0x0040  /* 100 Mbit half duplex supported */
> -#define PHY_CONF_100FDX	0x0080  /* 100 Mbit full duplex supported */
> -
> -#define PHY_STAT_LINK	0x0100  /* 1 up - 0 down */
> -#define PHY_STAT_FAULT	0x0200  /* 1 remote fault */
> -#define PHY_STAT_ANC	0x0400  /* 1 auto-negotiation complete	*/
> -#define PHY_STAT_SPMASK	0xf000  /* mask for speed */
> -#define PHY_STAT_10HDX	0x1000  /* 10 Mbit half duplex selected	*/
> -#define PHY_STAT_10FDX	0x2000  /* 10 Mbit full duplex selected	*/
> -#define PHY_STAT_100HDX	0x4000  /* 100 Mbit half duplex selected */
> -#define PHY_STAT_100FDX	0x8000  /* 100 Mbit full duplex selected */
> -
> -
>  static int
>  fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
> @@ -406,12 +334,6 @@ fec_enet_interrupt(int irq, void * dev_id)
>  			ret = IRQ_HANDLED;
>  			fec_enet_tx(dev);
>  		}
> -
> -		if (int_events & FEC_ENET_MII) {
> -			ret = IRQ_HANDLED;
> -			fec_enet_mii(dev);
> -		}
> -
>  	} while (int_events);
>  
>  	return ret;
> @@ -607,827 +529,311 @@ rx_processing_done:
>  	spin_unlock(&fep->hw_lock);
>  }
>  
> -/* called from interrupt context */
> -static void
> -fec_enet_mii(struct net_device *dev)
> -{
> -	struct	fec_enet_private *fep;
> -	mii_list_t	*mip;
> -
> -	fep = netdev_priv(dev);
> -	spin_lock(&fep->mii_lock);
> -
> -	if ((mip = mii_head) == NULL) {
> -		printk("MII and no head!\n");
> -		goto unlock;
> -	}
> -
> -	if (mip->mii_func != NULL)
> -		(*(mip->mii_func))(readl(fep->hwp + FEC_MII_DATA), dev);
> -
> -	mii_head = mip->mii_next;
> -	mip->mii_next = mii_free;
> -	mii_free = mip;
> -
> -	if ((mip = mii_head) != NULL)
> -		writel(mip->mii_regval, fep->hwp + FEC_MII_DATA);
> -
> -unlock:
> -	spin_unlock(&fep->mii_lock);
> -}
> -
> -static int
> -mii_queue_unlocked(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> +/* ------------------------------------------------------------------------- */
> +#ifdef CONFIG_M5272
> +static void __inline__ fec_get_mac(struct net_device *dev)
>  {
> -	struct fec_enet_private *fep;
> -	mii_list_t	*mip;
> -	int		retval;
> -
> -	/* Add PHY address to register command */
> -	fep = netdev_priv(dev);
> +	struct fec_enet_private *fep = netdev_priv(dev);
> +	unsigned char *iap, tmpaddr[ETH_ALEN];
>  
> -	regval |= fep->phy_addr << 23;
> -	retval = 0;
> -
> -	if ((mip = mii_free) != NULL) {
> -		mii_free = mip->mii_next;
> -		mip->mii_regval = regval;
> -		mip->mii_func = func;
> -		mip->mii_next = NULL;
> -		if (mii_head) {
> -			mii_tail->mii_next = mip;
> -			mii_tail = mip;
> -		} else {
> -			mii_head = mii_tail = mip;
> -			writel(regval, fep->hwp + FEC_MII_DATA);
> -		}
> +	if (FEC_FLASHMAC) {
> +		/*
> +		 * Get MAC address from FLASH.
> +		 * If it is all 1's or 0's, use the default.
> +		 */
> +		iap = (unsigned char *)FEC_FLASHMAC;
> +		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> +		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> +			iap = fec_mac_default;
> +		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> +		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> +			iap = fec_mac_default;
>  	} else {
> -		retval = 1;
> +		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> +		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> +		iap = &tmpaddr[0];
>  	}
>  
> -	return retval;
> -}
> -
> -static int
> -mii_queue(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> -{
> -	struct fec_enet_private *fep;
> -	unsigned long   flags;
> -	int             retval;
> -	fep = netdev_priv(dev);
> -	spin_lock_irqsave(&fep->mii_lock, flags);
> -	retval = mii_queue_unlocked(dev, regval, func);
> -	spin_unlock_irqrestore(&fep->mii_lock, flags);
> -	return retval;
> -}
> -
> -static void mii_do_cmd(struct net_device *dev, const phy_cmd_t *c)
> -{
> -	if(!c)
> -		return;
> +	memcpy(dev->dev_addr, iap, ETH_ALEN);
>  
> -	for (; c->mii_data != mk_mii_end; c++)
> -		mii_queue(dev, c->mii_data, c->funct);
> +	/* Adjust MAC if using default MAC address */
> +	if (iap == fec_mac_default)
> +		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
>  }
> +#endif
>  
> -static void mii_parse_sr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_LINK | PHY_STAT_FAULT | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0004)
> -		status |= PHY_STAT_LINK;
> -	if (mii_reg & 0x0010)
> -		status |= PHY_STAT_FAULT;
> -	if (mii_reg & 0x0020)
> -		status |= PHY_STAT_ANC;
> -	*s = status;
> -}
> +/* ------------------------------------------------------------------------- */
>  
> -static void mii_parse_cr(uint mii_reg, struct net_device *dev)
> +/*
> + * Phy section
> + */
> +static void fec_enet_adjust_link(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_ANE | PHY_CONF_LOOP);
> -
> -	if (mii_reg & 0x1000)
> -		status |= PHY_CONF_ANE;
> -	if (mii_reg & 0x4000)
> -		status |= PHY_CONF_LOOP;
> -	*s = status;
> -}
> +	struct phy_device *phy_dev = fep->phy_dev;
> +	unsigned long flags;
>  
> -static void mii_parse_anar(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_SPMASK);
> -
> -	if (mii_reg & 0x0020)
> -		status |= PHY_CONF_10HDX;
> -	if (mii_reg & 0x0040)
> -		status |= PHY_CONF_10FDX;
> -	if (mii_reg & 0x0080)
> -		status |= PHY_CONF_100HDX;
> -	if (mii_reg & 0x00100)
> -		status |= PHY_CONF_100FDX;
> -	*s = status;
> -}
> +	int status_change = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT970 is used by many boards				     */
> +	spin_lock_irqsave(&fep->hw_lock, flags);
>  
> -#define MII_LXT970_MIRROR    16  /* Mirror register           */
> -#define MII_LXT970_IER       17  /* Interrupt Enable Register */
> -#define MII_LXT970_ISR       18  /* Interrupt Status Register */
> -#define MII_LXT970_CONFIG    19  /* Configuration Register    */
> -#define MII_LXT970_CSR       20  /* Chip Status Register      */
> +	/* Prevent a state halted on mii error */
> +	if (fep->mii_timeout && phy_dev->state == PHY_HALTED) {
> +		phy_dev->state = PHY_RESUMING;
> +		goto spin_unlock;
> +	}
>  
> -static void mii_parse_lxt970_csr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	/* Duplex link change */
> +	if (phy_dev->link) {
> +		if (fep->full_duplex != phy_dev->duplex) {
> +			fec_restart(dev, phy_dev->duplex);
> +			status_change = 1;
> +		}
> +	}
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> -	if (mii_reg & 0x0800) {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_100FDX;
> +	/* Link on or off change */
> +	if (phy_dev->link != fep->link) {
> +		fep->link = phy_dev->link;
> +		if (phy_dev->link)
> +			fec_restart(dev, phy_dev->duplex);
>  		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +			fec_stop(dev);
> +		status_change = 1;
>  	}
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_lxt970_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_startup[] = { /* enable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0002), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_ack_int[] = {
> -		/* read SR and ISR to acknowledge */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT970_ISR), NULL },
> -
> -		/* find out the current status */
> -		{ mk_mii_read(MII_LXT970_CSR), mii_parse_lxt970_csr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt970 = {
> -	.id = 0x07810000,
> -	.name = "LXT970",
> -	.config = phy_cmd_lxt970_config,
> -	.startup = phy_cmd_lxt970_startup,
> -	.ack_int = phy_cmd_lxt970_ack_int,
> -	.shutdown = phy_cmd_lxt970_shutdown
> -};
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT971 is used on some of my custom boards                  */
> -
> -/* register definitions for the 971 */
> +spin_unlock:
> +	spin_unlock_irqrestore(&fep->hw_lock, flags);
>  
> -#define MII_LXT971_PCR       16  /* Port Control Register     */
> -#define MII_LXT971_SR2       17  /* Status Register 2         */
> -#define MII_LXT971_IER       18  /* Interrupt Enable Register */
> -#define MII_LXT971_ISR       19  /* Interrupt Status Register */
> -#define MII_LXT971_LCR       20  /* LED Control Register      */
> -#define MII_LXT971_TCR       30  /* Transmit Control Register */
> +	if (status_change)
> +		phy_print_status(phy_dev);
> +}
>  
>  /*
> - * I had some nice ideas of running the MDIO faster...
> - * The 971 should support 8MHz and I tried it, but things acted really
> - * weird, so 2.5 MHz ought to be enough for anyone...
> + * NOTE: a MII transaction is during around 25 us, so polling it...
>   */
> -
> -static void mii_parse_lxt971_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> +	fep->mii_timeout = 0;
>  
> -	if (mii_reg & 0x0400) {
> -		fep->link = 1;
> -		status |= PHY_STAT_LINK;
> -	} else {
> -		fep->link = 0;
> -	}
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x4000) {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_100FDX;
> -		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
> +
> +	/* start a read op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO read timeout\n");
> +			return -ETIMEDOUT;
> +		}
>  	}
> -	if (mii_reg & 0x0008)
> -		status |= PHY_STAT_FAULT;
>  
> -	*s = status;
> +	/* return value */
> +	return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
>  }
>  
> -static phy_cmd_t const phy_cmd_lxt971_config[] = {
> -		/* limit to 10MBit because my prototype board
> -		 * doesn't work with 100. */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x00f2), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_write(MII_LXT971_LCR, 0xd422), NULL }, /* LED config */
> -		/* Somehow does the 971 tell me that the link is down
> -		 * the first read after power-up.
> -		 * read here to get a valid value in ack_int */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_ack_int[] = {
> -		/* acknowledge the int before reading status ! */
> -		{ mk_mii_read(MII_LXT971_ISR), NULL },
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt971 = {
> -	.id = 0x0001378e,
> -	.name = "LXT971",
> -	.config = phy_cmd_lxt971_config,
> -	.startup = phy_cmd_lxt971_startup,
> -	.ack_int = phy_cmd_lxt971_ack_int,
> -	.shutdown = phy_cmd_lxt971_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* The Quality Semiconductor QS6612 is used on the RPX CLLF                  */
> -
> -/* register definitions */
> -
> -#define MII_QS6612_MCR       17  /* Mode Control Register      */
> -#define MII_QS6612_FTR       27  /* Factory Test Register      */
> -#define MII_QS6612_MCO       28  /* Misc. Control Register     */
> -#define MII_QS6612_ISR       29  /* Interrupt Source Register  */
> -#define MII_QS6612_IMR       30  /* Interrupt Mask Register    */
> -#define MII_QS6612_PCR       31  /* 100BaseTx PHY Control Reg. */
> -
> -static void mii_parse_qs6612_pcr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> +			   u16 value)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> +	fep->mii_timeout = 0;
>  
> -	switch((mii_reg >> 2) & 7) {
> -	case 1: status |= PHY_STAT_10HDX; break;
> -	case 2: status |= PHY_STAT_100HDX; break;
> -	case 5: status |= PHY_STAT_10FDX; break;
> -	case 6: status |= PHY_STAT_100FDX; break;
> -}
> -
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_qs6612_config[] = {
> -		/* The PHY powers up isolated on the RPX,
> -		 * so send a command to allow operation.
> -		 */
> -		{ mk_mii_write(MII_QS6612_PCR, 0x0dc0), NULL },
> -
> -		/* parse cr and anar to get some info */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x003a), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_ack_int[] = {
> -		/* we need to read ISR, SR and ANER to acknowledge */
> -		{ mk_mii_read(MII_QS6612_ISR), NULL },
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_REG_ANER), NULL },
> -
> -		/* read pcr to get info */
> -		{ mk_mii_read(MII_QS6612_PCR), mii_parse_qs6612_pcr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_qs6612 = {
> -	.id = 0x00181440,
> -	.name = "QS6612",
> -	.config = phy_cmd_qs6612_config,
> -	.startup = phy_cmd_qs6612_startup,
> -	.ack_int = phy_cmd_qs6612_ack_int,
> -	.shutdown = phy_cmd_qs6612_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* AMD AM79C874 phy                                                          */
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -/* register definitions for the 874 */
> +	/* start a read op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA | FEC_MMFR_DATA(value),
> +		fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO write timeout\n");
> +			return -ETIMEDOUT;
> +		}
> +	}
>  
> -#define MII_AM79C874_MFR       16  /* Miscellaneous Feature Register */
> -#define MII_AM79C874_ICSR      17  /* Interrupt/Status Register      */
> -#define MII_AM79C874_DR        18  /* Diagnostic Register            */
> -#define MII_AM79C874_PMLR      19  /* Power and Loopback Register    */
> -#define MII_AM79C874_MCR       21  /* ModeControl Register           */
> -#define MII_AM79C874_DC        23  /* Disconnect Counter             */
> -#define MII_AM79C874_REC       24  /* Recieve Error Counter          */
> +	return 0;
> +}
>  
> -static void mii_parse_am79c874_dr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_reset(struct mii_bus *bus)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0400)
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_100FDX : PHY_STAT_100HDX);
> -	else
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_10FDX : PHY_STAT_10HDX);
> -
> -	*s = status;
> +	return 0;
>  }
>  
> -static phy_cmd_t const phy_cmd_am79c874_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_AM79C874_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_am79c874 = {
> -	.id = 0x00022561,
> -	.name = "AM79C874",
> -	.config = phy_cmd_am79c874_config,
> -	.startup = phy_cmd_am79c874_startup,
> -	.ack_int = phy_cmd_am79c874_ack_int,
> -	.shutdown = phy_cmd_am79c874_shutdown
> -};
> -
> -
> -/* ------------------------------------------------------------------------- */
> -/* Kendin KS8721BL phy                                                       */
> -
> -/* register definitions for the 8721 */
> -
> -#define MII_KS8721BL_RXERCR	21
> -#define MII_KS8721BL_ICSR	27
> -#define	MII_KS8721BL_PHYCR	31
> -
> -static phy_cmd_t const phy_cmd_ks8721bl_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_KS8721BL_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_ks8721bl = {
> -	.id = 0x00022161,
> -	.name = "KS8721BL",
> -	.config = phy_cmd_ks8721bl_config,
> -	.startup = phy_cmd_ks8721bl_startup,
> -	.ack_int = phy_cmd_ks8721bl_ack_int,
> -	.shutdown = phy_cmd_ks8721bl_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* register definitions for the DP83848 */
> -
> -#define MII_DP8384X_PHYSTST    16  /* PHY Status Register */
> -
> -static void mii_parse_dp8384x_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mii_probe(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -
> -	*s &= ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> -
> -	/* Link up */
> -	if (mii_reg & 0x0001) {
> -		fep->link = 1;
> -		*s |= PHY_STAT_LINK;
> -	} else
> -		fep->link = 0;
> -	/* Status of link */
> -	if (mii_reg & 0x0010)   /* Autonegotioation complete */
> -		*s |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0002) {   /* 10MBps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_10FDX;
> -		else
> -			*s |= PHY_STAT_10HDX;
> -	} else {                  /* 100 Mbps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_100FDX;
> -		else
> -			*s |= PHY_STAT_100HDX;
> -	}
> -	if (mii_reg & 0x0008)
> -		*s |= PHY_STAT_FAULT;
> -}
> -
> -static phy_info_t phy_info_dp83848= {
> -	0x020005c9,
> -	"DP83848",
> +	struct phy_device *phy_dev = NULL;
> +	int phy_addr;
>  
> -	(const phy_cmd_t []) {  /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_DP8384X_PHYSTST), mii_parse_dp8384x_sr2 },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* startup - enable interrupts */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* ack_int - never happens, no interrupt */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> +	/* find the first phy */
> +	for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
> +		if (fep->mii_bus->phy_map[phy_addr]) {
> +			phy_dev = fep->mii_bus->phy_map[phy_addr];
> +			break;
> +		}
> +	}
>  
> -static phy_info_t phy_info_lan8700 = {
> -	0x0007C0C,
> -	"LAN8700",
> -	(const phy_cmd_t []) { /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* startup */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* act_int */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> -/* ------------------------------------------------------------------------- */
> +	if (!phy_dev) {
> +		printk(KERN_ERR "%s: no PHY found\n", dev->name);
> +		return -ENODEV;
> +	}
>  
> -static phy_info_t const * const phy_info[] = {
> -	&phy_info_lxt970,
> -	&phy_info_lxt971,
> -	&phy_info_qs6612,
> -	&phy_info_am79c874,
> -	&phy_info_ks8721bl,
> -	&phy_info_dp83848,
> -	&phy_info_lan8700,
> -	NULL
> -};
> +	/* attach the mac to the phy */
> +	phy_dev = phy_connect(dev, dev_name(&phy_dev->dev),
> +			     &fec_enet_adjust_link, 0,
> +			     PHY_INTERFACE_MODE_MII);
> +	if (IS_ERR(phy_dev)) {
> +		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
> +		return PTR_ERR(phy_dev);
> +	}
>  
> -/* ------------------------------------------------------------------------- */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id);
> +	/* mask with MAC supported features */
> +	phy_dev->supported &= PHY_BASIC_FEATURES;
> +	phy_dev->advertising = phy_dev->supported;
>  
> -/*
> - *	This is specific to the MII interrupt setup of the M5272EVB.
> - */
> -static void __inline__ fec_request_mii_intr(struct net_device *dev)
> -{
> -	if (request_irq(66, mii_link_interrupt, IRQF_DISABLED, "fec(MII)", dev) != 0)
> -		printk("FEC: Could not allocate fec(MII) IRQ(66)!\n");
> -}
> +	fep->phy_dev = phy_dev;
> +	fep->link = 0;
> +	fep->full_duplex = 0;
>  
> -static void __inline__ fec_disable_phy_intr(struct net_device *dev)
> -{
> -	free_irq(66, dev);
> +	return 0;
>  }
> -#endif
>  
> -#ifdef CONFIG_M5272
> -static void __inline__ fec_get_mac(struct net_device *dev)
> +static int fec_enet_mii_init(struct platform_device *pdev)
>  {
> +	struct net_device *dev = platform_get_drvdata(pdev);
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	unsigned char *iap, tmpaddr[ETH_ALEN];
> +	int err = -ENXIO, i;
>  
> -	if (FEC_FLASHMAC) {
> -		/*
> -		 * Get MAC address from FLASH.
> -		 * If it is all 1's or 0's, use the default.
> -		 */
> -		iap = (unsigned char *)FEC_FLASHMAC;
> -		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> -		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> -			iap = fec_mac_default;
> -		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> -		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> -			iap = fec_mac_default;
> -	} else {
> -		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> -		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> -		iap = &tmpaddr[0];
> -	}
> -
> -	memcpy(dev->dev_addr, iap, ETH_ALEN);
> -
> -	/* Adjust MAC if using default MAC address */
> -	if (iap == fec_mac_default)
> -		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
> -}
> -#endif
> +	fep->mii_timeout = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -
> -static void mii_display_status(struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> +	/*
> +	 * Set MII speed to 2.5 MHz (= clk_get_rate() / 2 * phy_speed)
> +	 */
> +	fep->phy_speed = DIV_ROUND_UP(clk_get_rate(fep->clk), 5000000) << 1;
> +	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  
> -	if (!fep->link && !fep->old_link) {
> -		/* Link is still down - don't print anything */
> -		return;
> +	fep->mii_bus = mdiobus_alloc();
> +	if (fep->mii_bus == NULL) {
> +		err = -ENOMEM;
> +		goto err_out;
>  	}
>  
> -	printk("%s: status: ", dev->name);
> -
> -	if (!fep->link) {
> -		printk("link down");
> -	} else {
> -		printk("link up");
> -
> -		switch(*s & PHY_STAT_SPMASK) {
> -		case PHY_STAT_100FDX: printk(", 100MBit Full Duplex"); break;
> -		case PHY_STAT_100HDX: printk(", 100MBit Half Duplex"); break;
> -		case PHY_STAT_10FDX: printk(", 10MBit Full Duplex"); break;
> -		case PHY_STAT_10HDX: printk(", 10MBit Half Duplex"); break;
> -		default:
> -			printk(", Unknown speed/duplex");
> -		}
> -
> -		if (*s & PHY_STAT_ANC)
> -			printk(", auto-negotiation complete");
> +	fep->mii_bus->name = "fec_enet_mii_bus";
> +	fep->mii_bus->read = fec_enet_mdio_read;
> +	fep->mii_bus->write = fec_enet_mdio_write;
> +	fep->mii_bus->reset = fec_enet_mdio_reset;
> +	snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
> +	fep->mii_bus->priv = fep;
> +	fep->mii_bus->parent = &pdev->dev;
> +
> +	fep->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
> +	if (!fep->mii_bus->irq) {
> +		err = -ENOMEM;
> +		goto err_out_free_mdiobus;
>  	}
>  
> -	if (*s & PHY_STAT_FAULT)
> -		printk(", remote fault");
> -
> -	printk(".\n");
> -}
> -
> -static void mii_display_config(struct work_struct *work)
> -{
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	uint status = fep->phy_status;
> +	for (i = 0; i < PHY_MAX_ADDR; i++)
> +		fep->mii_bus->irq[i] = PHY_POLL;
>  
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	printk("%s: config: auto-negotiation ", dev->name);
> -
> -	if (status & PHY_CONF_ANE)
> -		printk("on");
> -	else
> -		printk("off");
> +	platform_set_drvdata(dev, fep->mii_bus);
>  
> -	if (status & PHY_CONF_100FDX)
> -		printk(", 100FDX");
> -	if (status & PHY_CONF_100HDX)
> -		printk(", 100HDX");
> -	if (status & PHY_CONF_10FDX)
> -		printk(", 10FDX");
> -	if (status & PHY_CONF_10HDX)
> -		printk(", 10HDX");
> -	if (!(status & PHY_CONF_SPMASK))
> -		printk(", No speed/duplex selected?");
> +	if (mdiobus_register(fep->mii_bus))
> +		goto err_out_free_mdio_irq;
>  
> -	if (status & PHY_CONF_LOOP)
> -		printk(", loopback enabled");
> +	if (fec_enet_mii_probe(dev) != 0)
> +		goto err_out_unregister_bus;
>  
> -	printk(".\n");
> +	return 0;
>  
> -	fep->sequence_done = 1;
> +err_out_unregister_bus:
> +	mdiobus_unregister(fep->mii_bus);
> +err_out_free_mdio_irq:
> +	kfree(fep->mii_bus->irq);
> +err_out_free_mdiobus:
> +	mdiobus_free(fep->mii_bus);
> +err_out:
> +	return err;
>  }
>  
> -static void mii_relink(struct work_struct *work)
> +static void fec_enet_mii_remove(struct fec_enet_private *fep)
>  {
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	int duplex;
> -
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	fep->link = (fep->phy_status & PHY_STAT_LINK) ? 1 : 0;
> -	mii_display_status(dev);
> -	fep->old_link = fep->link;
> -
> -	if (fep->link) {
> -		duplex = 0;
> -		if (fep->phy_status
> -		    & (PHY_STAT_100FDX | PHY_STAT_10FDX))
> -			duplex = 1;
> -		fec_restart(dev, duplex);
> -	} else
> -		fec_stop(dev);
> +	if (fep->phy_dev)
> +		phy_disconnect(fep->phy_dev);
> +	mdiobus_unregister(fep->mii_bus);
> +	kfree(fep->mii_bus->irq);
> +	mdiobus_free(fep->mii_bus);
>  }
>  
> -/* mii_queue_relink is called in interrupt context from mii_link_interrupt */
> -static void mii_queue_relink(uint mii_reg, struct net_device *dev)
> +static int fec_enet_get_settings(struct net_device *dev,
> +				  struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	/*
> -	 * We cannot queue phy_task twice in the workqueue.  It
> -	 * would cause an endless loop in the workqueue.
> -	 * Fortunately, if the last mii_relink entry has not yet been
> -	 * executed now, it will do the job for the current interrupt,
> -	 * which is just what we want.
> -	 */
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_relink);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_gset(phydev, cmd);
>  }
>  
> -/* mii_queue_config is called in interrupt context from fec_enet_mii */
> -static void mii_queue_config(uint mii_reg, struct net_device *dev)
> +static int fec_enet_set_settings(struct net_device *dev,
> +				 struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_display_config);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_sset(phydev, cmd);
>  }
>  
> -phy_cmd_t const phy_cmd_relink[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_relink },
> -	{ mk_mii_end, }
> -	};
> -phy_cmd_t const phy_cmd_config[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_config },
> -	{ mk_mii_end, }
> -	};
> -
> -/* Read remainder of PHY ID. */
> -static void
> -mii_discover_phy3(uint mii_reg, struct net_device *dev)
> +static void fec_enet_get_drvinfo(struct net_device *dev,
> +				 struct ethtool_drvinfo *info)
>  {
> -	struct fec_enet_private *fep;
> -	int i;
> -
> -	fep = netdev_priv(dev);
> -	fep->phy_id |= (mii_reg & 0xffff);
> -	printk("fec: PHY @ 0x%x, ID 0x%08x", fep->phy_addr, fep->phy_id);
> -
> -	for(i = 0; phy_info[i]; i++) {
> -		if(phy_info[i]->id == (fep->phy_id >> 4))
> -			break;
> -	}
> -
> -	if (phy_info[i])
> -		printk(" -- %s\n", phy_info[i]->name);
> -	else
> -		printk(" -- unknown PHY!\n");
> +	struct fec_enet_private *fep = netdev_priv(dev);
>  
> -	fep->phy = phy_info[i];
> -	fep->phy_id_done = 1;
> +	strcpy(info->driver, fep->pdev->dev.driver->name);
> +	strcpy(info->version, "Revision: 1.0");
> +	strcpy(info->bus_info, dev_name(&dev->dev));
>  }
>  
> -/* Scan all of the MII PHY addresses looking for someone to respond
> - * with a valid ID.  This usually happens quickly.
> - */
> -static void
> -mii_discover_phy(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep;
> -	uint phytype;
> -
> -	fep = netdev_priv(dev);
> -
> -	if (fep->phy_addr < 32) {
> -		if ((phytype = (mii_reg & 0xffff)) != 0xffff && phytype != 0) {
> -
> -			/* Got first part of ID, now get remainder */
> -			fep->phy_id = phytype << 16;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR2),
> -							mii_discover_phy3);
> -		} else {
> -			fep->phy_addr++;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR1),
> -							mii_discover_phy);
> -		}
> -	} else {
> -		printk("FEC: No PHY device found.\n");
> -		/* Disable external MII interface */
> -		writel(0, fep->hwp + FEC_MII_SPEED);
> -		fep->phy_speed = 0;
> -#ifdef HAVE_mii_link_interrupt
> -		fec_disable_phy_intr(dev);
> -#endif
> -	}
> -}
> +static struct ethtool_ops fec_enet_ethtool_ops = {
> +	.get_settings		= fec_enet_get_settings,
> +	.set_settings		= fec_enet_set_settings,
> +	.get_drvinfo		= fec_enet_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +};
>  
> -/* This interrupt occurs when the PHY detects a link change */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id)
> +static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>  {
> -	struct	net_device *dev = dev_id;
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
> +
> +	if (!netif_running(dev))
> +		return -EINVAL;
>  
> -	mii_do_cmd(dev, fep->phy->ack_int);
> -	mii_do_cmd(dev, phy_cmd_relink);  /* restart and display status */
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	return IRQ_HANDLED;
> +	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
>  }
> -#endif
>  
>  static void fec_enet_free_buffers(struct net_device *dev)
>  {
> @@ -1509,35 +915,8 @@ fec_enet_open(struct net_device *dev)
>  	if (ret)
>  		return ret;
>  
> -	fep->sequence_done = 0;
> -	fep->link = 0;
> -
> -	fec_restart(dev, 1);
> -
> -	if (fep->phy) {
> -		mii_do_cmd(dev, fep->phy->ack_int);
> -		mii_do_cmd(dev, fep->phy->config);
> -		mii_do_cmd(dev, phy_cmd_config);  /* display configuration */
> -
> -		/* Poll until the PHY tells us its configuration
> -		 * (not link state).
> -		 * Request is initiated by mii_do_cmd above, but answer
> -		 * comes by interrupt.
> -		 * This should take about 25 usec per register at 2.5 MHz,
> -		 * and we read approximately 5 registers.
> -		 */
> -		while(!fep->sequence_done)
> -			schedule();
> -
> -		mii_do_cmd(dev, fep->phy->startup);
> -	}
> -
> -	/* Set the initial link state to true. A lot of hardware
> -	 * based on this device does not implement a PHY interrupt,
> -	 * so we are never notified of link change.
> -	 */
> -	fep->link = 1;
> -
> +	/* schedule a link state check */
> +	phy_start(fep->phy_dev);
>  	netif_start_queue(dev);
>  	fep->opened = 1;
>  	return 0;
> @@ -1550,6 +929,7 @@ fec_enet_close(struct net_device *dev)
>  
>  	/* Don't know what to do yet. */
>  	fep->opened = 0;
> +	phy_stop(fep->phy_dev);
>  	netif_stop_queue(dev);
>  	fec_stop(dev);
>  
> @@ -1666,6 +1046,7 @@ static const struct net_device_ops fec_netdev_ops = {
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_tx_timeout		= fec_timeout,
>  	.ndo_set_mac_address	= fec_set_mac_address,
> +	.ndo_do_ioctl           = fec_enet_ioctl,
>  };
>  
>   /*
> @@ -1689,7 +1070,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	}
>  
>  	spin_lock_init(&fep->hw_lock);
> -	spin_lock_init(&fep->mii_lock);
>  
>  	fep->index = index;
>  	fep->hwp = (void __iomem *)dev->base_addr;
> @@ -1716,20 +1096,10 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	fep->rx_bd_base = cbd_base;
>  	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
>  
> -#ifdef HAVE_mii_link_interrupt
> -	fec_request_mii_intr(dev);
> -#endif
>  	/* The FEC Ethernet specific entries in the device structure */
>  	dev->watchdog_timeo = TX_TIMEOUT;
>  	dev->netdev_ops = &fec_netdev_ops;
> -
> -	for (i=0; i<NMII-1; i++)
> -		mii_cmds[i].mii_next = &mii_cmds[i+1];
> -	mii_free = mii_cmds;
> -
> -	/* Set MII speed to 2.5 MHz */
> -	fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> -					/ 2500000) / 2) & 0x3F) << 1;
> +	dev->ethtool_ops = &fec_enet_ethtool_ops;
>  
>  	/* Initialize the receive buffer descriptors. */
>  	bdp = fep->rx_bd_base;
> @@ -1760,13 +1130,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  
>  	fec_restart(dev, 0);
>  
> -	/* Queue up command to detect the PHY and initialize the
> -	 * remainder of the interface.
> -	 */
> -	fep->phy_id_done = 0;
> -	fep->phy_addr = 0;
> -	mii_queue(dev, mk_mii_read(MII_REG_PHYIR1), mii_discover_phy);
> -
>  	return 0;
>  }
>  
> @@ -1835,8 +1198,7 @@ fec_restart(struct net_device *dev, int duplex)
>  	writel(0, fep->hwp + FEC_R_DES_ACTIVE);
>  
>  	/* Enable interrupts we wish to service */
> -	writel(FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII,
> -			fep->hwp + FEC_IMASK);
> +	writel(FEC_ENET_TXF | FEC_ENET_RXF, fep->hwp + FEC_IMASK);
>  }
>  
>  static void
> @@ -1859,7 +1221,6 @@ fec_stop(struct net_device *dev)
>  	/* Clear outstanding MII command interrupts. */
>  	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -	writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
>  	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  }
>  
> @@ -1891,6 +1252,7 @@ fec_probe(struct platform_device *pdev)
>  	memset(fep, 0, sizeof(*fep));
>  
>  	ndev->base_addr = (unsigned long)ioremap(r->start, resource_size(r));
> +	fep->pdev = pdev;
>  
>  	if (!ndev->base_addr) {
>  		ret = -ENOMEM;
> @@ -1926,13 +1288,24 @@ fec_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto failed_init;
>  
> +	ret = fec_enet_mii_init(pdev);
> +	if (ret)
> +		goto failed_mii_init;
> +
>  	ret = register_netdev(ndev);
>  	if (ret)
>  		goto failed_register;
>  
> +	printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> +		"(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
> +		fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> +		fep->phy_dev->irq);
> +
>  	return 0;
>  
>  failed_register:
> +	fec_enet_mii_remove(fep);
> +failed_mii_init:
>  failed_init:
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
> @@ -1959,6 +1332,7 @@ fec_drv_remove(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, NULL);
>  
>  	fec_stop(ndev);
> +	fec_enet_mii_remove(fep);
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
>  	iounmap((void __iomem *)ndev->base_addr);
> -- 
> 1.7.0.1
> 

-- 
----------------------------------------------------------------------
Amit Kucheria, Kernel Engineer || amit.kucheria@canonical.com
----------------------------------------------------------------------

^ permalink raw reply

* Re: [r8169] WARNING: at net/sched/sch_generic.c
From: Sergey Senozhatsky @ 2010-03-31 12:14 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, Francois Romieu, Eric Dumazet, David S. Miller,
	linux-kernel
In-Reply-To: <20100331111942.GB13963@hmsreliant.think-freely.org>

[-- Attachment #1: Type: text/plain, Size: 3536 bytes --]

Hello,

On (03/31/10 07:19), Neil Horman wrote:
> On Wed, Mar 31, 2010 at 01:21:42PM +0300, Sergey Senozhatsky wrote:
> > Hello,
> > I have the following problem:
> > 
> > [  296.337510] ------------[ cut here ]------------
> > [  296.337523] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xc1/0x125()
> > [  296.337527] Hardware name: F3JC                
> > [  296.337530] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > [  296.337533] Modules linked in: pktgen ipv6 snd_hwdep snd_hda_codec_si3054 snd_hda_codec_realtek sdhci_pci sdhci asus_laptop sparse_keymap mmc_core led_class snd_hda_intel
> > snd_hda_codec psmouse snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw i2c_i801 rng_core evdev sg r8169 mii usbhid hid uhci_hcd ehci_hcd sr_mod cdrom sd_mod usbcore
> > ata_piix
> > [  296.337586] Pid: 0, comm: swapper Not tainted 2.6.34-rc3-dbg #74
> > [  296.337589] Call Trace:
> > [  296.337597]  [<c102e71f>] warn_slowpath_common+0x65/0x7c
> > [  296.337603]  [<c126e30c>] ? dev_watchdog+0xc1/0x125
> > [  296.337608]  [<c102e76a>] warn_slowpath_fmt+0x24/0x27
> > [  296.337613]  [<c126e30c>] dev_watchdog+0xc1/0x125
> > [  296.337620]  [<c1040039>] ? prepare_to_wait_exclusive+0x52/0x5b
> > [  296.337627]  [<c1037053>] ? run_timer_softirq+0x120/0x1eb
> > [  296.337632]  [<c10370a9>] run_timer_softirq+0x176/0x1eb
> > [  296.337637]  [<c1037053>] ? run_timer_softirq+0x120/0x1eb
> > [  296.337643]  [<c126e24b>] ? dev_watchdog+0x0/0x125
> > [  296.337650]  [<c10331c9>] __do_softirq+0x8d/0x117
> > [  296.337655]  [<c103327e>] do_softirq+0x2b/0x43
> > [  296.337660]  [<c10333a3>] irq_exit+0x38/0x75
> > [  296.337667]  [<c1015138>] smp_apic_timer_interrupt+0x6d/0x7b
> > [  296.337673]  [<c12cbada>] apic_timer_interrupt+0x36/0x3c
> > [  296.337679]  [<c104007b>] ? prepare_to_wait+0x39/0x57
> > [  296.337685]  [<c11dd835>] ? acpi_idle_enter_simple+0x119/0x144
> > [  296.337692]  [<c124d358>] cpuidle_idle_call+0x6d/0xa5
> > [  296.337697]  [<c1001b51>] cpu_idle+0x92/0xc1
> > [  296.337704]  [<c12c63d0>] start_secondary+0x1f3/0x1fa
> > [  296.337708] ---[ end trace cd4a1b50139837df ]---
> > 
> > 
> > Reproducing 100% with pktgen tests.
> > 
> What kind of packets are you sending with pktgen?


Here is the sh file to run pktgen.

=====
#!/bin/sh

function pgset() {
    local result

    echo $1 > $PGDEV

    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
         cat $PGDEV | fgrep Result:
    fi
}

function pg() {
    echo inject > $PGDEV
    cat $PGDEV
}

# Config Start Here -----------------------------------------------------------


# thread config
# Each CPU has own thread. Two CPU exammple. We add eth1, eth2 respectivly.

PGDEV=/proc/net/pktgen/kpktgend_0
  echo "Removing all devices"
 pgset "rem_device_all" 
  echo "Adding eth0"
 pgset "add_device eth0" 
  echo "Setting max_before_softirq 10000"
 pgset "max_before_softirq 10000"


# device config
# delay 0 means maximum speed.

CLONE_SKB="clone_skb 500"
# NIC adds 4 bytes CRC
PKT_SIZE="pkt_size 2048"

# COUNT 0 means forever
#COUNT="count 0"
COUNT="count 110110"
DELAY="delay 3"

PGDEV=/proc/net/pktgen/eth0
  echo "Configuring $PGDEV"
 pgset "$COUNT"
 pgset "$CLONE_SKB"
 pgset "$PKT_SIZE"
 pgset "$DELAY"
 pgset "dst ???????"
 pgset "dst_mac ????????"


# Time to run
PGDEV=/proc/net/pktgen/pgctrl

 echo "Running... ctrl^C to stop"
 pgset "start" 
 echo "Done"

====


	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply

* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Bryan Wu @ 2010-03-31 12:11 UTC (permalink / raw)
  To: Wolfram Sang, Sascha Hauer
  Cc: netdev, linux-kernel, kernel-team, gerg, linux-arm-kernel
In-Reply-To: <20100331081004.GB23391@pengutronix.de>

On 03/31/2010 04:10 PM, Wolfram Sang wrote:
>> With rounding this would be:
>>
>> (clk_get_rate() + 4999999) / 5000000
>
> Better use DIV_ROUND_UP(clk_get_rate(), 5000000)?
>
> Regards,
>
>     Wolfram
>

Sascha and Wolfram,

Thanks a lot. I updated the patch to v2 and sent it already.

-- 
Bryan Wu <bryan.wu@canonical.com>
Kernel Developer    +86.138-1617-6545 Mobile
Ubuntu Kernel Team | Hardware Enablement Team
Canonical Ltd.      www.canonical.com
Ubuntu - Linux for human beings | www.ubuntu.com

^ permalink raw reply

* [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection (v2)
From: Bryan Wu @ 2010-03-31 12:10 UTC (permalink / raw)
  To: s.hauer, gerg, amit.kucheria, w.sang
  Cc: netdev, kernel-team, linux-kernel, linux-arm-kernel

BugLink: http://bugs.launchpad.net/bugs/457878

v2:
 - remove duplicated phy_speed caculation
 - fix the phy_speed caculation according to the DataSheet

v1:
 - removed old MII phy control code
 - add phylib supporting
 - add ethtool interface to make user space NetworkManager works

Tested on Freescale i.MX51 Babbage board.

This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>

Cc: Frederic Rodo <fred.rodo@gmail.com>
Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
---
 drivers/net/Kconfig |    1 +
 drivers/net/fec.c   | 1128 ++++++++++++---------------------------------------
 2 files changed, 252 insertions(+), 877 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0ba5b8e..41f6a70 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1916,6 +1916,7 @@ config FEC
 	bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
 	depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
 		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
+	select PHYLIB
 	help
 	  Say Y here if you want to use the built-in 10/100 Fast ethernet
 	  controller on some Motorola ColdFire and Freescale i.MX processors.
diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 9f98c1c..848eb19 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -40,6 +40,7 @@
 #include <linux/irq.h>
 #include <linux/clk.h>
 #include <linux/platform_device.h>
+#include <linux/phy.h>
 
 #include <asm/cacheflush.h>
 
@@ -61,7 +62,6 @@
  * Define the fixed address of the FEC hardware.
  */
 #if defined(CONFIG_M5272)
-#define HAVE_mii_link_interrupt
 
 static unsigned char	fec_mac_default[] = {
 	0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
@@ -86,23 +86,6 @@ static unsigned char	fec_mac_default[] = {
 #endif
 #endif /* CONFIG_M5272 */
 
-/* Forward declarations of some structures to support different PHYs */
-
-typedef struct {
-	uint mii_data;
-	void (*funct)(uint mii_reg, struct net_device *dev);
-} phy_cmd_t;
-
-typedef struct {
-	uint id;
-	char *name;
-
-	const phy_cmd_t *config;
-	const phy_cmd_t *startup;
-	const phy_cmd_t *ack_int;
-	const phy_cmd_t *shutdown;
-} phy_info_t;
-
 /* The number of Tx and Rx buffers.  These are allocated from the page
  * pool.  The code may assume these are power of two, so it it best
  * to keep them that size.
@@ -189,29 +172,21 @@ struct fec_enet_private {
 	uint	tx_full;
 	/* hold while accessing the HW like ringbuffer for tx/rx but not MAC */
 	spinlock_t hw_lock;
-	/* hold while accessing the mii_list_t() elements */
-	spinlock_t mii_lock;
-
-	uint	phy_id;
-	uint	phy_id_done;
-	uint	phy_status;
-	uint	phy_speed;
-	phy_info_t const	*phy;
-	struct work_struct phy_task;
 
-	uint	sequence_done;
-	uint	mii_phy_task_queued;
+	struct  platform_device *pdev;
 
-	uint	phy_addr;
+	int	opened;
 
+	/* Phylib and MDIO interface */
+	struct  mii_bus *mii_bus;
+	struct  phy_device *phy_dev;
+	int     mii_timeout;
+	uint    phy_speed;
 	int	index;
-	int	opened;
 	int	link;
-	int	old_link;
 	int	full_duplex;
 };
 
-static void fec_enet_mii(struct net_device *dev);
 static irqreturn_t fec_enet_interrupt(int irq, void * dev_id);
 static void fec_enet_tx(struct net_device *dev);
 static void fec_enet_rx(struct net_device *dev);
@@ -219,67 +194,20 @@ static int fec_enet_close(struct net_device *dev);
 static void fec_restart(struct net_device *dev, int duplex);
 static void fec_stop(struct net_device *dev);
 
+/* FEC MII MMFR bits definition */
+#define FEC_MMFR_ST		(1 << 30)
+#define FEC_MMFR_OP_READ	(2 << 28)
+#define FEC_MMFR_OP_WRITE	(1 << 28)
+#define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
+#define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
+#define FEC_MMFR_TA		(2 << 16)
+#define FEC_MMFR_DATA(v)	(v & 0xffff)
 
-/* MII processing.  We keep this as simple as possible.  Requests are
- * placed on the list (if there is room).  When the request is finished
- * by the MII, an optional function may be called.
- */
-typedef struct mii_list {
-	uint	mii_regval;
-	void	(*mii_func)(uint val, struct net_device *dev);
-	struct	mii_list *mii_next;
-} mii_list_t;
-
-#define		NMII	20
-static mii_list_t	mii_cmds[NMII];
-static mii_list_t	*mii_free;
-static mii_list_t	*mii_head;
-static mii_list_t	*mii_tail;
-
-static int	mii_queue(struct net_device *dev, int request,
-				void (*func)(uint, struct net_device *));
-
-/* Make MII read/write commands for the FEC */
-#define mk_mii_read(REG)	(0x60020000 | ((REG & 0x1f) << 18))
-#define mk_mii_write(REG, VAL)	(0x50020000 | ((REG & 0x1f) << 18) | \
-						(VAL & 0xffff))
-#define mk_mii_end	0
+#define FEC_MII_TIMEOUT		10000
 
 /* Transmitter timeout */
 #define TX_TIMEOUT (2 * HZ)
 
-/* Register definitions for the PHY */
-
-#define MII_REG_CR          0  /* Control Register                         */
-#define MII_REG_SR          1  /* Status Register                          */
-#define MII_REG_PHYIR1      2  /* PHY Identification Register 1            */
-#define MII_REG_PHYIR2      3  /* PHY Identification Register 2            */
-#define MII_REG_ANAR        4  /* A-N Advertisement Register               */
-#define MII_REG_ANLPAR      5  /* A-N Link Partner Ability Register        */
-#define MII_REG_ANER        6  /* A-N Expansion Register                   */
-#define MII_REG_ANNPTR      7  /* A-N Next Page Transmit Register          */
-#define MII_REG_ANLPRNPR    8  /* A-N Link Partner Received Next Page Reg. */
-
-/* values for phy_status */
-
-#define PHY_CONF_ANE	0x0001  /* 1 auto-negotiation enabled */
-#define PHY_CONF_LOOP	0x0002  /* 1 loopback mode enabled */
-#define PHY_CONF_SPMASK	0x00f0  /* mask for speed */
-#define PHY_CONF_10HDX	0x0010  /* 10 Mbit half duplex supported */
-#define PHY_CONF_10FDX	0x0020  /* 10 Mbit full duplex supported */
-#define PHY_CONF_100HDX	0x0040  /* 100 Mbit half duplex supported */
-#define PHY_CONF_100FDX	0x0080  /* 100 Mbit full duplex supported */
-
-#define PHY_STAT_LINK	0x0100  /* 1 up - 0 down */
-#define PHY_STAT_FAULT	0x0200  /* 1 remote fault */
-#define PHY_STAT_ANC	0x0400  /* 1 auto-negotiation complete	*/
-#define PHY_STAT_SPMASK	0xf000  /* mask for speed */
-#define PHY_STAT_10HDX	0x1000  /* 10 Mbit half duplex selected	*/
-#define PHY_STAT_10FDX	0x2000  /* 10 Mbit full duplex selected	*/
-#define PHY_STAT_100HDX	0x4000  /* 100 Mbit half duplex selected */
-#define PHY_STAT_100FDX	0x8000  /* 100 Mbit full duplex selected */
-
-
 static int
 fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -406,12 +334,6 @@ fec_enet_interrupt(int irq, void * dev_id)
 			ret = IRQ_HANDLED;
 			fec_enet_tx(dev);
 		}
-
-		if (int_events & FEC_ENET_MII) {
-			ret = IRQ_HANDLED;
-			fec_enet_mii(dev);
-		}
-
 	} while (int_events);
 
 	return ret;
@@ -607,827 +529,311 @@ rx_processing_done:
 	spin_unlock(&fep->hw_lock);
 }
 
-/* called from interrupt context */
-static void
-fec_enet_mii(struct net_device *dev)
-{
-	struct	fec_enet_private *fep;
-	mii_list_t	*mip;
-
-	fep = netdev_priv(dev);
-	spin_lock(&fep->mii_lock);
-
-	if ((mip = mii_head) == NULL) {
-		printk("MII and no head!\n");
-		goto unlock;
-	}
-
-	if (mip->mii_func != NULL)
-		(*(mip->mii_func))(readl(fep->hwp + FEC_MII_DATA), dev);
-
-	mii_head = mip->mii_next;
-	mip->mii_next = mii_free;
-	mii_free = mip;
-
-	if ((mip = mii_head) != NULL)
-		writel(mip->mii_regval, fep->hwp + FEC_MII_DATA);
-
-unlock:
-	spin_unlock(&fep->mii_lock);
-}
-
-static int
-mii_queue_unlocked(struct net_device *dev, int regval,
-		void (*func)(uint, struct net_device *))
+/* ------------------------------------------------------------------------- */
+#ifdef CONFIG_M5272
+static void __inline__ fec_get_mac(struct net_device *dev)
 {
-	struct fec_enet_private *fep;
-	mii_list_t	*mip;
-	int		retval;
-
-	/* Add PHY address to register command */
-	fep = netdev_priv(dev);
+	struct fec_enet_private *fep = netdev_priv(dev);
+	unsigned char *iap, tmpaddr[ETH_ALEN];
 
-	regval |= fep->phy_addr << 23;
-	retval = 0;
-
-	if ((mip = mii_free) != NULL) {
-		mii_free = mip->mii_next;
-		mip->mii_regval = regval;
-		mip->mii_func = func;
-		mip->mii_next = NULL;
-		if (mii_head) {
-			mii_tail->mii_next = mip;
-			mii_tail = mip;
-		} else {
-			mii_head = mii_tail = mip;
-			writel(regval, fep->hwp + FEC_MII_DATA);
-		}
+	if (FEC_FLASHMAC) {
+		/*
+		 * Get MAC address from FLASH.
+		 * If it is all 1's or 0's, use the default.
+		 */
+		iap = (unsigned char *)FEC_FLASHMAC;
+		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
+		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
+			iap = fec_mac_default;
+		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
+		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
+			iap = fec_mac_default;
 	} else {
-		retval = 1;
+		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
+		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
+		iap = &tmpaddr[0];
 	}
 
-	return retval;
-}
-
-static int
-mii_queue(struct net_device *dev, int regval,
-		void (*func)(uint, struct net_device *))
-{
-	struct fec_enet_private *fep;
-	unsigned long   flags;
-	int             retval;
-	fep = netdev_priv(dev);
-	spin_lock_irqsave(&fep->mii_lock, flags);
-	retval = mii_queue_unlocked(dev, regval, func);
-	spin_unlock_irqrestore(&fep->mii_lock, flags);
-	return retval;
-}
-
-static void mii_do_cmd(struct net_device *dev, const phy_cmd_t *c)
-{
-	if(!c)
-		return;
+	memcpy(dev->dev_addr, iap, ETH_ALEN);
 
-	for (; c->mii_data != mk_mii_end; c++)
-		mii_queue(dev, c->mii_data, c->funct);
+	/* Adjust MAC if using default MAC address */
+	if (iap == fec_mac_default)
+		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
 }
+#endif
 
-static void mii_parse_sr(uint mii_reg, struct net_device *dev)
-{
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
-
-	status = *s & ~(PHY_STAT_LINK | PHY_STAT_FAULT | PHY_STAT_ANC);
-
-	if (mii_reg & 0x0004)
-		status |= PHY_STAT_LINK;
-	if (mii_reg & 0x0010)
-		status |= PHY_STAT_FAULT;
-	if (mii_reg & 0x0020)
-		status |= PHY_STAT_ANC;
-	*s = status;
-}
+/* ------------------------------------------------------------------------- */
 
-static void mii_parse_cr(uint mii_reg, struct net_device *dev)
+/*
+ * Phy section
+ */
+static void fec_enet_adjust_link(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
-
-	status = *s & ~(PHY_CONF_ANE | PHY_CONF_LOOP);
-
-	if (mii_reg & 0x1000)
-		status |= PHY_CONF_ANE;
-	if (mii_reg & 0x4000)
-		status |= PHY_CONF_LOOP;
-	*s = status;
-}
+	struct phy_device *phy_dev = fep->phy_dev;
+	unsigned long flags;
 
-static void mii_parse_anar(uint mii_reg, struct net_device *dev)
-{
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
-
-	status = *s & ~(PHY_CONF_SPMASK);
-
-	if (mii_reg & 0x0020)
-		status |= PHY_CONF_10HDX;
-	if (mii_reg & 0x0040)
-		status |= PHY_CONF_10FDX;
-	if (mii_reg & 0x0080)
-		status |= PHY_CONF_100HDX;
-	if (mii_reg & 0x00100)
-		status |= PHY_CONF_100FDX;
-	*s = status;
-}
+	int status_change = 0;
 
-/* ------------------------------------------------------------------------- */
-/* The Level one LXT970 is used by many boards				     */
+	spin_lock_irqsave(&fep->hw_lock, flags);
 
-#define MII_LXT970_MIRROR    16  /* Mirror register           */
-#define MII_LXT970_IER       17  /* Interrupt Enable Register */
-#define MII_LXT970_ISR       18  /* Interrupt Status Register */
-#define MII_LXT970_CONFIG    19  /* Configuration Register    */
-#define MII_LXT970_CSR       20  /* Chip Status Register      */
+	/* Prevent a state halted on mii error */
+	if (fep->mii_timeout && phy_dev->state == PHY_HALTED) {
+		phy_dev->state = PHY_RESUMING;
+		goto spin_unlock;
+	}
 
-static void mii_parse_lxt970_csr(uint mii_reg, struct net_device *dev)
-{
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
+	/* Duplex link change */
+	if (phy_dev->link) {
+		if (fep->full_duplex != phy_dev->duplex) {
+			fec_restart(dev, phy_dev->duplex);
+			status_change = 1;
+		}
+	}
 
-	status = *s & ~(PHY_STAT_SPMASK);
-	if (mii_reg & 0x0800) {
-		if (mii_reg & 0x1000)
-			status |= PHY_STAT_100FDX;
+	/* Link on or off change */
+	if (phy_dev->link != fep->link) {
+		fep->link = phy_dev->link;
+		if (phy_dev->link)
+			fec_restart(dev, phy_dev->duplex);
 		else
-			status |= PHY_STAT_100HDX;
-	} else {
-		if (mii_reg & 0x1000)
-			status |= PHY_STAT_10FDX;
-		else
-			status |= PHY_STAT_10HDX;
+			fec_stop(dev);
+		status_change = 1;
 	}
-	*s = status;
-}
-
-static phy_cmd_t const phy_cmd_lxt970_config[] = {
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt970_startup[] = { /* enable interrupts */
-		{ mk_mii_write(MII_LXT970_IER, 0x0002), NULL },
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt970_ack_int[] = {
-		/* read SR and ISR to acknowledge */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_read(MII_LXT970_ISR), NULL },
-
-		/* find out the current status */
-		{ mk_mii_read(MII_LXT970_CSR), mii_parse_lxt970_csr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt970_shutdown[] = { /* disable interrupts */
-		{ mk_mii_write(MII_LXT970_IER, 0x0000), NULL },
-		{ mk_mii_end, }
-	};
-static phy_info_t const phy_info_lxt970 = {
-	.id = 0x07810000,
-	.name = "LXT970",
-	.config = phy_cmd_lxt970_config,
-	.startup = phy_cmd_lxt970_startup,
-	.ack_int = phy_cmd_lxt970_ack_int,
-	.shutdown = phy_cmd_lxt970_shutdown
-};
 
-/* ------------------------------------------------------------------------- */
-/* The Level one LXT971 is used on some of my custom boards                  */
-
-/* register definitions for the 971 */
+spin_unlock:
+	spin_unlock_irqrestore(&fep->hw_lock, flags);
 
-#define MII_LXT971_PCR       16  /* Port Control Register     */
-#define MII_LXT971_SR2       17  /* Status Register 2         */
-#define MII_LXT971_IER       18  /* Interrupt Enable Register */
-#define MII_LXT971_ISR       19  /* Interrupt Status Register */
-#define MII_LXT971_LCR       20  /* LED Control Register      */
-#define MII_LXT971_TCR       30  /* Transmit Control Register */
+	if (status_change)
+		phy_print_status(phy_dev);
+}
 
 /*
- * I had some nice ideas of running the MDIO faster...
- * The 971 should support 8MHz and I tried it, but things acted really
- * weird, so 2.5 MHz ought to be enough for anyone...
+ * NOTE: a MII transaction is during around 25 us, so polling it...
  */
-
-static void mii_parse_lxt971_sr2(uint mii_reg, struct net_device *dev)
+static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 {
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
+	struct fec_enet_private *fep = bus->priv;
+	int timeout = FEC_MII_TIMEOUT;
 
-	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
+	fep->mii_timeout = 0;
 
-	if (mii_reg & 0x0400) {
-		fep->link = 1;
-		status |= PHY_STAT_LINK;
-	} else {
-		fep->link = 0;
-	}
-	if (mii_reg & 0x0080)
-		status |= PHY_STAT_ANC;
-	if (mii_reg & 0x4000) {
-		if (mii_reg & 0x0200)
-			status |= PHY_STAT_100FDX;
-		else
-			status |= PHY_STAT_100HDX;
-	} else {
-		if (mii_reg & 0x0200)
-			status |= PHY_STAT_10FDX;
-		else
-			status |= PHY_STAT_10HDX;
+	/* clear MII end of transfer bit*/
+	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
+
+	/* start a read op */
+	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
+
+	/* wait for end of transfer */
+	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
+		cpu_relax();
+		if (timeout-- < 0) {
+			fep->mii_timeout = 1;
+			printk(KERN_ERR "FEC: MDIO read timeout\n");
+			return -ETIMEDOUT;
+		}
 	}
-	if (mii_reg & 0x0008)
-		status |= PHY_STAT_FAULT;
 
-	*s = status;
+	/* return value */
+	return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
 }
 
-static phy_cmd_t const phy_cmd_lxt971_config[] = {
-		/* limit to 10MBit because my prototype board
-		 * doesn't work with 100. */
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt971_startup[] = {  /* enable interrupts */
-		{ mk_mii_write(MII_LXT971_IER, 0x00f2), NULL },
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_write(MII_LXT971_LCR, 0xd422), NULL }, /* LED config */
-		/* Somehow does the 971 tell me that the link is down
-		 * the first read after power-up.
-		 * read here to get a valid value in ack_int */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt971_ack_int[] = {
-		/* acknowledge the int before reading status ! */
-		{ mk_mii_read(MII_LXT971_ISR), NULL },
-		/* find out the current status */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_lxt971_shutdown[] = { /* disable interrupts */
-		{ mk_mii_write(MII_LXT971_IER, 0x0000), NULL },
-		{ mk_mii_end, }
-	};
-static phy_info_t const phy_info_lxt971 = {
-	.id = 0x0001378e,
-	.name = "LXT971",
-	.config = phy_cmd_lxt971_config,
-	.startup = phy_cmd_lxt971_startup,
-	.ack_int = phy_cmd_lxt971_ack_int,
-	.shutdown = phy_cmd_lxt971_shutdown
-};
-
-/* ------------------------------------------------------------------------- */
-/* The Quality Semiconductor QS6612 is used on the RPX CLLF                  */
-
-/* register definitions */
-
-#define MII_QS6612_MCR       17  /* Mode Control Register      */
-#define MII_QS6612_FTR       27  /* Factory Test Register      */
-#define MII_QS6612_MCO       28  /* Misc. Control Register     */
-#define MII_QS6612_ISR       29  /* Interrupt Source Register  */
-#define MII_QS6612_IMR       30  /* Interrupt Mask Register    */
-#define MII_QS6612_PCR       31  /* 100BaseTx PHY Control Reg. */
-
-static void mii_parse_qs6612_pcr(uint mii_reg, struct net_device *dev)
+static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
+			   u16 value)
 {
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
+	struct fec_enet_private *fep = bus->priv;
+	int timeout = FEC_MII_TIMEOUT;
 
-	status = *s & ~(PHY_STAT_SPMASK);
+	fep->mii_timeout = 0;
 
-	switch((mii_reg >> 2) & 7) {
-	case 1: status |= PHY_STAT_10HDX; break;
-	case 2: status |= PHY_STAT_100HDX; break;
-	case 5: status |= PHY_STAT_10FDX; break;
-	case 6: status |= PHY_STAT_100FDX; break;
-}
-
-	*s = status;
-}
-
-static phy_cmd_t const phy_cmd_qs6612_config[] = {
-		/* The PHY powers up isolated on the RPX,
-		 * so send a command to allow operation.
-		 */
-		{ mk_mii_write(MII_QS6612_PCR, 0x0dc0), NULL },
-
-		/* parse cr and anar to get some info */
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_qs6612_startup[] = {  /* enable interrupts */
-		{ mk_mii_write(MII_QS6612_IMR, 0x003a), NULL },
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_qs6612_ack_int[] = {
-		/* we need to read ISR, SR and ANER to acknowledge */
-		{ mk_mii_read(MII_QS6612_ISR), NULL },
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_read(MII_REG_ANER), NULL },
-
-		/* read pcr to get info */
-		{ mk_mii_read(MII_QS6612_PCR), mii_parse_qs6612_pcr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_qs6612_shutdown[] = { /* disable interrupts */
-		{ mk_mii_write(MII_QS6612_IMR, 0x0000), NULL },
-		{ mk_mii_end, }
-	};
-static phy_info_t const phy_info_qs6612 = {
-	.id = 0x00181440,
-	.name = "QS6612",
-	.config = phy_cmd_qs6612_config,
-	.startup = phy_cmd_qs6612_startup,
-	.ack_int = phy_cmd_qs6612_ack_int,
-	.shutdown = phy_cmd_qs6612_shutdown
-};
-
-/* ------------------------------------------------------------------------- */
-/* AMD AM79C874 phy                                                          */
+	/* clear MII end of transfer bit*/
+	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
 
-/* register definitions for the 874 */
+	/* start a read op */
+	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
+		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+		FEC_MMFR_TA | FEC_MMFR_DATA(value),
+		fep->hwp + FEC_MII_DATA);
+
+	/* wait for end of transfer */
+	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
+		cpu_relax();
+		if (timeout-- < 0) {
+			fep->mii_timeout = 1;
+			printk(KERN_ERR "FEC: MDIO write timeout\n");
+			return -ETIMEDOUT;
+		}
+	}
 
-#define MII_AM79C874_MFR       16  /* Miscellaneous Feature Register */
-#define MII_AM79C874_ICSR      17  /* Interrupt/Status Register      */
-#define MII_AM79C874_DR        18  /* Diagnostic Register            */
-#define MII_AM79C874_PMLR      19  /* Power and Loopback Register    */
-#define MII_AM79C874_MCR       21  /* ModeControl Register           */
-#define MII_AM79C874_DC        23  /* Disconnect Counter             */
-#define MII_AM79C874_REC       24  /* Recieve Error Counter          */
+	return 0;
+}
 
-static void mii_parse_am79c874_dr(uint mii_reg, struct net_device *dev)
+static int fec_enet_mdio_reset(struct mii_bus *bus)
 {
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-	uint status;
-
-	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_ANC);
-
-	if (mii_reg & 0x0080)
-		status |= PHY_STAT_ANC;
-	if (mii_reg & 0x0400)
-		status |= ((mii_reg & 0x0800) ? PHY_STAT_100FDX : PHY_STAT_100HDX);
-	else
-		status |= ((mii_reg & 0x0800) ? PHY_STAT_10FDX : PHY_STAT_10HDX);
-
-	*s = status;
+	return 0;
 }
 
-static phy_cmd_t const phy_cmd_am79c874_config[] = {
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_am79c874_startup[] = {  /* enable interrupts */
-		{ mk_mii_write(MII_AM79C874_ICSR, 0xff00), NULL },
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_am79c874_ack_int[] = {
-		/* find out the current status */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
-		/* we only need to read ISR to acknowledge */
-		{ mk_mii_read(MII_AM79C874_ICSR), NULL },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_am79c874_shutdown[] = { /* disable interrupts */
-		{ mk_mii_write(MII_AM79C874_ICSR, 0x0000), NULL },
-		{ mk_mii_end, }
-	};
-static phy_info_t const phy_info_am79c874 = {
-	.id = 0x00022561,
-	.name = "AM79C874",
-	.config = phy_cmd_am79c874_config,
-	.startup = phy_cmd_am79c874_startup,
-	.ack_int = phy_cmd_am79c874_ack_int,
-	.shutdown = phy_cmd_am79c874_shutdown
-};
-
-
-/* ------------------------------------------------------------------------- */
-/* Kendin KS8721BL phy                                                       */
-
-/* register definitions for the 8721 */
-
-#define MII_KS8721BL_RXERCR	21
-#define MII_KS8721BL_ICSR	27
-#define	MII_KS8721BL_PHYCR	31
-
-static phy_cmd_t const phy_cmd_ks8721bl_config[] = {
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_ks8721bl_startup[] = {  /* enable interrupts */
-		{ mk_mii_write(MII_KS8721BL_ICSR, 0xff00), NULL },
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_ks8721bl_ack_int[] = {
-		/* find out the current status */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		/* we only need to read ISR to acknowledge */
-		{ mk_mii_read(MII_KS8721BL_ICSR), NULL },
-		{ mk_mii_end, }
-	};
-static phy_cmd_t const phy_cmd_ks8721bl_shutdown[] = { /* disable interrupts */
-		{ mk_mii_write(MII_KS8721BL_ICSR, 0x0000), NULL },
-		{ mk_mii_end, }
-	};
-static phy_info_t const phy_info_ks8721bl = {
-	.id = 0x00022161,
-	.name = "KS8721BL",
-	.config = phy_cmd_ks8721bl_config,
-	.startup = phy_cmd_ks8721bl_startup,
-	.ack_int = phy_cmd_ks8721bl_ack_int,
-	.shutdown = phy_cmd_ks8721bl_shutdown
-};
-
-/* ------------------------------------------------------------------------- */
-/* register definitions for the DP83848 */
-
-#define MII_DP8384X_PHYSTST    16  /* PHY Status Register */
-
-static void mii_parse_dp8384x_sr2(uint mii_reg, struct net_device *dev)
+static int fec_enet_mii_probe(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
-
-	*s &= ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
-
-	/* Link up */
-	if (mii_reg & 0x0001) {
-		fep->link = 1;
-		*s |= PHY_STAT_LINK;
-	} else
-		fep->link = 0;
-	/* Status of link */
-	if (mii_reg & 0x0010)   /* Autonegotioation complete */
-		*s |= PHY_STAT_ANC;
-	if (mii_reg & 0x0002) {   /* 10MBps? */
-		if (mii_reg & 0x0004)   /* Full Duplex? */
-			*s |= PHY_STAT_10FDX;
-		else
-			*s |= PHY_STAT_10HDX;
-	} else {                  /* 100 Mbps? */
-		if (mii_reg & 0x0004)   /* Full Duplex? */
-			*s |= PHY_STAT_100FDX;
-		else
-			*s |= PHY_STAT_100HDX;
-	}
-	if (mii_reg & 0x0008)
-		*s |= PHY_STAT_FAULT;
-}
-
-static phy_info_t phy_info_dp83848= {
-	0x020005c9,
-	"DP83848",
+	struct phy_device *phy_dev = NULL;
+	int phy_addr;
 
-	(const phy_cmd_t []) {  /* config */
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_read(MII_DP8384X_PHYSTST), mii_parse_dp8384x_sr2 },
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) {  /* startup - enable interrupts */
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) { /* ack_int - never happens, no interrupt */
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) {  /* shutdown */
-		{ mk_mii_end, }
-	},
-};
+	/* find the first phy */
+	for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
+		if (fep->mii_bus->phy_map[phy_addr]) {
+			phy_dev = fep->mii_bus->phy_map[phy_addr];
+			break;
+		}
+	}
 
-static phy_info_t phy_info_lan8700 = {
-	0x0007C0C,
-	"LAN8700",
-	(const phy_cmd_t []) { /* config */
-		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
-		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) { /* startup */
-		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
-		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) { /* act_int */
-		{ mk_mii_end, }
-	},
-	(const phy_cmd_t []) { /* shutdown */
-		{ mk_mii_end, }
-	},
-};
-/* ------------------------------------------------------------------------- */
+	if (!phy_dev) {
+		printk(KERN_ERR "%s: no PHY found\n", dev->name);
+		return -ENODEV;
+	}
 
-static phy_info_t const * const phy_info[] = {
-	&phy_info_lxt970,
-	&phy_info_lxt971,
-	&phy_info_qs6612,
-	&phy_info_am79c874,
-	&phy_info_ks8721bl,
-	&phy_info_dp83848,
-	&phy_info_lan8700,
-	NULL
-};
+	/* attach the mac to the phy */
+	phy_dev = phy_connect(dev, dev_name(&phy_dev->dev),
+			     &fec_enet_adjust_link, 0,
+			     PHY_INTERFACE_MODE_MII);
+	if (IS_ERR(phy_dev)) {
+		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
+		return PTR_ERR(phy_dev);
+	}
 
-/* ------------------------------------------------------------------------- */
-#ifdef HAVE_mii_link_interrupt
-static irqreturn_t
-mii_link_interrupt(int irq, void * dev_id);
+	/* mask with MAC supported features */
+	phy_dev->supported &= PHY_BASIC_FEATURES;
+	phy_dev->advertising = phy_dev->supported;
 
-/*
- *	This is specific to the MII interrupt setup of the M5272EVB.
- */
-static void __inline__ fec_request_mii_intr(struct net_device *dev)
-{
-	if (request_irq(66, mii_link_interrupt, IRQF_DISABLED, "fec(MII)", dev) != 0)
-		printk("FEC: Could not allocate fec(MII) IRQ(66)!\n");
-}
+	fep->phy_dev = phy_dev;
+	fep->link = 0;
+	fep->full_duplex = 0;
 
-static void __inline__ fec_disable_phy_intr(struct net_device *dev)
-{
-	free_irq(66, dev);
+	return 0;
 }
-#endif
 
-#ifdef CONFIG_M5272
-static void __inline__ fec_get_mac(struct net_device *dev)
+static int fec_enet_mii_init(struct platform_device *pdev)
 {
+	struct net_device *dev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(dev);
-	unsigned char *iap, tmpaddr[ETH_ALEN];
+	int err = -ENXIO, i;
 
-	if (FEC_FLASHMAC) {
-		/*
-		 * Get MAC address from FLASH.
-		 * If it is all 1's or 0's, use the default.
-		 */
-		iap = (unsigned char *)FEC_FLASHMAC;
-		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
-		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
-			iap = fec_mac_default;
-		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
-		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
-			iap = fec_mac_default;
-	} else {
-		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
-		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
-		iap = &tmpaddr[0];
-	}
-
-	memcpy(dev->dev_addr, iap, ETH_ALEN);
-
-	/* Adjust MAC if using default MAC address */
-	if (iap == fec_mac_default)
-		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
-}
-#endif
+	fep->mii_timeout = 0;
 
-/* ------------------------------------------------------------------------- */
-
-static void mii_display_status(struct net_device *dev)
-{
-	struct fec_enet_private *fep = netdev_priv(dev);
-	volatile uint *s = &(fep->phy_status);
+	/*
+	 * Set MII speed to 2.5 MHz (= clk_get_rate() / 2 * phy_speed)
+	 */
+	fep->phy_speed = DIV_ROUND_UP(clk_get_rate(fep->clk), 5000000) << 1;
+	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 
-	if (!fep->link && !fep->old_link) {
-		/* Link is still down - don't print anything */
-		return;
+	fep->mii_bus = mdiobus_alloc();
+	if (fep->mii_bus == NULL) {
+		err = -ENOMEM;
+		goto err_out;
 	}
 
-	printk("%s: status: ", dev->name);
-
-	if (!fep->link) {
-		printk("link down");
-	} else {
-		printk("link up");
-
-		switch(*s & PHY_STAT_SPMASK) {
-		case PHY_STAT_100FDX: printk(", 100MBit Full Duplex"); break;
-		case PHY_STAT_100HDX: printk(", 100MBit Half Duplex"); break;
-		case PHY_STAT_10FDX: printk(", 10MBit Full Duplex"); break;
-		case PHY_STAT_10HDX: printk(", 10MBit Half Duplex"); break;
-		default:
-			printk(", Unknown speed/duplex");
-		}
-
-		if (*s & PHY_STAT_ANC)
-			printk(", auto-negotiation complete");
+	fep->mii_bus->name = "fec_enet_mii_bus";
+	fep->mii_bus->read = fec_enet_mdio_read;
+	fep->mii_bus->write = fec_enet_mdio_write;
+	fep->mii_bus->reset = fec_enet_mdio_reset;
+	snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
+	fep->mii_bus->priv = fep;
+	fep->mii_bus->parent = &pdev->dev;
+
+	fep->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+	if (!fep->mii_bus->irq) {
+		err = -ENOMEM;
+		goto err_out_free_mdiobus;
 	}
 
-	if (*s & PHY_STAT_FAULT)
-		printk(", remote fault");
-
-	printk(".\n");
-}
-
-static void mii_display_config(struct work_struct *work)
-{
-	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
-	struct net_device *dev = fep->netdev;
-	uint status = fep->phy_status;
+	for (i = 0; i < PHY_MAX_ADDR; i++)
+		fep->mii_bus->irq[i] = PHY_POLL;
 
-	/*
-	** When we get here, phy_task is already removed from
-	** the workqueue.  It is thus safe to allow to reuse it.
-	*/
-	fep->mii_phy_task_queued = 0;
-	printk("%s: config: auto-negotiation ", dev->name);
-
-	if (status & PHY_CONF_ANE)
-		printk("on");
-	else
-		printk("off");
+	platform_set_drvdata(dev, fep->mii_bus);
 
-	if (status & PHY_CONF_100FDX)
-		printk(", 100FDX");
-	if (status & PHY_CONF_100HDX)
-		printk(", 100HDX");
-	if (status & PHY_CONF_10FDX)
-		printk(", 10FDX");
-	if (status & PHY_CONF_10HDX)
-		printk(", 10HDX");
-	if (!(status & PHY_CONF_SPMASK))
-		printk(", No speed/duplex selected?");
+	if (mdiobus_register(fep->mii_bus))
+		goto err_out_free_mdio_irq;
 
-	if (status & PHY_CONF_LOOP)
-		printk(", loopback enabled");
+	if (fec_enet_mii_probe(dev) != 0)
+		goto err_out_unregister_bus;
 
-	printk(".\n");
+	return 0;
 
-	fep->sequence_done = 1;
+err_out_unregister_bus:
+	mdiobus_unregister(fep->mii_bus);
+err_out_free_mdio_irq:
+	kfree(fep->mii_bus->irq);
+err_out_free_mdiobus:
+	mdiobus_free(fep->mii_bus);
+err_out:
+	return err;
 }
 
-static void mii_relink(struct work_struct *work)
+static void fec_enet_mii_remove(struct fec_enet_private *fep)
 {
-	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
-	struct net_device *dev = fep->netdev;
-	int duplex;
-
-	/*
-	** When we get here, phy_task is already removed from
-	** the workqueue.  It is thus safe to allow to reuse it.
-	*/
-	fep->mii_phy_task_queued = 0;
-	fep->link = (fep->phy_status & PHY_STAT_LINK) ? 1 : 0;
-	mii_display_status(dev);
-	fep->old_link = fep->link;
-
-	if (fep->link) {
-		duplex = 0;
-		if (fep->phy_status
-		    & (PHY_STAT_100FDX | PHY_STAT_10FDX))
-			duplex = 1;
-		fec_restart(dev, duplex);
-	} else
-		fec_stop(dev);
+	if (fep->phy_dev)
+		phy_disconnect(fep->phy_dev);
+	mdiobus_unregister(fep->mii_bus);
+	kfree(fep->mii_bus->irq);
+	mdiobus_free(fep->mii_bus);
 }
 
-/* mii_queue_relink is called in interrupt context from mii_link_interrupt */
-static void mii_queue_relink(uint mii_reg, struct net_device *dev)
+static int fec_enet_get_settings(struct net_device *dev,
+				  struct ethtool_cmd *cmd)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
+	struct phy_device *phydev = fep->phy_dev;
 
-	/*
-	 * We cannot queue phy_task twice in the workqueue.  It
-	 * would cause an endless loop in the workqueue.
-	 * Fortunately, if the last mii_relink entry has not yet been
-	 * executed now, it will do the job for the current interrupt,
-	 * which is just what we want.
-	 */
-	if (fep->mii_phy_task_queued)
-		return;
+	if (!phydev)
+		return -ENODEV;
 
-	fep->mii_phy_task_queued = 1;
-	INIT_WORK(&fep->phy_task, mii_relink);
-	schedule_work(&fep->phy_task);
+	return phy_ethtool_gset(phydev, cmd);
 }
 
-/* mii_queue_config is called in interrupt context from fec_enet_mii */
-static void mii_queue_config(uint mii_reg, struct net_device *dev)
+static int fec_enet_set_settings(struct net_device *dev,
+				 struct ethtool_cmd *cmd)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
+	struct phy_device *phydev = fep->phy_dev;
 
-	if (fep->mii_phy_task_queued)
-		return;
+	if (!phydev)
+		return -ENODEV;
 
-	fep->mii_phy_task_queued = 1;
-	INIT_WORK(&fep->phy_task, mii_display_config);
-	schedule_work(&fep->phy_task);
+	return phy_ethtool_sset(phydev, cmd);
 }
 
-phy_cmd_t const phy_cmd_relink[] = {
-	{ mk_mii_read(MII_REG_CR), mii_queue_relink },
-	{ mk_mii_end, }
-	};
-phy_cmd_t const phy_cmd_config[] = {
-	{ mk_mii_read(MII_REG_CR), mii_queue_config },
-	{ mk_mii_end, }
-	};
-
-/* Read remainder of PHY ID. */
-static void
-mii_discover_phy3(uint mii_reg, struct net_device *dev)
+static void fec_enet_get_drvinfo(struct net_device *dev,
+				 struct ethtool_drvinfo *info)
 {
-	struct fec_enet_private *fep;
-	int i;
-
-	fep = netdev_priv(dev);
-	fep->phy_id |= (mii_reg & 0xffff);
-	printk("fec: PHY @ 0x%x, ID 0x%08x", fep->phy_addr, fep->phy_id);
-
-	for(i = 0; phy_info[i]; i++) {
-		if(phy_info[i]->id == (fep->phy_id >> 4))
-			break;
-	}
-
-	if (phy_info[i])
-		printk(" -- %s\n", phy_info[i]->name);
-	else
-		printk(" -- unknown PHY!\n");
+	struct fec_enet_private *fep = netdev_priv(dev);
 
-	fep->phy = phy_info[i];
-	fep->phy_id_done = 1;
+	strcpy(info->driver, fep->pdev->dev.driver->name);
+	strcpy(info->version, "Revision: 1.0");
+	strcpy(info->bus_info, dev_name(&dev->dev));
 }
 
-/* Scan all of the MII PHY addresses looking for someone to respond
- * with a valid ID.  This usually happens quickly.
- */
-static void
-mii_discover_phy(uint mii_reg, struct net_device *dev)
-{
-	struct fec_enet_private *fep;
-	uint phytype;
-
-	fep = netdev_priv(dev);
-
-	if (fep->phy_addr < 32) {
-		if ((phytype = (mii_reg & 0xffff)) != 0xffff && phytype != 0) {
-
-			/* Got first part of ID, now get remainder */
-			fep->phy_id = phytype << 16;
-			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR2),
-							mii_discover_phy3);
-		} else {
-			fep->phy_addr++;
-			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR1),
-							mii_discover_phy);
-		}
-	} else {
-		printk("FEC: No PHY device found.\n");
-		/* Disable external MII interface */
-		writel(0, fep->hwp + FEC_MII_SPEED);
-		fep->phy_speed = 0;
-#ifdef HAVE_mii_link_interrupt
-		fec_disable_phy_intr(dev);
-#endif
-	}
-}
+static struct ethtool_ops fec_enet_ethtool_ops = {
+	.get_settings		= fec_enet_get_settings,
+	.set_settings		= fec_enet_set_settings,
+	.get_drvinfo		= fec_enet_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
 
-/* This interrupt occurs when the PHY detects a link change */
-#ifdef HAVE_mii_link_interrupt
-static irqreturn_t
-mii_link_interrupt(int irq, void * dev_id)
+static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
-	struct	net_device *dev = dev_id;
 	struct fec_enet_private *fep = netdev_priv(dev);
+	struct phy_device *phydev = fep->phy_dev;
+
+	if (!netif_running(dev))
+		return -EINVAL;
 
-	mii_do_cmd(dev, fep->phy->ack_int);
-	mii_do_cmd(dev, phy_cmd_relink);  /* restart and display status */
+	if (!phydev)
+		return -ENODEV;
 
-	return IRQ_HANDLED;
+	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
 }
-#endif
 
 static void fec_enet_free_buffers(struct net_device *dev)
 {
@@ -1509,35 +915,8 @@ fec_enet_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	fep->sequence_done = 0;
-	fep->link = 0;
-
-	fec_restart(dev, 1);
-
-	if (fep->phy) {
-		mii_do_cmd(dev, fep->phy->ack_int);
-		mii_do_cmd(dev, fep->phy->config);
-		mii_do_cmd(dev, phy_cmd_config);  /* display configuration */
-
-		/* Poll until the PHY tells us its configuration
-		 * (not link state).
-		 * Request is initiated by mii_do_cmd above, but answer
-		 * comes by interrupt.
-		 * This should take about 25 usec per register at 2.5 MHz,
-		 * and we read approximately 5 registers.
-		 */
-		while(!fep->sequence_done)
-			schedule();
-
-		mii_do_cmd(dev, fep->phy->startup);
-	}
-
-	/* Set the initial link state to true. A lot of hardware
-	 * based on this device does not implement a PHY interrupt,
-	 * so we are never notified of link change.
-	 */
-	fep->link = 1;
-
+	/* schedule a link state check */
+	phy_start(fep->phy_dev);
 	netif_start_queue(dev);
 	fep->opened = 1;
 	return 0;
@@ -1550,6 +929,7 @@ fec_enet_close(struct net_device *dev)
 
 	/* Don't know what to do yet. */
 	fep->opened = 0;
+	phy_stop(fep->phy_dev);
 	netif_stop_queue(dev);
 	fec_stop(dev);
 
@@ -1666,6 +1046,7 @@ static const struct net_device_ops fec_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_tx_timeout		= fec_timeout,
 	.ndo_set_mac_address	= fec_set_mac_address,
+	.ndo_do_ioctl           = fec_enet_ioctl,
 };
 
  /*
@@ -1689,7 +1070,6 @@ static int fec_enet_init(struct net_device *dev, int index)
 	}
 
 	spin_lock_init(&fep->hw_lock);
-	spin_lock_init(&fep->mii_lock);
 
 	fep->index = index;
 	fep->hwp = (void __iomem *)dev->base_addr;
@@ -1716,20 +1096,10 @@ static int fec_enet_init(struct net_device *dev, int index)
 	fep->rx_bd_base = cbd_base;
 	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
 
-#ifdef HAVE_mii_link_interrupt
-	fec_request_mii_intr(dev);
-#endif
 	/* The FEC Ethernet specific entries in the device structure */
 	dev->watchdog_timeo = TX_TIMEOUT;
 	dev->netdev_ops = &fec_netdev_ops;
-
-	for (i=0; i<NMII-1; i++)
-		mii_cmds[i].mii_next = &mii_cmds[i+1];
-	mii_free = mii_cmds;
-
-	/* Set MII speed to 2.5 MHz */
-	fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
-					/ 2500000) / 2) & 0x3F) << 1;
+	dev->ethtool_ops = &fec_enet_ethtool_ops;
 
 	/* Initialize the receive buffer descriptors. */
 	bdp = fep->rx_bd_base;
@@ -1760,13 +1130,6 @@ static int fec_enet_init(struct net_device *dev, int index)
 
 	fec_restart(dev, 0);
 
-	/* Queue up command to detect the PHY and initialize the
-	 * remainder of the interface.
-	 */
-	fep->phy_id_done = 0;
-	fep->phy_addr = 0;
-	mii_queue(dev, mk_mii_read(MII_REG_PHYIR1), mii_discover_phy);
-
 	return 0;
 }
 
@@ -1835,8 +1198,7 @@ fec_restart(struct net_device *dev, int duplex)
 	writel(0, fep->hwp + FEC_R_DES_ACTIVE);
 
 	/* Enable interrupts we wish to service */
-	writel(FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII,
-			fep->hwp + FEC_IMASK);
+	writel(FEC_ENET_TXF | FEC_ENET_RXF, fep->hwp + FEC_IMASK);
 }
 
 static void
@@ -1859,7 +1221,6 @@ fec_stop(struct net_device *dev)
 	/* Clear outstanding MII command interrupts. */
 	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
 
-	writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
 	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 }
 
@@ -1891,6 +1252,7 @@ fec_probe(struct platform_device *pdev)
 	memset(fep, 0, sizeof(*fep));
 
 	ndev->base_addr = (unsigned long)ioremap(r->start, resource_size(r));
+	fep->pdev = pdev;
 
 	if (!ndev->base_addr) {
 		ret = -ENOMEM;
@@ -1926,13 +1288,24 @@ fec_probe(struct platform_device *pdev)
 	if (ret)
 		goto failed_init;
 
+	ret = fec_enet_mii_init(pdev);
+	if (ret)
+		goto failed_mii_init;
+
 	ret = register_netdev(ndev);
 	if (ret)
 		goto failed_register;
 
+	printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
+		"(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
+		fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
+		fep->phy_dev->irq);
+
 	return 0;
 
 failed_register:
+	fec_enet_mii_remove(fep);
+failed_mii_init:
 failed_init:
 	clk_disable(fep->clk);
 	clk_put(fep->clk);
@@ -1959,6 +1332,7 @@ fec_drv_remove(struct platform_device *pdev)
 	platform_set_drvdata(pdev, NULL);
 
 	fec_stop(ndev);
+	fec_enet_mii_remove(fep);
 	clk_disable(fep->clk);
 	clk_put(fep->clk);
 	iounmap((void __iomem *)ndev->base_addr);
-- 
1.7.0.1

^ permalink raw reply related

* Re: [PATCH] r8169: Fix rtl8169_rx_interrupt()
From: Eric Dumazet @ 2010-03-31 12:08 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Oleg Nesterov, David Miller, Francois Romieu, netdev
In-Reply-To: <20100325113038.GA3471@swordfish.minsk.epam.com>

Le jeudi 25 mars 2010 à 13:30 +0200, Sergey Senozhatsky a écrit :
> Hello,
> 
> On (03/16/10 19:48), Eric Dumazet wrote:
> [..]
> > OK thanks for the report, this rtl8169_reset_task() seems pretty buggy,
> > or multiple invocation...
> > 
> > Did you tried removing the rtl8169_schedule_work() call from
> > rtl8169_rx_interrupt() ?
> > 
> > Maybe the reset is not necessary at all in case of fifo overflow..
> > 
> > Cumulative patch :
> > 
> [..]
> 
> Eric, I think you can put Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> on the cumulative patch as I don't see any errors with pktgen over LAN testing.
> 

OK thanks, here is the patch then, but I prefer let the
rtl8169_schedule_work() from rtl8169_rx_interrupt() for now, its removal
might be the subject of another patch.

Lets be very conservative for now.

[PATCH net-next-2.6] r8169: Fix rtl8169_rx_interrupt()

In case a reset is performed, rtl8169_rx_interrupt() is called from
process context instead of softirq context. Special care must be taken
to call appropriate network core services (netif_rx() instead of
netif_receive_skb()). VLAN handling also corrected.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Diagnosed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/r8169.c |   22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 964305c..1dffe29 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1054,14 +1054,14 @@ static void rtl8169_vlan_rx_register(struct net_device *dev,
 }
 
 static int rtl8169_rx_vlan_skb(struct rtl8169_private *tp, struct RxDesc *desc,
-			       struct sk_buff *skb)
+			       struct sk_buff *skb, int polling)
 {
 	u32 opts2 = le32_to_cpu(desc->opts2);
 	struct vlan_group *vlgrp = tp->vlgrp;
 	int ret;
 
 	if (vlgrp && (opts2 & RxVlanTag)) {
-		vlan_hwaccel_receive_skb(skb, vlgrp, swab16(opts2 & 0xffff));
+		__vlan_hwaccel_rx(skb, vlgrp, swab16(opts2 & 0xffff), polling);
 		ret = 0;
 	} else
 		ret = -1;
@@ -1078,7 +1078,7 @@ static inline u32 rtl8169_tx_vlan_tag(struct rtl8169_private *tp,
 }
 
 static int rtl8169_rx_vlan_skb(struct rtl8169_private *tp, struct RxDesc *desc,
-			       struct sk_buff *skb)
+			       struct sk_buff *skb, int polling)
 {
 	return -1;
 }
@@ -4467,12 +4467,20 @@ out:
 	return done;
 }
 
+/*
+ * Warning : rtl8169_rx_interrupt() might be called :
+ * 1) from NAPI (softirq) context
+ *	(polling = 1 : we should call netif_receive_skb())
+ * 2) from process context (rtl8169_reset_task())
+ *	(polling = 0 : we must call netif_rx() instead)
+ */		
 static int rtl8169_rx_interrupt(struct net_device *dev,
 				struct rtl8169_private *tp,
 				void __iomem *ioaddr, u32 budget)
 {
 	unsigned int cur_rx, rx_left;
 	unsigned int delta, count;
+	int polling = (budget != ~(u32)0) ? 1 : 0;
 
 	cur_rx = tp->cur_rx;
 	rx_left = NUM_RX_DESC + tp->dirty_rx - cur_rx;
@@ -4534,8 +4542,12 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);
 
-			if (rtl8169_rx_vlan_skb(tp, desc, skb) < 0)
-				netif_receive_skb(skb);
+			if (rtl8169_rx_vlan_skb(tp, desc, skb, polling) < 0) {
+				if (likely(polling))
+					netif_receive_skb(skb);
+				else
+					netif_rx(skb);
+			}
 
 			dev->stats.rx_bytes += pkt_size;
 			dev->stats.rx_packets++;



^ permalink raw reply related

* Re: [PATCH v2] Add Mergeable RX buffer feature to vhost_net
From: Michael S. Tsirkin @ 2010-03-31 12:02 UTC (permalink / raw)
  To: David Stevens; +Cc: kvm, kvm-owner, netdev, rusty, virtualization
In-Reply-To: <OF35D5EAC9.45A7DC5F-ON882576F7.0006F45E-882576F7.0007AB62@us.ibm.com>

On Tue, Mar 30, 2010 at 07:23:48PM -0600, David Stevens wrote:
> This patch adds support for the Mergeable Receive Buffers feature to
> vhost_net.
> 
> Changes:
>         1) generalize descriptor allocation functions to allow multiple
>                 descriptors per packet
>         2) add socket "peek" to know datalen at buffer allocation time
>         3) change notification to enable a multi-buffer max packet, rather
>                 than the single-buffer run until completely empty
> 
> Changes from previous revision:
> 1) incorporate review comments from Michael Tsirkin
> 2) assume use of TUNSETVNETHDRSZ ioctl by qemu, which simplifies vnet 
> header
>         processing
> 3) fixed notification code to only affect the receive side
> 
> Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
> 
> [in-line for review, attached for applying w/o whitespace mangling]

attached patch seems to be whiespace damaged as well.
Does the origin pass checkpatch.pl for you?

> diff -ruNp net-next-p0/drivers/vhost/net.c net-next-p3/drivers/vhost/net.c
> --- net-next-p0/drivers/vhost/net.c     2010-03-22 12:04:38.000000000 
> -0700
> +++ net-next-p3/drivers/vhost/net.c     2010-03-30 12:50:57.000000000 
> -0700
> @@ -54,26 +54,6 @@ struct vhost_net {
>         enum vhost_net_poll_state tx_poll_state;
>  };
>  
> -/* Pop first len bytes from iovec. Return number of segments used. */
> -static int move_iovec_hdr(struct iovec *from, struct iovec *to,
> -                         size_t len, int iov_count)
> -{
> -       int seg = 0;
> -       size_t size;
> -       while (len && seg < iov_count) {
> -               size = min(from->iov_len, len);
> -               to->iov_base = from->iov_base;
> -               to->iov_len = size;
> -               from->iov_len -= size;
> -               from->iov_base += size;
> -               len -= size;
> -               ++from;
> -               ++to;
> -               ++seg;
> -       }
> -       return seg;
> -}
> -
>  /* Caller must have TX VQ lock */
>  static void tx_poll_stop(struct vhost_net *net)
>  {
> @@ -97,7 +77,8 @@ static void tx_poll_start(struct vhost_n
>  static void handle_tx(struct vhost_net *net)
>  {
>         struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
> -       unsigned head, out, in, s;
> +       unsigned out, in;
> +       struct iovec head;
>         struct msghdr msg = {
>                 .msg_name = NULL,
>                 .msg_namelen = 0,
> @@ -108,8 +89,8 @@ static void handle_tx(struct vhost_net *
>         };
>         size_t len, total_len = 0;
>         int err, wmem;
> -       size_t hdr_size;
>         struct socket *sock = rcu_dereference(vq->private_data);
> +
>         if (!sock)
>                 return;
>  
> @@ -127,22 +108,19 @@ static void handle_tx(struct vhost_net *
>  
>         if (wmem < sock->sk->sk_sndbuf / 2)
>                 tx_poll_stop(net);
> -       hdr_size = vq->hdr_size;
>  
>         for (;;) {
> -               head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -                                        ARRAY_SIZE(vq->iov),
> -                                        &out, &in,
> -                                        NULL, NULL);
> +               head.iov_base = (void *)vhost_get_vq_desc(&net->dev, vq,
> +                       vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, 
> NULL);

I this casting confusing.
Is it really expensive to add an array of heads so that
we do not need to cast?


>                 /* Nothing new?  Wait for eventfd to tell us they 
> refilled. */
> -               if (head == vq->num) {
> +               if (head.iov_base == (void *)vq->num) {
>                         wmem = atomic_read(&sock->sk->sk_wmem_alloc);
>                         if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
>                                 tx_poll_start(net, sock);
>                                 set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
>                                 break;
>                         }
> -                       if (unlikely(vhost_enable_notify(vq))) {
> +                       if (unlikely(vhost_enable_notify(vq, 0))) {
>                                 vhost_disable_notify(vq);
>                                 continue;
>                         }
> @@ -154,27 +132,30 @@ static void handle_tx(struct vhost_net *
>                         break;
>                 }
>                 /* Skip header. TODO: support TSO. */
> -               s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
>                 msg.msg_iovlen = out;
> -               len = iov_length(vq->iov, out);
> +               head.iov_len = len = iov_length(vq->iov, out);
> +
>                 /* Sanity check */
>                 if (!len) {
> -                       vq_err(vq, "Unexpected header len for TX: "
> -                              "%zd expected %zd\n",
> -                              iov_length(vq->hdr, s), hdr_size);
> +                       vq_err(vq, "Unexpected buffer len for TX: %zd ", 
> len);
>                         break;
>                 }
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? 
> */
>                 err = sock->ops->sendmsg(NULL, sock, &msg, len);
>                 if (unlikely(err < 0)) {
> -                       vhost_discard_vq_desc(vq);
> -                       tx_poll_start(net, sock);
> +                       if (err == -EAGAIN) {
> +                               vhost_discard_vq_desc(vq, 1);
> +                               tx_poll_start(net, sock);
> +                       } else {
> +                               vq_err(vq, "sendmsg: errno %d\n", -err);
> +                               /* drop packet; do not discard/resend */
> +                               vhost_add_used_and_signal(&net->dev, vq, 
> &head,
> +                                                         1, 0);
> +                       }
>                         break;
>                 }
>                 if (err != len)
> -                       pr_err("Truncated TX packet: "
> -                              " len %d != %zd\n", err, len);
> -               vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +                       pr_err("Truncated TX packet: len %d != %zd\n", 
> err, len);
> +               vhost_add_used_and_signal(&net->dev, vq, &head, 1, 0);
>                 total_len += len;
>                 if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>                         vhost_poll_queue(&vq->poll);
> @@ -186,12 +167,25 @@ static void handle_tx(struct vhost_net *
>         unuse_mm(net->dev.mm);
>  }
>  
> +static int vhost_head_len(struct sock *sk)
> +{
> +       struct sk_buff *head;
> +       int len = 0;
> +
> +       lock_sock(sk);
> +       head = skb_peek(&sk->sk_receive_queue);
> +       if (head)
> +               len = head->len;
> +       release_sock(sk);
> +       return len;
> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_rx(struct vhost_net *net)
>  {
>         struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX];
> -       unsigned head, out, in, log, s;
> +       unsigned in, log;
>         struct vhost_log *vq_log;
>         struct msghdr msg = {
>                 .msg_name = NULL,
> @@ -202,34 +196,25 @@ static void handle_rx(struct vhost_net *
>                 .msg_flags = MSG_DONTWAIT,
>         };
>  
> -       struct virtio_net_hdr hdr = {
> -               .flags = 0,
> -               .gso_type = VIRTIO_NET_HDR_GSO_NONE
> -       };
> -
>         size_t len, total_len = 0;
> -       int err;
> -       size_t hdr_size;
> +       int err, headcount, datalen;
>         struct socket *sock = rcu_dereference(vq->private_data);
> +
>         if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
>                 return;
>  
>         use_mm(net->dev.mm);
>         mutex_lock(&vq->mutex);
>         vhost_disable_notify(vq);
> -       hdr_size = vq->hdr_size;
>  
>         vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
>                 vq->log : NULL;
>  
> -       for (;;) {
> -               head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> -                                        ARRAY_SIZE(vq->iov),
> -                                        &out, &in,
> -                                        vq_log, &log);
> +       while ((datalen = vhost_head_len(sock->sk))) {
> +               headcount = vhost_get_heads(vq, datalen, &in, vq_log, 
> &log);
>                 /* OK, now we need to know about added descriptors. */
> -               if (head == vq->num) {
> -                       if (unlikely(vhost_enable_notify(vq))) {
> +               if (!headcount) {
> +                       if (unlikely(vhost_enable_notify(vq, 1))) {
>                                 /* They have slipped one in as we were
>                                  * doing that: check again. */
>                                 vhost_disable_notify(vq);
> @@ -240,46 +225,42 @@ static void handle_rx(struct vhost_net *
>                         break;
>                 }
>                 /* We don't need to be notified again. */
> -               if (out) {
> -                       vq_err(vq, "Unexpected descriptor format for RX: "
> -                              "out %d, int %d\n",
> -                              out, in);
> -                       break;
> -               }
> -               /* Skip header. TODO: support TSO/mergeable rx buffers. */
> -               s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, in);
> +               if (vq->rxmaxheadcount < headcount)
> +                       vq->rxmaxheadcount = headcount;

This seems the only place where we set the rxmaxheadcount
value. Maybe it can be moved out of vhost.c to net.c?
If vhost library needs this it can get it as function
parameter.

> +               /* Skip header. TODO: support TSO. */

You seem to have removed the code that skips the header.
Won't this break non-GSO backends such as raw?

>                 msg.msg_iovlen = in;
>                 len = iov_length(vq->iov, in);
>                 /* Sanity check */
>                 if (!len) {
> -                       vq_err(vq, "Unexpected header len for RX: "
> -                              "%zd expected %zd\n",
> -                              iov_length(vq->hdr, s), hdr_size);
> +                       vq_err(vq, "Unexpected buffer len for RX: %zd\n", 
> len);
>                         break;
>                 }
>                 err = sock->ops->recvmsg(NULL, sock, &msg,
>                                          len, MSG_DONTWAIT | MSG_TRUNC);
> -               /* TODO: Check specific error and bomb out unless EAGAIN? 
> */

Do you think it's not a good idea?

>                 if (err < 0) {
> -                       vhost_discard_vq_desc(vq);
> +                       vhost_discard_vq_desc(vq, headcount);
>                         break;
>                 }

I think we should detect and discard truncated messages,
since len might not be reliable if userspace pulls
a packet from under us. Also, if new packet is
shorter than the old one, there's no truncation but
headcount is wrong.

So simplest fix IMO would be to compare err with expected len.
If there's a difference, we hit the race and so
we would discard the packet.


>                 /* TODO: Should check and handle checksum. */
> +               if (vhost_has_feature(&net->dev, VIRTIO_NET_F_MRG_RXBUF)) 
> {
> +                       struct virtio_net_hdr_mrg_rxbuf *vhdr =
> +                               (struct virtio_net_hdr_mrg_rxbuf *)
> +                               vq->iov[0].iov_base;
> +                       /* add num_bufs */
> +                       if (put_user(headcount, &vhdr->num_buffers)) {
> +                               vq_err(vq, "Failed to write num_buffers");
> +                               vhost_discard_vq_desc(vq, headcount);

Let's do memcpy_to_iovecend etc so that we do not assume layout.
This is also why we need move_iovec: sendmsg might modify the iovec.
It would also be nice not to corrupt memory and
get a reasonable error if buffer size
that we get is smaller than expected header size.

> +                               break;
> +                       }
> +               }
>                 if (err > len) {
>                         pr_err("Discarded truncated rx packet: "
>                                " len %d > %zd\n", err, len);
> -                       vhost_discard_vq_desc(vq);
> +                       vhost_discard_vq_desc(vq, headcount);
>                         continue;
>                 }
>                 len = err;
> -               err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, 
> hdr_size);
> -               if (err) {
> -                       vq_err(vq, "Unable to write vnet_hdr at addr %p: 
> %d\n",
> -                              vq->iov->iov_base, err);
> -                       break;
> -               }
> -               len += hdr_size;

This seems to break non-GSO backends as well.

> -               vhost_add_used_and_signal(&net->dev, vq, head, len);
> + vhost_add_used_and_signal(&net->dev,vq,vq->heads,headcount,1);
>                 if (unlikely(vq_log))
>                         vhost_log_write(vq, vq_log, log, len);
>                 total_len += len;
> @@ -560,9 +541,6 @@ done:
>  
>  static int vhost_net_set_features(struct vhost_net *n, u64 features)
>  {
> -       size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
> -               sizeof(struct virtio_net_hdr) : 0;
> -       int i;
>         mutex_lock(&n->dev.mutex);
>         if ((features & (1 << VHOST_F_LOG_ALL)) &&
>             !vhost_log_access_ok(&n->dev)) {
> @@ -571,11 +549,6 @@ static int vhost_net_set_features(struct
>         }
>         n->dev.acked_features = features;
>         smp_wmb();
> -       for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
> -               mutex_lock(&n->vqs[i].mutex);
> -               n->vqs[i].hdr_size = hdr_size;
> -               mutex_unlock(&n->vqs[i].mutex);

I expect the above is a leftover from the previous version
which calculated header size in kernel?

> -       }
>         vhost_net_flush(n);
>         mutex_unlock(&n->dev.mutex);
>         return 0;
> diff -ruNp net-next-p0/drivers/vhost/vhost.c 
> net-next-p3/drivers/vhost/vhost.c
> --- net-next-p0/drivers/vhost/vhost.c   2010-03-22 12:04:38.000000000 
> -0700
> +++ net-next-p3/drivers/vhost/vhost.c   2010-03-29 20:12:42.000000000 
> -0700
> @@ -113,7 +113,7 @@ static void vhost_vq_reset(struct vhost_
>         vq->used_flags = 0;
>         vq->log_used = false;
>         vq->log_addr = -1ull;
> -       vq->hdr_size = 0;
> +       vq->rxmaxheadcount = 0;
>         vq->private_data = NULL;
>         vq->log_base = NULL;
>         vq->error_ctx = NULL;
> @@ -410,6 +410,7 @@ static long vhost_set_vring(struct vhost
>                 vq->last_avail_idx = s.num;
>                 /* Forget the cached index value. */
>                 vq->avail_idx = vq->last_avail_idx;
> +               vq->rxmaxheadcount = 0;
>                 break;
>         case VHOST_GET_VRING_BASE:
>                 s.index = idx;
> @@ -856,6 +857,48 @@ static unsigned get_indirect(struct vhos
>         return 0;
>  }
>  
> +/* This is a multi-head version of vhost_get_vq_desc

How about:
version of vhost_get_vq_desc that returns multiple
descriptors

> + * @vq         - the relevant virtqueue
> + * datalen     - data length we'll be reading
> + * @iovcount   - returned count of io vectors we fill
> + * @log                - vhost log
> + * @log_num    - log offset

return value?
Also - why unsigned?

> + */
> +unsigned vhost_get_heads(struct vhost_virtqueue *vq, int datalen, int 

Would vhost_get_vq_desc_multiple be a better name?

> *iovcount,
> +       struct vhost_log *log, unsigned int *log_num)
> +{
> +       struct iovec *heads = vq->heads;

I think it's better to pass in heads array than take it from vq->heads.

> +       int out, in = 0;

Why is in initialized here?

> +       int seg = 0;            /* iov index */
> +       int hc = 0;             /* head count */
> +
> +       while (datalen > 0) {

Can't this simply call vhost_get_vq_desc in a loop somehow,
or use a common function that both this and vhost_get_vq_desc call?

> +               if (hc >= VHOST_NET_MAX_SG) {
> +                       vhost_discard_vq_desc(vq, hc);
> +                       return 0;
> +               }
> +               heads[hc].iov_base = (void *)vhost_get_vq_desc(vq->dev, 
> vq,
> +                       vq->iov+seg, ARRAY_SIZE(vq->iov)-seg, &out, &in, 
> log,
> +                       log_num);
> +               if (heads[hc].iov_base == (void *)vq->num) {
> +                       vhost_discard_vq_desc(vq, hc);
> +                       return 0;
> +               }
> +               if (out || in <= 0) {
> +                       vq_err(vq, "unexpected descriptor format for RX: "
> +                               "out %d, in %d\n", out, in);
> +                       vhost_discard_vq_desc(vq, hc);

goto err above might help simplify cleanup.

> +                       return 0;
> +               }
> +               heads[hc].iov_len = iov_length(vq->iov+seg, in);
> +               datalen -= heads[hc].iov_len;
> +               hc++;
> +               seg += in;
> +       }
> +       *iovcount = seg;
> +       return hc;
> +}
> +
>  /* This looks in the virtqueue and for the first available buffer, and 
> converts
>   * it to an iovec for convenient access.  Since descriptors consist of 
> some
>   * number of output then some number of input descriptors, it's actually 
> two
> @@ -981,31 +1024,36 @@ unsigned vhost_get_vq_desc(struct vhost_
>  }
>  
>  /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
> -void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
> +void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
>  {
> -       vq->last_avail_idx--;
> +       vq->last_avail_idx -= n;
>  }
>  
>  /* After we've used one of their buffers, we tell them about it.  We'll 
> then
>   * want to notify the guest, using eventfd. */
> -int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int 
> len)
> +int vhost_add_used(struct vhost_virtqueue *vq, struct iovec *heads, int 
> count)

count is always 0 for send, right?
I think it is better to have two APIs here as well:
vhost_add_used
vhost_add_used_multiple
we can use functions to avoid code duplication.

>  {
> -       struct vring_used_elem *used;
> +       struct vring_used_elem *used = 0;

why is used initialized here?

> +       int i;
>  
> -       /* The virtqueue contains a ring of used buffers.  Get a pointer 
> to the
> -        * next entry in that used ring. */
> -       used = &vq->used->ring[vq->last_used_idx % vq->num];
> -       if (put_user(head, &used->id)) {
> -               vq_err(vq, "Failed to write used id");
> -               return -EFAULT;
> -       }
> -       if (put_user(len, &used->len)) {
> -               vq_err(vq, "Failed to write used len");
> -               return -EFAULT;
> +       if (count <= 0)
> +               return -EINVAL;
> +
> +       for (i = 0; i < count; ++i) {
> +               used = &vq->used->ring[vq->last_used_idx % vq->num];
> +               if (put_user((unsigned)heads[i].iov_base, &used->id)) {
> +                       vq_err(vq, "Failed to write used id");
> +                       return -EFAULT;
> +               }
> +               if (put_user(heads[i].iov_len, &used->len)) {
> +                       vq_err(vq, "Failed to write used len");
> +                       return -EFAULT;
> +               }
> +               vq->last_used_idx++;

I think we should update last_used_idx on success only,
at the end. Simply use last_used_idx + count
instead of last_used_idx + 1.

>         }
>         /* Make sure buffer is written before we update index. */
>         smp_wmb();
> -       if (put_user(vq->last_used_idx + 1, &vq->used->idx)) {
> +       if (put_user(vq->last_used_idx, &vq->used->idx)) {
>                 vq_err(vq, "Failed to increment used idx");
>                 return -EFAULT;
>         }
> @@ -1023,22 +1071,35 @@ int vhost_add_used(struct vhost_virtqueu
>                 if (vq->log_ctx)
>                         eventfd_signal(vq->log_ctx, 1);
>         }
> -       vq->last_used_idx++;
>         return 0;
>  }
>  
> +int vhost_available(struct vhost_virtqueue *vq)

since this function is non-static, please document what it does.

> +{
> +       int avail;
> +
> +       if (!vq->rxmaxheadcount)        /* haven't got any yet */
> +               return 1;

since seems to make net - specific asumptions.
How about moving this check out to net.c?

> +       avail = vq->avail_idx - vq->last_avail_idx;
> +       if (avail < 0)
> +               avail += 0x10000; /* wrapped */

A branch that is likely non-taken. Also, rxmaxheadcount
is really unlikely to get as large as half the ring.
So this just wastes cycles?

> +       return avail;
> +}
> +
>  /* This actually signals the guest, using eventfd. */
> -void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq, bool 
> recvr)

This one is not elegant. receive is net.c concept, let's not
put it in vhost.c. Pass in headcount if you must.

>  {
>         __u16 flags = 0;
> +
>         if (get_user(flags, &vq->avail->flags)) {
>                 vq_err(vq, "Failed to get flags");
>                 return;
>         }
>  
> -       /* If they don't want an interrupt, don't signal, unless empty. */
> +       /* If they don't want an interrupt, don't signal, unless
> +        * empty or receiver can't get a max-sized packet. */
>         if ((flags & VRING_AVAIL_F_NO_INTERRUPT) &&
> -           (vq->avail_idx != vq->last_avail_idx ||
> +           (!recvr || vhost_available(vq) >= vq->rxmaxheadcount ||

Is the above really worth the complexity?
Guests can't rely on this kind of fuzzy logic, can they?
Do you see this helping performance at all?

>              !vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY)))
>                 return;
>  
> @@ -1050,14 +1111,14 @@ void vhost_signal(struct vhost_dev *dev,
>  /* And here's the combo meal deal.  Supersize me! */
>  void vhost_add_used_and_signal(struct vhost_dev *dev,
>                                struct vhost_virtqueue *vq,
> -                              unsigned int head, int len)
> +                              struct iovec *heads, int count, bool recvr)
>  {
> -       vhost_add_used(vq, head, len);
> -       vhost_signal(dev, vq);
> +       vhost_add_used(vq, heads, count);
> +       vhost_signal(dev, vq, recvr);
>  }
>  
>  /* OK, now we need to know about added descriptors. */
> -bool vhost_enable_notify(struct vhost_virtqueue *vq)
> +bool vhost_enable_notify(struct vhost_virtqueue *vq, bool recvr)
>  {
>         u16 avail_idx;
>         int r;
> @@ -1080,6 +1141,8 @@ bool vhost_enable_notify(struct vhost_vi
>                 return false;
>         }
>  
> +       if (recvr && vq->rxmaxheadcount)
> +               return (avail_idx - vq->last_avail_idx) >= 
> vq->rxmaxheadcount;

The fuzzy logic behind rxmaxheadcount might be a good
optimization, but I am not comfortable using it
for correctness. Maybe vhost_enable_notify should get
the last head so we can redo poll when another one
is added.

>         return avail_idx != vq->last_avail_idx;
>  }
>  
> diff -ruNp net-next-p0/drivers/vhost/vhost.h 
> net-next-p3/drivers/vhost/vhost.h
> --- net-next-p0/drivers/vhost/vhost.h   2010-03-22 12:04:38.000000000 
> -0700
> +++ net-next-p3/drivers/vhost/vhost.h   2010-03-29 20:07:17.000000000 
> -0700
> @@ -82,9 +82,9 @@ struct vhost_virtqueue {
>         u64 log_addr;
>  
>         struct iovec indirect[VHOST_NET_MAX_SG];
> -       struct iovec iov[VHOST_NET_MAX_SG];
> -       struct iovec hdr[VHOST_NET_MAX_SG];
> -       size_t hdr_size;
> +       struct iovec iov[VHOST_NET_MAX_SG+1]; /* an extra for vnet hdr */

VHOST_NET_MAX_SG should already include iovec vnet hdr.

> +       struct iovec heads[VHOST_NET_MAX_SG];
> +       int rxmaxheadcount;
>         /* We use a kind of RCU to access private pointer.
>          * All readers access it from workqueue, which makes it possible 
> to
>          * flush the workqueue instead of synchronize_rcu. Therefore 
> readers do
> @@ -120,18 +120,20 @@ long vhost_dev_ioctl(struct vhost_dev *,
>  int vhost_vq_access_ok(struct vhost_virtqueue *vq);
>  int vhost_log_access_ok(struct vhost_dev *);
>  
> +unsigned vhost_get_heads(struct vhost_virtqueue *, int datalen, int 
> *iovcount,
> +                       struct vhost_log *log, unsigned int *log_num);
>  unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
>                            struct iovec iov[], unsigned int iov_count,
>                            unsigned int *out_num, unsigned int *in_num,
>                            struct vhost_log *log, unsigned int *log_num);
> -void vhost_discard_vq_desc(struct vhost_virtqueue *);
> +void vhost_discard_vq_desc(struct vhost_virtqueue *, int);
>  
> -int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> -void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
> +int vhost_add_used(struct vhost_virtqueue *, struct iovec *heads, int 
> count);
>  void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue 
> *,
> -                              unsigned int head, int len);
> +                              struct iovec *heads, int count, bool 
> recvr);
> +void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *, bool);
>  void vhost_disable_notify(struct vhost_virtqueue *);
> -bool vhost_enable_notify(struct vhost_virtqueue *);
> +bool vhost_enable_notify(struct vhost_virtqueue *, bool);
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>                     unsigned int log_num, u64 len);
> @@ -149,7 +151,8 @@ enum {
>         VHOST_FEATURES = (1 << VIRTIO_F_NOTIFY_ON_EMPTY) |
>                          (1 << VIRTIO_RING_F_INDIRECT_DESC) |
>                          (1 << VHOST_F_LOG_ALL) |
> -                        (1 << VHOST_NET_F_VIRTIO_NET_HDR),
> +                        (1 << VHOST_NET_F_VIRTIO_NET_HDR) |
> +                        (1 << VIRTIO_NET_F_MRG_RXBUF),
>  };
>  
>  static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> 



^ permalink raw reply

* [PATCH NEXT 5/8] qlcnic: use IDC defined timeout value
From: Amit Kumar Salecha @ 2010-03-31 12:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, Sucheta Chakraborty
In-Reply-To: <1270037026-9062-1-git-send-email-amit.salecha@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

o USE/Read IDC defined timeout value from ROM.
o While resetting chip, don't wait for other pci-func to respond,
  more than reset_ack_timeo seconds,

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    4 +++-
 drivers/net/qlcnic/qlcnic_hdr.h  |    2 ++
 drivers/net/qlcnic/qlcnic_init.c |   16 ++++++++++++++++
 drivers/net/qlcnic/qlcnic_main.c |   26 ++++++++++++++++----------
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 8a3446d..87cd1a7 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -958,8 +958,9 @@ struct qlcnic_adapter {
 	u8 dev_state;
 	u8 diag_test;
 	u8 diag_cnt;
+	u8 reset_ack_timeo;
+	u8 dev_init_timeo;
 	u8 rsrd1;
-	u16 rsrd2;
 
 	u8 mac_addr[ETH_ALEN];
 
@@ -1040,6 +1041,7 @@ int qlcnic_need_fw_reset(struct qlcnic_adapter *adapter);
 void qlcnic_request_firmware(struct qlcnic_adapter *adapter);
 void qlcnic_release_firmware(struct qlcnic_adapter *adapter);
 int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter);
+void qlcnic_setup_idc_param(struct qlcnic_adapter *adapter);
 
 int qlcnic_rom_fast_read(struct qlcnic_adapter *adapter, int addr, int *valp);
 int qlcnic_rom_fast_read_words(struct qlcnic_adapter *adapter, int addr,
diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index e9fb692..51fa3fb 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -695,6 +695,8 @@ enum {
 #define QLCNIC_CRB_DRV_SCRATCH             (QLCNIC_CAM_RAM(0x148))
 #define QLCNIC_CRB_DEV_PARTITION_INFO      (QLCNIC_CAM_RAM(0x14c))
 #define QLCNIC_CRB_DRV_IDC_VER             (QLCNIC_CAM_RAM(0x14c))
+#define QLCNIC_ROM_DEV_INIT_TIMEOUT	(0x3e885c)
+#define QLCNIC_ROM_DRV_RESET_TIMEOUT	(0x3e8860)
 
 		 /* Device State */
 #define QLCNIC_DEV_COLD 		1
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 0a424e0..ccd24f4 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -529,6 +529,22 @@ int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter)
 	return 0;
 }
 
+void
+qlcnic_setup_idc_param(struct qlcnic_adapter *adapter) {
+
+	int timeo;
+
+	if (qlcnic_rom_fast_read(adapter, QLCNIC_ROM_DEV_INIT_TIMEOUT, &timeo))
+		timeo = 30;
+
+	adapter->dev_init_timeo = timeo;
+
+	if (qlcnic_rom_fast_read(adapter, QLCNIC_ROM_DRV_RESET_TIMEOUT, &timeo))
+		timeo = 10;
+
+	adapter->reset_ack_timeo = timeo;
+}
+
 static int
 qlcnic_has_mn(struct qlcnic_adapter *adapter)
 {
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index a234622..38e0829 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -649,7 +649,10 @@ qlcnic_start_firmware(struct qlcnic_adapter *adapter)
 	if (err)
 		return err;
 
-	if (!qlcnic_can_start_firmware(adapter))
+	err = qlcnic_can_start_firmware(adapter);
+	if (err < 0)
+		return err;
+	else if (!err)
 		goto wait_init;
 
 	first_boot = QLCRD32(adapter, QLCNIC_CAM_RAM(0x1fc));
@@ -1138,6 +1141,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_iounmap;
 	}
 
+	qlcnic_setup_idc_param(adapter);
 
 	err = qlcnic_start_firmware(adapter);
 	if (err)
@@ -2027,7 +2031,7 @@ static int
 qlcnic_can_start_firmware(struct qlcnic_adapter *adapter)
 {
 	u32 val, prev_state;
-	int cnt = 0;
+	u8 dev_init_timeo = adapter->dev_init_timeo;
 	int portnum = adapter->portnum;
 
 	if (qlcnic_api_lock(adapter))
@@ -2072,12 +2076,13 @@ start_fw:
 	}
 
 	qlcnic_api_unlock(adapter);
-	msleep(1000);
-	while ((QLCRD32(adapter, QLCNIC_CRB_DEV_STATE) != QLCNIC_DEV_READY) &&
-			++cnt < 20)
+
+	do {
 		msleep(1000);
+	} while ((QLCRD32(adapter, QLCNIC_CRB_DEV_STATE) != QLCNIC_DEV_READY)
+			&& --dev_init_timeo);
 
-	if (cnt >= 20)
+	if (!dev_init_timeo)
 		return -1;
 
 	if (qlcnic_api_lock(adapter))
@@ -2099,12 +2104,10 @@ qlcnic_fwinit_work(struct work_struct *work)
 			struct qlcnic_adapter, fw_work.work);
 	int dev_state;
 
-	if (++adapter->fw_wait_cnt > FW_POLL_THRESH)
-		goto err_ret;
-
 	if (test_bit(__QLCNIC_START_FW, &adapter->state)) {
 
-		if (qlcnic_check_drv_state(adapter)) {
+		if (qlcnic_check_drv_state(adapter) &&
+			(adapter->fw_wait_cnt++ < adapter->reset_ack_timeo)) {
 			qlcnic_schedule_work(adapter,
 					qlcnic_fwinit_work, FW_POLL_DELAY);
 			return;
@@ -2118,6 +2121,9 @@ qlcnic_fwinit_work(struct work_struct *work)
 		goto err_ret;
 	}
 
+	if (adapter->fw_wait_cnt++ > (adapter->dev_init_timeo / 2))
+		goto err_ret;
+
 	dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
 	switch (dev_state) {
 	case QLCNIC_DEV_READY:
-- 
1.6.0.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox