Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.
From: Christoph Hellwig @ 2006-05-09 11:58 UTC (permalink / raw)
  To: Chris Wright
  Cc: linux-kernel, virtualization, xen-devel, Ian Pratt,
	Christian Limpach, netdev
In-Reply-To: <20060509085201.446830000@sous-sol.org>

On Tue, May 09, 2006 at 12:00:34AM -0700, Chris Wright wrote:
> The network device frontend driver allows the kernel to access network
> devices exported exported by a virtual machine containing a physical
> network device driver.

Please don't add procfs code to new network drivers.  Especially if it's oopsable
like the code in this driver by simple device renaming.


^ permalink raw reply

* Re: [Xen-devel] [RFC PATCH 34/35] Add the Xen virtual network device driver.
From: Herbert Xu @ 2006-05-09 11:55 UTC (permalink / raw)
  To: Chris Wright
  Cc: linux-kernel, virtualization, Christian.Limpach, xen-devel,
	netdev, ian.pratt
In-Reply-To: <20060509085201.446830000@sous-sol.org>

Hi Chris:

Chris Wright <chrisw@sous-sol.org> wrote:
>
> +/** Send a packet on a net device to encourage switches to learn the
> + * MAC. We send a fake ARP request.
> + *
> + * @param dev device
> + * @return 0 on success, error code otherwise
> + */
> +static int send_fake_arp(struct net_device *dev)

I think we talked about this before.  I don't see why Xen is so special
that it needs its own gratuitous arp routine embedded in the driver.
If this is needed at all (presumably for migration) then it should be
performed by the management scripts which can send grat ARP packets just
as easily.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: iproute2 git repository
From: Herbert Xu @ 2006-05-09 11:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20060508153434.2b5155a9@localhost.localdomain>

Stephen Hemminger <shemminger@osdl.org> wrote:
> I moved iproute2 out of CVS. New home is:
>        git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Thanks Stephen.

BTW, how come there is a checked out tree sitting in that git directory?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [NET] linkwatch: Handle jiffies wrap-around
From: Herbert Xu @ 2006-05-09 11:26 UTC (permalink / raw)
  To: Stefan Rompf; +Cc: David S. Miller, netdev
In-Reply-To: <200605082028.57328.stefan@loplof.de>

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

On Mon, May 08, 2006 at 06:28:56PM +0000, Stefan Rompf wrote:
> 
> time_after() and friends can handle jiffies wrapping, however they require the 
> difference between compared times to be less than 0x80000000 jiffies (about 
> 24 days on HZ=1000) to work reliably on 32bit architectures. So if the 
> network is stable for 24 days, events generated within days 25-49 will suffer 
> a *huge* false delay.

How about something like this?

[NET] linkwatch: Handle jiffies wrap-around

The test used in the linkwatch does not handle wrap-arounds correctly.
Since the intention of the code is to eliminate bursts of messages we
can afford to delay things up to a second.  Using that fact we can
easily handle wrap-arounds by making sure that we don't delay things
by more than one second.

This is based on diagnosis and a patch by Stefan Rompf.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: linkwatch-wrap.patch --]
[-- Type: text/plain, Size: 717 bytes --]

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 341de44..646937c 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -170,13 +170,13 @@
 		spin_unlock_irqrestore(&lweventlist_lock, flags);
 
 		if (!test_and_set_bit(LW_RUNNING, &linkwatch_flags)) {
-			unsigned long thisevent = jiffies;
+			unsigned long delay = linkwatch_nextevent - jiffies;
 
-			if (thisevent >= linkwatch_nextevent) {
+			/* If we wrap around we'll delay it by at most HZ. */
+			if (!delay || delay > HZ)
 				schedule_work(&linkwatch_work);
-			} else {
-				schedule_delayed_work(&linkwatch_work, linkwatch_nextevent - thisevent);
-			}
+			else
+				schedule_delayed_work(&linkwatch_work, delay);
 		}
 	}
 }

^ permalink raw reply related

* [RFC PATCH 34/35] Add the Xen virtual network device driver.
From: Chris Wright @ 2006-05-09  7:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, xen-devel, Ian Pratt, Christian Limpach, netdev
In-Reply-To: <20060509084945.373541000@sous-sol.org>

[-- Attachment #1: netfront --]
[-- Type: text/plain, Size: 46804 bytes --]

The network device frontend driver allows the kernel to access network
devices exported exported by a virtual machine containing a physical
network device driver.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: netdev@vger.kernel.org
---
TODO:
- drop proc
- more ethtool ops
- s/g support

 drivers/net/Kconfig             |    2 
 drivers/xen/Kconfig.net         |   14 
 drivers/xen/Makefile            |    3 
 drivers/xen/net_driver_util.c   |   58 +
 drivers/xen/netfront/Makefile   |    4 
 drivers/xen/netfront/netfront.c | 1510 ++++++++++++++++++++++++++++++++++++++++
 include/xen/net_driver_util.h   |   48 +
 7 files changed, 1639 insertions(+)

--- linus-2.6.orig/drivers/net/Kconfig
+++ linus-2.6/drivers/net/Kconfig
@@ -2325,6 +2325,8 @@ source "drivers/atm/Kconfig"
 
 source "drivers/s390/net/Kconfig"
 
+source "drivers/xen/Kconfig.net"
+
 config ISERIES_VETH
 	tristate "iSeries Virtual Ethernet driver support"
 	depends on PPC_ISERIES
--- linus-2.6.orig/drivers/xen/Makefile
+++ linus-2.6/drivers/xen/Makefile
@@ -1,7 +1,10 @@
 
+obj-y	+= net_driver_util.o
 obj-y	+= util.o
 
 obj-y	+= core/
 obj-y	+= console/
 obj-y	+= xenbus/
 
+obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
+
--- /dev/null
+++ linus-2.6/drivers/xen/Kconfig.net
@@ -0,0 +1,14 @@
+menu "Xen network device drivers"
+        depends on NETDEVICES && XEN
+
+config XEN_NETDEV_FRONTEND
+	tristate "Network-device frontend driver"
+	depends on XEN
+	default y
+	help
+	  The network-device frontend driver allows the kernel to access
+	  network interfaces within another guest OS. Unless you are building a
+	  dedicated device-driver domain, or your master control domain
+	  (domain 0), then you almost certainly want to say Y here.
+
+endmenu
--- /dev/null
+++ linus-2.6/drivers/xen/net_driver_util.c
@@ -0,0 +1,58 @@
+/*****************************************************************************
+ *
+ * Utility functions for Xen network devices.
+ *
+ * Copyright (c) 2005 XenSource Ltd.
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject
+ * to the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/if_ether.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <xen/net_driver_util.h>
+
+
+int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
+{
+	char *s;
+	int i;
+	char *e;
+	char *macstr = xenbus_read(XBT_NULL, dev->nodename, "mac", NULL);
+	if (IS_ERR(macstr))
+		return PTR_ERR(macstr);
+	s = macstr;
+	for (i = 0; i < ETH_ALEN; i++) {
+		mac[i] = simple_strtoul(s, &e, 16);
+		if (s == e || (e[0] != ':' && e[0] != 0)) {
+			kfree(macstr);
+			return -ENOENT;
+		}
+		s = &e[1];
+	}
+	kfree(macstr);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xen_net_read_mac);
--- /dev/null
+++ linus-2.6/drivers/xen/netfront/Makefile
@@ -0,0 +1,4 @@
+
+obj-$(CONFIG_XEN_NETDEV_FRONTEND)	:= xennet.o
+
+xennet-objs := netfront.o
--- /dev/null
+++ linus-2.6/drivers/xen/netfront/netfront.c
@@ -0,0 +1,1510 @@
+/******************************************************************************
+ * Virtual network driver for conversing with remote driver backends.
+ * 
+ * Copyright (c) 2002-2005, K A Fraser
+ * Copyright (c) 2005, XenSource Ltd
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+#include <linux/proc_fs.h>
+#include <linux/ethtool.h>
+#include <linux/in.h>
+#include <net/sock.h>
+#include <net/pkt_sched.h>
+#include <net/arp.h>
+#include <net/route.h>
+#include <asm/io.h>
+#include <asm/uaccess.h>
+#include <xen/evtchn.h>
+#include <xen/xenbus.h>
+#include <xen/interface/io/netif.h>
+#include <xen/interface/memory.h>
+#ifdef CONFIG_XEN_BALLOON
+#include <xen/balloon.h>
+#endif
+#include <asm/page.h>
+#include <asm/uaccess.h>
+#include <xen/interface/grant_table.h>
+#include <xen/gnttab.h>
+#include <xen/net_driver_util.h>
+
+#define GRANT_INVALID_REF	0
+
+#define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE)
+#define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE)
+
+static inline void init_skb_shinfo(struct sk_buff *skb)
+{
+	atomic_set(&(skb_shinfo(skb)->dataref), 1);
+	skb_shinfo(skb)->nr_frags = 0;
+	skb_shinfo(skb)->frag_list = NULL;
+}
+
+struct netfront_info
+{
+	struct list_head list;
+	struct net_device *netdev;
+
+	struct net_device_stats stats;
+	unsigned int tx_full;
+
+	struct netif_tx_front_ring tx;
+	struct netif_rx_front_ring rx;
+
+	spinlock_t   tx_lock;
+	spinlock_t   rx_lock;
+
+	unsigned int handle;
+	unsigned int evtchn, irq;
+
+	/* What is the status of our connection to the remote backend? */
+#define BEST_CLOSED       0
+#define BEST_DISCONNECTED 1
+#define BEST_CONNECTED    2
+	unsigned int backend_state;
+
+	/* Is this interface open or closed (down or up)? */
+#define UST_CLOSED        0
+#define UST_OPEN          1
+	unsigned int user_state;
+
+	/* Receive-ring batched refills. */
+#define RX_MIN_TARGET 8
+#define RX_DFL_MIN_TARGET 64
+#define RX_MAX_TARGET NET_RX_RING_SIZE
+	int rx_min_target, rx_max_target, rx_target;
+	struct sk_buff_head rx_batch;
+
+	struct timer_list rx_refill_timer;
+
+	/*
+	 * {tx,rx}_skbs store outstanding skbuffs. The first entry in each
+	 * array is an index into a chain of free entries.
+	 */
+	struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];
+	struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1];
+
+	grant_ref_t gref_tx_head;
+	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
+	grant_ref_t gref_rx_head;
+	grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1];
+
+	struct xenbus_device *xbdev;
+	int tx_ring_ref;
+	int rx_ring_ref;
+	u8 mac[ETH_ALEN];
+
+	unsigned long rx_pfn_array[NET_RX_RING_SIZE];
+	struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
+	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
+};
+
+/*
+ * Access macros for acquiring freeing slots in {tx,rx}_skbs[].
+ */
+
+static inline void add_id_to_freelist(struct sk_buff **list, unsigned short id)
+{
+	list[id] = list[0];
+	list[0]  = (void *)(unsigned long)id;
+}
+
+static inline unsigned short get_id_from_freelist(struct sk_buff **list)
+{
+	unsigned int id = (unsigned int)(unsigned long)list[0];
+	list[0] = list[id];
+	return id;
+}
+
+#ifdef DEBUG
+static char *be_state_name[] = {
+	[BEST_CLOSED]       = "closed",
+	[BEST_DISCONNECTED] = "disconnected",
+	[BEST_CONNECTED]    = "connected",
+};
+#endif
+
+#define DPRINTK(fmt, args...) pr_debug("netfront (%s:%d) " fmt, \
+                                       __FUNCTION__, __LINE__, ##args)
+#define IPRINTK(fmt, args...)				\
+	printk(KERN_INFO "netfront: " fmt, ##args)
+#define WPRINTK(fmt, args...)				\
+	printk(KERN_WARNING "netfront: " fmt, ##args)
+
+
+static int talk_to_backend(struct xenbus_device *, struct netfront_info *);
+static int setup_device(struct xenbus_device *, struct netfront_info *);
+static int create_netdev(int, struct xenbus_device *, struct net_device **);
+
+static void netfront_closing(struct xenbus_device *);
+
+static void end_access(int, void *);
+static void netif_disconnect_backend(struct netfront_info *);
+static void close_netdev(struct netfront_info *);
+static void netif_free(struct netfront_info *);
+
+static void show_device(struct netfront_info *);
+
+static void network_connect(struct net_device *);
+static void network_tx_buf_gc(struct net_device *);
+static void network_alloc_rx_buffers(struct net_device *);
+static int send_fake_arp(struct net_device *);
+
+static irqreturn_t netif_int(int irq, void *dev_id, struct pt_regs *ptregs);
+
+#define XEN_XENNET_PROC_INTERFACE
+#if defined(CONFIG_PROC_FS) && defined(XEN_XENNET_PROC_INTERFACE)
+static int xennet_proc_init(void);
+static int xennet_proc_addif(struct net_device *dev);
+static void xennet_proc_delif(struct net_device *dev);
+#else
+#define xennet_proc_init()  (0)
+#define xennet_proc_addif(d) (0)
+#define xennet_proc_delif(d) ((void)0)
+#endif
+
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and the ring buffers for communication with the backend, and
+ * inform the backend of the appropriate details for those.  Switch to
+ * Connected state.
+ */
+static int netfront_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	int err;
+	struct net_device *netdev;
+	struct netfront_info *info;
+	unsigned int handle;
+
+	err = xenbus_scanf(XBT_NULL, dev->nodename, "handle", "%u", &handle);
+	if (err != 1) {
+		xenbus_dev_fatal(dev, err, "reading handle");
+		return err;
+	}
+
+	err = create_netdev(handle, dev, &netdev);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "creating netdev");
+		return err;
+	}
+
+	info = netdev_priv(netdev);
+	dev->data = info;
+
+	err = talk_to_backend(dev, info);
+	if (err) {
+		kfree(info);
+		dev->data = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+
+/**
+ * We are reconnecting to the backend, due to a suspend/resume, or a backend
+ * driver restart.  We tear down our netif structure and recreate it, but
+ * leave the device-layer structures intact so that this is transparent to the
+ * rest of the kernel.
+ */
+static int netfront_resume(struct xenbus_device *dev)
+{
+	struct netfront_info *info = dev->data;
+
+	DPRINTK("%s\n", dev->nodename);
+
+	netif_disconnect_backend(info);
+	return talk_to_backend(dev, info);
+}
+
+
+/* Common code used when first setting up, and when resuming. */
+static int talk_to_backend(struct xenbus_device *dev,
+			   struct netfront_info *info)
+{
+	const char *message;
+	xenbus_transaction_t xbt;
+	int err;
+
+	err = xen_net_read_mac(dev, info->mac);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
+		goto out;
+	}
+
+	/* Create shared ring, alloc event channel. */
+	err = setup_device(dev, info);
+	if (err)
+		goto out;
+
+again:
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "starting transaction");
+		goto destroy_ring;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref","%u",
+			    info->tx_ring_ref);
+	if (err) {
+		message = "writing tx ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref","%u",
+			    info->rx_ring_ref);
+	if (err) {
+		message = "writing rx ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename,
+			    "event-channel", "%u", info->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename,
+			    "state", "%d", XenbusStateConnected);
+	if (err) {
+		message = "writing frontend XenbusStateConnected";
+		goto abort_transaction;
+	}
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto destroy_ring;
+	}
+
+	return 0;
+
+ abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	xenbus_dev_fatal(dev, err, "%s", message);
+ destroy_ring:
+	netif_free(info);
+ out:
+	return err;
+}
+
+
+static int setup_device(struct xenbus_device *dev, struct netfront_info *info)
+{
+	struct netif_tx_sring *txs;
+	struct netif_rx_sring *rxs;
+	int err;
+	struct net_device *netdev = info->netdev;
+
+	info->tx_ring_ref = GRANT_INVALID_REF;
+	info->rx_ring_ref = GRANT_INVALID_REF;
+	info->rx.sring = NULL;
+	info->tx.sring = NULL;
+	info->irq = 0;
+
+	txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL);
+	if (!txs) {
+		err = -ENOMEM;
+		xenbus_dev_fatal(dev, err, "allocating tx ring page");
+		goto fail;
+	}
+	rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL);
+	if (!rxs) {
+		err = -ENOMEM;
+		xenbus_dev_fatal(dev, err, "allocating rx ring page");
+		free_page((unsigned long)txs);
+		goto fail;
+	}
+	info->backend_state = BEST_DISCONNECTED;
+
+	SHARED_RING_INIT(txs);
+	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
+
+	SHARED_RING_INIT(rxs);
+	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
+	if (err < 0)
+		goto fail;
+	info->tx_ring_ref = err;
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
+	if (err < 0)
+		goto fail;
+	info->rx_ring_ref = err;
+
+	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	if (err)
+		goto fail;
+
+	memcpy(netdev->dev_addr, info->mac, ETH_ALEN);
+	network_connect(netdev);
+	info->irq = bind_evtchn_to_irqhandler(
+		info->evtchn, netif_int, SA_SAMPLE_RANDOM, netdev->name,
+		netdev);
+	(void)send_fake_arp(netdev);
+	show_device(info);
+
+	return 0;
+
+ fail:
+	netif_free(info);
+	return err;
+}
+
+
+/**
+ * Callback received when the backend's state changes.
+ */
+static void backend_changed(struct xenbus_device *dev,
+			    XenbusState backend_state)
+{
+	DPRINTK("\n");
+
+	switch (backend_state) {
+	case XenbusStateInitialising:
+	case XenbusStateInitWait:
+	case XenbusStateInitialised:
+	case XenbusStateConnected:
+	case XenbusStateUnknown:
+	case XenbusStateClosed:
+		break;
+
+	case XenbusStateClosing:
+		netfront_closing(dev);
+		break;
+	}
+}
+
+
+/** Send a packet on a net device to encourage switches to learn the
+ * MAC. We send a fake ARP request.
+ *
+ * @param dev device
+ * @return 0 on success, error code otherwise
+ */
+static int send_fake_arp(struct net_device *dev)
+{
+	struct sk_buff *skb;
+	u32             src_ip, dst_ip;
+
+	dst_ip = INADDR_BROADCAST;
+	src_ip = inet_select_addr(dev, dst_ip, RT_SCOPE_LINK);
+
+	/* No IP? Then nothing to do. */
+	if (src_ip == 0)
+		return 0;
+
+	skb = arp_create(ARPOP_REPLY, ETH_P_ARP,
+			 dst_ip, dev, src_ip,
+			 /*dst_hw*/ NULL, /*src_hw*/ NULL,
+			 /*target_hw*/ dev->dev_addr);
+	if (skb == NULL)
+		return -ENOMEM;
+
+	return dev_queue_xmit(skb);
+}
+
+
+static int network_open(struct net_device *dev)
+{
+	struct netfront_info *np = netdev_priv(dev);
+
+	memset(&np->stats, 0, sizeof(np->stats));
+
+	np->user_state = UST_OPEN;
+
+	network_alloc_rx_buffers(dev);
+	np->rx.sring->rsp_event = np->rx.rsp_cons + 1;
+
+	netif_start_queue(dev);
+
+	return 0;
+}
+
+static void network_tx_buf_gc(struct net_device *dev)
+{
+	RING_IDX i, prod;
+	unsigned short id;
+	struct netfront_info *np = netdev_priv(dev);
+	struct sk_buff *skb;
+
+	if (np->backend_state != BEST_CONNECTED)
+		return;
+
+	do {
+		prod = np->tx.sring->rsp_prod;
+		rmb(); /* Ensure we see responses up to 'rp'. */
+
+		for (i = np->tx.rsp_cons; i != prod; i++) {
+			id  = RING_GET_RESPONSE(&np->tx, i)->id;
+			skb = np->tx_skbs[id];
+			if (unlikely(gnttab_query_foreign_access(
+				np->grant_tx_ref[id]) != 0)) {
+				printk(KERN_ALERT "network_tx_buf_gc: warning "
+				       "-- grant still in use by backend "
+				       "domain.\n");
+				goto out;
+			}
+			gnttab_end_foreign_access_ref(
+				np->grant_tx_ref[id], GNTMAP_readonly);
+			gnttab_release_grant_reference(
+				&np->gref_tx_head, np->grant_tx_ref[id]);
+			np->grant_tx_ref[id] = GRANT_INVALID_REF;
+			add_id_to_freelist(np->tx_skbs, id);
+			dev_kfree_skb_irq(skb);
+		}
+
+		np->tx.rsp_cons = prod;
+
+		/*
+		 * Set a new event, then check for race with update of tx_cons.
+		 * Note that it is essential to schedule a callback, no matter
+		 * how few buffers are pending. Even if there is space in the
+		 * transmit ring, higher layers may be blocked because too much
+		 * data is outstanding: in such cases notification from Xen is
+		 * likely to be the only kick that we'll get.
+		 */
+		np->tx.sring->rsp_event =
+			prod + ((np->tx.sring->req_prod - prod) >> 1) + 1;
+		mb();
+	} while (prod != np->tx.sring->rsp_prod);
+
+ out:
+	if (np->tx_full &&
+	    ((np->tx.sring->req_prod - prod) < NET_TX_RING_SIZE)) {
+		np->tx_full = 0;
+		if (np->user_state == UST_OPEN)
+			netif_wake_queue(dev);
+	}
+}
+
+
+static void rx_refill_timeout(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *)data;
+	netif_rx_schedule(dev);
+}
+
+
+static void network_alloc_rx_buffers(struct net_device *dev)
+{
+	unsigned short id;
+	struct netfront_info *np = netdev_priv(dev);
+	struct sk_buff *skb;
+	int i, batch_target;
+	RING_IDX req_prod = np->rx.req_prod_pvt;
+	struct xen_memory_reservation reservation;
+	grant_ref_t ref;
+
+	if (unlikely(np->backend_state != BEST_CONNECTED))
+		return;
+
+	/*
+	 * Allocate skbuffs greedily, even though we batch updates to the
+	 * receive ring. This creates a less bursty demand on the memory
+	 * allocator, so should reduce the chance of failed allocation requests
+	 * both for ourself and for other kernel subsystems.
+	 */
+	batch_target = np->rx_target - (req_prod - np->rx.rsp_cons);
+	for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) {
+		/*
+		 * Subtract dev_alloc_skb headroom (16 bytes) and shared info
+		 * tailroom then round down to SKB_DATA_ALIGN boundary.
+		 */
+		skb = __dev_alloc_skb(
+			((PAGE_SIZE - sizeof(struct skb_shared_info)) &
+			 (-SKB_DATA_ALIGN(1))) - 16,
+			GFP_ATOMIC|__GFP_NOWARN);
+		if (skb == NULL) {
+			/* Any skbuffs queued for refill? Force them out. */
+			if (i != 0)
+				goto refill;
+			/* Could not allocate any skbuffs. Try again later. */
+			mod_timer(&np->rx_refill_timer,
+				  jiffies + (HZ/10));
+			return;
+		}
+		__skb_queue_tail(&np->rx_batch, skb);
+	}
+
+	/* Is the batch large enough to be worthwhile? */
+	if (i < (np->rx_target/2))
+		return;
+
+	/* Adjust our fill target if we risked running out of buffers. */
+	if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) &&
+	    ((np->rx_target *= 2) > np->rx_max_target))
+		np->rx_target = np->rx_max_target;
+
+ refill:
+	for (i = 0; ; i++) {
+		if ((skb = __skb_dequeue(&np->rx_batch)) == NULL)
+			break;
+
+		skb->dev = dev;
+
+		id = get_id_from_freelist(np->rx_skbs);
+
+		np->rx_skbs[id] = skb;
+
+		RING_GET_REQUEST(&np->rx, req_prod + i)->id = id;
+		ref = gnttab_claim_grant_reference(&np->gref_rx_head);
+		BUG_ON((signed short)ref < 0);
+		np->grant_rx_ref[id] = ref;
+		gnttab_grant_foreign_transfer_ref(ref,
+						  np->xbdev->otherend_id,
+						  __pa(skb->head) >> PAGE_SHIFT);
+		RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref;
+		np->rx_pfn_array[i] = virt_to_mfn(skb->head);
+
+#ifndef CONFIG_XEN_SHADOW_MODE
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* Remove this page before passing back to Xen. */
+			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
+					    INVALID_P2M_ENTRY);
+			MULTI_update_va_mapping(np->rx_mcl+i,
+						(unsigned long)skb->head,
+						__pte(0), 0);
+		}
+#endif
+	}
+
+#ifdef CONFIG_XEN_BALLOON
+	/* Tell the ballon driver what is going on. */
+	balloon_update_driver_allowance(i);
+#endif
+
+	reservation.extent_start = np->rx_pfn_array;
+	reservation.nr_extents   = i;
+	reservation.extent_order = 0;
+	reservation.address_bits = 0;
+	reservation.domid        = DOMID_SELF;
+
+#ifndef CONFIG_XEN_SHADOW_MODE
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* After all PTEs have been zapped, flush the TLB. */
+		np->rx_mcl[i-1].args[MULTI_UVMFLAGS_INDEX] =
+			UVMF_TLB_FLUSH|UVMF_ALL;
+
+		/* Give away a batch of pages. */
+		np->rx_mcl[i].op = __HYPERVISOR_memory_op;
+		np->rx_mcl[i].args[0] = XENMEM_decrease_reservation;
+		np->rx_mcl[i].args[1] = (unsigned long)&reservation;
+
+		/* Zap PTEs and give away pages in one big multicall. */
+		(void)HYPERVISOR_multicall(np->rx_mcl, i+1);
+
+		/* Check return status of HYPERVISOR_memory_op(). */
+		if (unlikely(np->rx_mcl[i].result != i))
+			panic("Unable to reduce memory reservation\n");
+	} else
+#endif
+		if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+					 &reservation) != i)
+			panic("Unable to reduce memory reservation\n");
+
+	/* Above is a suitable barrier to ensure backend will see requests. */
+	np->rx.req_prod_pvt = req_prod + i;
+	RING_PUSH_REQUESTS(&np->rx);
+}
+
+
+static int network_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	unsigned short id;
+	struct netfront_info *np = netdev_priv(dev);
+	struct netif_tx_request *tx;
+	RING_IDX i;
+	grant_ref_t ref;
+	unsigned long mfn;
+	int notify;
+
+	if (unlikely(np->tx_full)) {
+		printk(KERN_ALERT "%s: full queue wasn't stopped!\n",
+		       dev->name);
+		netif_stop_queue(dev);
+		goto drop;
+	}
+
+	if (unlikely((((unsigned long)skb->data & ~PAGE_MASK) + skb->len) >=
+		     PAGE_SIZE)) {
+		struct sk_buff *nskb;
+		nskb = __dev_alloc_skb(skb->len, GFP_ATOMIC|__GFP_NOWARN);
+		if (unlikely(nskb == NULL))
+			goto drop;
+		skb_put(nskb, skb->len);
+		memcpy(nskb->data, skb->data, skb->len);
+		nskb->dev = skb->dev;
+		dev_kfree_skb(skb);
+		skb = nskb;
+	}
+
+	spin_lock_irq(&np->tx_lock);
+
+	if (np->backend_state != BEST_CONNECTED) {
+		spin_unlock_irq(&np->tx_lock);
+		goto drop;
+	}
+
+	i = np->tx.req_prod_pvt;
+
+	id = get_id_from_freelist(np->tx_skbs);
+	np->tx_skbs[id] = skb;
+
+	tx = RING_GET_REQUEST(&np->tx, i);
+
+	tx->id   = id;
+	ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+	BUG_ON((signed short)ref < 0);
+	mfn = virt_to_mfn(skb->data);
+	gnttab_grant_foreign_access_ref(
+		ref, np->xbdev->otherend_id, mfn, GNTMAP_readonly);
+	tx->gref = np->grant_tx_ref[id] = ref;
+	tx->offset = (unsigned long)skb->data & ~PAGE_MASK;
+	tx->size = skb->len;
+	tx->flags = (skb->ip_summed == CHECKSUM_HW) ? NETTXF_csum_blank : 0;
+
+	np->tx.req_prod_pvt = i + 1;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
+	if (notify)
+		notify_remote_via_irq(np->irq);
+
+	network_tx_buf_gc(dev);
+
+	if (RING_FULL(&np->tx)) {
+		np->tx_full = 1;
+		netif_stop_queue(dev);
+	}
+
+	spin_unlock_irq(&np->tx_lock);
+
+	np->stats.tx_bytes += skb->len;
+	np->stats.tx_packets++;
+
+	return 0;
+
+ drop:
+	np->stats.tx_dropped++;
+	dev_kfree_skb(skb);
+	return 0;
+}
+
+static irqreturn_t netif_int(int irq, void *dev_id, struct pt_regs *ptregs)
+{
+	struct net_device *dev = dev_id;
+	struct netfront_info *np = netdev_priv(dev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&np->tx_lock, flags);
+	network_tx_buf_gc(dev);
+	spin_unlock_irqrestore(&np->tx_lock, flags);
+
+	if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx) &&
+	    (np->user_state == UST_OPEN))
+		netif_rx_schedule(dev);
+
+	return IRQ_HANDLED;
+}
+
+
+static int netif_poll(struct net_device *dev, int *pbudget)
+{
+	struct netfront_info *np = netdev_priv(dev);
+	struct sk_buff *skb, *nskb;
+	struct netif_rx_response *rx;
+	RING_IDX i, rp;
+	struct mmu_update *mmu = np->rx_mmu;
+	struct multicall_entry *mcl = np->rx_mcl;
+	int work_done, budget, more_to_do = 1;
+	struct sk_buff_head rxq;
+	unsigned long flags;
+	unsigned long mfn;
+	grant_ref_t ref;
+
+	spin_lock(&np->rx_lock);
+
+	if (np->backend_state != BEST_CONNECTED) {
+		spin_unlock(&np->rx_lock);
+		return 0;
+	}
+
+	skb_queue_head_init(&rxq);
+
+	if ((budget = *pbudget) > dev->quota)
+		budget = dev->quota;
+	rp = np->rx.sring->rsp_prod;
+	rmb(); /* Ensure we see queued responses up to 'rp'. */
+
+	for (i = np->rx.rsp_cons, work_done = 0;
+	     (i != rp) && (work_done < budget);
+	     i++, work_done++) {
+		rx = RING_GET_RESPONSE(&np->rx, i);
+
+		/*
+                 * This definitely indicates a bug, either in this driver or
+                 * in the backend driver. In future this should flag the bad
+                 * situation to the system controller to reboot the backed.
+                 */
+		if ((ref = np->grant_rx_ref[rx->id]) == GRANT_INVALID_REF) {
+			WPRINTK("Bad rx response id %d.\n", rx->id);
+			work_done--;
+			continue;
+		}
+
+		/* Memory pressure, insufficient buffer headroom, ... */
+		if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0) {
+			if (net_ratelimit())
+				WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
+					rx->id, rx->status);
+			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id =
+				rx->id;
+			RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref =
+				ref;
+			np->rx.req_prod_pvt++;
+			RING_PUSH_REQUESTS(&np->rx);
+			work_done--;
+			continue;
+		}
+
+		gnttab_release_grant_reference(&np->gref_rx_head, ref);
+		np->grant_rx_ref[rx->id] = GRANT_INVALID_REF;
+
+		skb = np->rx_skbs[rx->id];
+		add_id_to_freelist(np->rx_skbs, rx->id);
+
+		/* NB. We handle skb overflow later. */
+		skb->data = skb->head + rx->offset;
+		skb->len  = rx->status;
+		skb->tail = skb->data + skb->len;
+
+		if (rx->flags & NETRXF_data_validated)
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		np->stats.rx_packets++;
+		np->stats.rx_bytes += rx->status;
+
+#ifndef CONFIG_XEN_SHADOW_MODE
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			/* Remap the page. */
+			MULTI_update_va_mapping(mcl, (unsigned long)skb->head,
+						pfn_pte_ma(mfn, PAGE_KERNEL),
+						0);
+			mcl++;
+			mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
+				| MMU_MACHPHYS_UPDATE;
+			mmu->val = __pa(skb->head) >> PAGE_SHIFT;
+			mmu++;
+
+			set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
+					    mfn);
+		}
+#endif
+
+		__skb_queue_tail(&rxq, skb);
+	}
+
+#ifdef CONFIG_XEN_BALLOON
+	/* Some pages are no longer absent... */
+	balloon_update_driver_allowance(-work_done);
+#endif
+
+	/* Do all the remapping work, and M2P updates, in one big hypercall. */
+	if (likely((mcl - np->rx_mcl) != 0)) {
+		mcl->op = __HYPERVISOR_mmu_update;
+		mcl->args[0] = (unsigned long)np->rx_mmu;
+		mcl->args[1] = mmu - np->rx_mmu;
+		mcl->args[2] = 0;
+		mcl->args[3] = DOMID_SELF;
+		mcl++;
+		(void)HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl);
+	}
+
+	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+		if (skb->len > (dev->mtu + ETH_HLEN + 4)) {
+			if (net_ratelimit())
+				printk(KERN_INFO "Received packet too big for "
+				       "MTU (%d > %d)\n",
+				       skb->len - ETH_HLEN - 4, dev->mtu);
+			skb->len  = 0;
+			skb->tail = skb->data;
+			init_skb_shinfo(skb);
+			dev_kfree_skb(skb);
+			continue;
+		}
+
+		/*
+		 * Enough room in skbuff for the data we were passed? Also,
+		 * Linux expects at least 16 bytes headroom in each rx buffer.
+		 */
+		if (unlikely(skb->tail > skb->end) ||
+		    unlikely((skb->data - skb->head) < 16)) {
+			if (net_ratelimit()) {
+				if (skb->tail > skb->end)
+					printk(KERN_INFO "Received packet "
+					       "is %zd bytes beyond tail.\n",
+					       skb->tail - skb->end);
+				else
+					printk(KERN_INFO "Received packet "
+					       "is %zd bytes before head.\n",
+					       16 - (skb->data - skb->head));
+			}
+
+			nskb = __dev_alloc_skb(skb->len + 2,
+					       GFP_ATOMIC|__GFP_NOWARN);
+			if (nskb != NULL) {
+				skb_reserve(nskb, 2);
+				skb_put(nskb, skb->len);
+				memcpy(nskb->data, skb->data, skb->len);
+				nskb->dev = skb->dev;
+				nskb->ip_summed = skb->ip_summed;
+			}
+
+			/* Reinitialise and then destroy the old skbuff. */
+			skb->len  = 0;
+			skb->tail = skb->data;
+			init_skb_shinfo(skb);
+			dev_kfree_skb(skb);
+
+			/* Switch old for new, if we copied the buffer. */
+			if ((skb = nskb) == NULL)
+				continue;
+		}
+
+		/* Set the shinfo area, which is hidden behind the data. */
+		init_skb_shinfo(skb);
+		/* Ethernet work: Delayed to here as it peeks the header. */
+		skb->protocol = eth_type_trans(skb, dev);
+
+		/* Pass it up. */
+		netif_receive_skb(skb);
+		dev->last_rx = jiffies;
+	}
+
+	np->rx.rsp_cons = i;
+
+	/* If we get a callback with very few responses, reduce fill target. */
+	/* NB. Note exponential increase, linear decrease. */
+	if (((np->rx.req_prod_pvt - np->rx.sring->rsp_prod) >
+	     ((3*np->rx_target) / 4)) &&
+	    (--np->rx_target < np->rx_min_target))
+		np->rx_target = np->rx_min_target;
+
+	network_alloc_rx_buffers(dev);
+
+	*pbudget   -= work_done;
+	dev->quota -= work_done;
+
+	if (work_done < budget) {
+		local_irq_save(flags);
+
+		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
+		if (!more_to_do)
+			__netif_rx_complete(dev);
+
+		local_irq_restore(flags);
+	}
+
+	spin_unlock(&np->rx_lock);
+
+	return more_to_do;
+}
+
+
+static int network_close(struct net_device *dev)
+{
+	struct netfront_info *np = netdev_priv(dev);
+	np->user_state = UST_CLOSED;
+	netif_stop_queue(np->netdev);
+	return 0;
+}
+
+
+static struct net_device_stats *network_get_stats(struct net_device *dev)
+{
+	struct netfront_info *np = netdev_priv(dev);
+	return &np->stats;
+}
+
+static void network_connect(struct net_device *dev)
+{
+	struct netfront_info *np;
+	int i, requeue_idx;
+	struct netif_tx_request *tx;
+	struct sk_buff *skb;
+
+	np = netdev_priv(dev);
+	spin_lock_irq(&np->tx_lock);
+	spin_lock(&np->rx_lock);
+
+	/* Recovery procedure: */
+
+	/* Step 1: Reinitialise variables. */
+	np->tx_full = 0;
+
+	/*
+	 * Step 2: Rebuild the RX and TX ring contents.
+	 * NB. We could just free the queued TX packets now but we hope
+	 * that sending them out might do some good.  We have to rebuild
+	 * the RX ring because some of our pages are currently flipped out
+	 * so we can't just free the RX skbs.
+	 * NB2. Freelist index entries are always going to be less than
+	 *  __PAGE_OFFSET, whereas pointers to skbs will always be equal or
+	 * greater than __PAGE_OFFSET: we use this property to distinguish
+	 * them.
+	 */
+
+	/*
+	 * Rebuild the TX buffer freelist and the TX ring itself.
+	 * NB. This reorders packets.  We could keep more private state
+	 * to avoid this but maybe it doesn't matter so much given the
+	 * interface has been down.
+	 */
+	for (requeue_idx = 0, i = 1; i <= NET_TX_RING_SIZE; i++) {
+		if ((unsigned long)np->tx_skbs[i] < __PAGE_OFFSET)
+			continue;
+
+		skb = np->tx_skbs[i];
+
+		tx = RING_GET_REQUEST(&np->tx, requeue_idx);
+		requeue_idx++;
+
+		tx->id = i;
+		gnttab_grant_foreign_access_ref(
+			np->grant_tx_ref[i], np->xbdev->otherend_id,
+			virt_to_mfn(np->tx_skbs[i]->data),
+			GNTMAP_readonly);
+		tx->gref = np->grant_tx_ref[i];
+		tx->offset = (unsigned long)skb->data & ~PAGE_MASK;
+		tx->size = skb->len;
+		tx->flags = (skb->ip_summed == CHECKSUM_HW) ?
+			NETTXF_csum_blank : 0;
+
+		np->stats.tx_bytes += skb->len;
+		np->stats.tx_packets++;
+	}
+
+	np->tx.req_prod_pvt = requeue_idx;
+	RING_PUSH_REQUESTS(&np->tx);
+
+	/* Rebuild the RX buffer freelist and the RX ring itself. */
+	for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) {
+		if ((unsigned long)np->rx_skbs[i] < __PAGE_OFFSET)
+			continue;
+		gnttab_grant_foreign_transfer_ref(
+			np->grant_rx_ref[i], np->xbdev->otherend_id,
+			__pa(np->rx_skbs[i]->data) >> PAGE_SHIFT);
+		RING_GET_REQUEST(&np->rx, requeue_idx)->gref =
+			np->grant_rx_ref[i];
+		RING_GET_REQUEST(&np->rx, requeue_idx)->id = i;
+		requeue_idx++;
+	}
+
+	np->rx.req_prod_pvt = requeue_idx;
+	RING_PUSH_REQUESTS(&np->rx);
+
+	/*
+	 * Step 3: All public and private state should now be sane.  Get
+	 * ready to start sending and receiving packets and give the driver
+	 * domain a kick because we've probably just requeued some
+	 * packets.
+	 */
+	np->backend_state = BEST_CONNECTED;
+	notify_remote_via_irq(np->irq);
+	network_tx_buf_gc(dev);
+
+	if (np->user_state == UST_OPEN)
+		netif_start_queue(dev);
+
+	spin_unlock(&np->rx_lock);
+	spin_unlock_irq(&np->tx_lock);
+}
+
+static void show_device(struct netfront_info *np)
+{
+#ifdef DEBUG
+	if (np) {
+		IPRINTK("<vif handle=%u %s(%s) evtchn=%u tx=%p rx=%p>\n",
+			np->handle,
+			be_state_name[np->backend_state],
+			np->user_state ? "open" : "closed",
+			np->evtchn,
+			np->tx,
+			np->rx);
+	} else
+		IPRINTK("<vif NULL>\n");
+#endif
+}
+
+static void netif_uninit(struct net_device *dev)
+{
+	struct netfront_info *np = netdev_priv(dev);
+	gnttab_free_grant_references(np->gref_tx_head);
+	gnttab_free_grant_references(np->gref_rx_head);
+}
+
+static struct ethtool_ops network_ethtool_ops =
+{
+	.get_tx_csum = ethtool_op_get_tx_csum,
+	.set_tx_csum = ethtool_op_set_tx_csum,
+};
+
+/** Create a network device.
+ * @param handle device handle
+ * @param val return parameter for created device
+ * @return 0 on success, error code otherwise
+ */
+static int create_netdev(int handle, struct xenbus_device *dev,
+			 struct net_device **val)
+{
+	int i, err = 0;
+	struct net_device *netdev = NULL;
+	struct netfront_info *np = NULL;
+
+	if ((netdev = alloc_etherdev(sizeof(struct netfront_info))) == NULL) {
+		printk(KERN_WARNING "%s> alloc_etherdev failed.\n",
+		       __FUNCTION__);
+		err = -ENOMEM;
+		goto exit;
+	}
+
+	np                = netdev_priv(netdev);
+	np->backend_state = BEST_CLOSED;
+	np->user_state    = UST_CLOSED;
+	np->handle        = handle;
+	np->xbdev         = dev;
+
+	spin_lock_init(&np->tx_lock);
+	spin_lock_init(&np->rx_lock);
+
+	skb_queue_head_init(&np->rx_batch);
+	np->rx_target     = RX_DFL_MIN_TARGET;
+	np->rx_min_target = RX_DFL_MIN_TARGET;
+	np->rx_max_target = RX_MAX_TARGET;
+
+	init_timer(&np->rx_refill_timer);
+	np->rx_refill_timer.data = (unsigned long)netdev;
+	np->rx_refill_timer.function = rx_refill_timeout;
+
+	/* Initialise {tx,rx}_skbs as a free chain containing every entry. */
+	for (i = 0; i <= NET_TX_RING_SIZE; i++) {
+		np->tx_skbs[i] = (void *)((unsigned long) i+1);
+		np->grant_tx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	for (i = 0; i <= NET_RX_RING_SIZE; i++) {
+		np->rx_skbs[i] = (void *)((unsigned long) i+1);
+		np->grant_rx_ref[i] = GRANT_INVALID_REF;
+	}
+
+	/* A grant for every tx ring slot */
+	if (gnttab_alloc_grant_references(NET_TX_RING_SIZE,
+					  &np->gref_tx_head) < 0) {
+		printk(KERN_ALERT "#### netfront can't alloc tx grant refs\n");
+		err = -ENOMEM;
+		goto exit;
+	}
+	/* A grant for every rx ring slot */
+	if (gnttab_alloc_grant_references(NET_RX_RING_SIZE,
+					  &np->gref_rx_head) < 0) {
+		printk(KERN_ALERT "#### netfront can't alloc rx grant refs\n");
+		gnttab_free_grant_references(np->gref_tx_head);
+		err = -ENOMEM;
+		goto exit;
+	}
+
+	netdev->open            = network_open;
+	netdev->hard_start_xmit = network_start_xmit;
+	netdev->stop            = network_close;
+	netdev->get_stats       = network_get_stats;
+	netdev->poll            = netif_poll;
+	netdev->uninit          = netif_uninit;
+	netdev->weight          = 64;
+	netdev->features        = NETIF_F_IP_CSUM;
+
+	SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
+	SET_MODULE_OWNER(netdev);
+	SET_NETDEV_DEV(netdev, &dev->dev);
+
+	if ((err = register_netdev(netdev)) != 0) {
+		printk(KERN_WARNING "%s> register_netdev err=%d\n",
+		       __FUNCTION__, err);
+		goto exit_free_grefs;
+	}
+
+	if ((err = xennet_proc_addif(netdev)) != 0) {
+		unregister_netdev(netdev);
+		goto exit_free_grefs;
+	}
+
+	np->netdev = netdev;
+
+ exit:
+	if (err != 0) {
+		if (netdev)
+			free_netdev(netdev);
+	} else if (val != NULL)
+		*val = netdev;
+	return err;
+
+ exit_free_grefs:
+	gnttab_free_grant_references(np->gref_tx_head);
+	gnttab_free_grant_references(np->gref_rx_head);
+	goto exit;
+}
+
+/*
+ * We use this notifier to send out a fake ARP reply to reset switches and
+ * router ARP caches when an IP interface is brought up on a VIF.
+ */
+static int
+inetdev_notify(struct notifier_block *this, unsigned long event, void *ptr)
+{
+	struct in_ifaddr  *ifa = (struct in_ifaddr *)ptr;
+	struct net_device *dev = ifa->ifa_dev->dev;
+
+	/* UP event and is it one of our devices? */
+	if (event == NETDEV_UP && dev->open == network_open)
+		(void)send_fake_arp(dev);
+
+	return NOTIFY_DONE;
+}
+
+
+/* ** Close down ** */
+
+
+/**
+ * Handle the change of state of the backend to Closing.  We must delete our
+ * device-layer structures now, to ensure that writes are flushed through to
+ * the backend.  Once is this done, we can switch to Closed in
+ * acknowledgement.
+ */
+static void netfront_closing(struct xenbus_device *dev)
+{
+	struct netfront_info *info = dev->data;
+
+	DPRINTK("netfront_closing: %s removed\n", dev->nodename);
+
+	close_netdev(info);
+
+	xenbus_switch_state(dev, XBT_NULL, XenbusStateClosed);
+}
+
+
+static int netfront_remove(struct xenbus_device *dev)
+{
+	struct netfront_info *info = dev->data;
+
+	DPRINTK("%s\n", dev->nodename);
+
+	netif_disconnect_backend(info);
+	free_netdev(info->netdev);
+
+	return 0;
+}
+
+
+static void close_netdev(struct netfront_info *info)
+{
+	spin_lock_irq(&info->netdev->xmit_lock);
+	netif_stop_queue(info->netdev);
+	spin_unlock_irq(&info->netdev->xmit_lock);
+
+#if defined(CONFIG_PROC_FS) && defined(XEN_XENNET_PROC_INTERFACE)
+	xennet_proc_delif(info->netdev);
+#endif
+
+	del_timer_sync(&info->rx_refill_timer);
+
+	unregister_netdev(info->netdev);
+}
+
+
+static void netif_disconnect_backend(struct netfront_info *info)
+{
+	/* Stop old i/f to prevent errors whilst we rebuild the state. */
+	spin_lock_irq(&info->tx_lock);
+	spin_lock(&info->rx_lock);
+	info->backend_state = BEST_DISCONNECTED;
+	spin_unlock(&info->rx_lock);
+	spin_unlock_irq(&info->tx_lock);
+
+	if (info->irq)
+		unbind_from_irqhandler(info->irq, info->netdev);
+	info->evtchn = info->irq = 0;
+
+	end_access(info->tx_ring_ref, info->tx.sring);
+	end_access(info->rx_ring_ref, info->rx.sring);
+	info->tx_ring_ref = GRANT_INVALID_REF;
+	info->rx_ring_ref = GRANT_INVALID_REF;
+	info->tx.sring = NULL;
+	info->rx.sring = NULL;
+}
+
+
+static void netif_free(struct netfront_info *info)
+{
+	close_netdev(info);
+	netif_disconnect_backend(info);
+	free_netdev(info->netdev);
+}
+
+
+static void end_access(int ref, void *page)
+{
+	if (ref != GRANT_INVALID_REF)
+		gnttab_end_foreign_access(ref, 0, (unsigned long)page);
+}
+
+
+/* ** Driver registration ** */
+
+
+static struct xenbus_device_id netfront_ids[] = {
+	{ "vif" },
+	{ "" }
+};
+
+
+static struct xenbus_driver netfront = {
+	.name = "vif",
+	.owner = THIS_MODULE,
+	.ids = netfront_ids,
+	.probe = netfront_probe,
+	.remove = netfront_remove,
+	.resume = netfront_resume,
+	.otherend_changed = backend_changed,
+};
+
+
+static struct notifier_block notifier_inetdev = {
+	.notifier_call  = inetdev_notify,
+};
+
+static int __init netif_init(void)
+{
+	int err = 0;
+
+	if (xen_start_info->flags & SIF_INITDOMAIN)
+		return 0;
+
+	if ((err = xennet_proc_init()) != 0)
+		return err;
+
+	IPRINTK("Initialising virtual ethernet driver.\n");
+
+	(void)register_inetaddr_notifier(&notifier_inetdev);
+
+	return xenbus_register_frontend(&netfront);
+}
+module_init(netif_init);
+
+
+static void netif_exit(void)
+{
+	unregister_inetaddr_notifier(&notifier_inetdev);
+
+	return xenbus_unregister_driver(&netfront);
+}
+module_exit(netif_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
+
+
+/* ** /proc **/
+
+
+#if defined(CONFIG_PROC_FS) && defined(XEN_XENNET_PROC_INTERFACE)
+
+#define TARGET_MIN 0UL
+#define TARGET_MAX 1UL
+#define TARGET_CUR 2UL
+
+static int xennet_proc_read(
+	char *page, char **start, off_t off, int count, int *eof, void *data)
+{
+	struct net_device *dev =
+		(struct net_device *)((unsigned long)data & ~3UL);
+	struct netfront_info *np = netdev_priv(dev);
+	int len = 0, which_target = (long)data & 3;
+
+	switch (which_target) {
+	case TARGET_MIN:
+		len = sprintf(page, "%d\n", np->rx_min_target);
+		break;
+	case TARGET_MAX:
+		len = sprintf(page, "%d\n", np->rx_max_target);
+		break;
+	case TARGET_CUR:
+		len = sprintf(page, "%d\n", np->rx_target);
+		break;
+	}
+
+	*eof = 1;
+	return len;
+}
+
+static int xennet_proc_write(
+	struct file *file, const char __user *buffer,
+	unsigned long count, void *data)
+{
+	struct net_device *dev =
+		(struct net_device *)((unsigned long)data & ~3UL);
+	struct netfront_info *np = netdev_priv(dev);
+	int which_target = (long)data & 3;
+	char string[64];
+	long target;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (count <= 1)
+		return -EBADMSG; /* runt */
+	if (count > sizeof(string))
+		return -EFBIG;   /* too long */
+
+	if (copy_from_user(string, buffer, count))
+		return -EFAULT;
+	string[sizeof(string)-1] = '\0';
+
+	target = simple_strtol(string, NULL, 10);
+	if (target < RX_MIN_TARGET)
+		target = RX_MIN_TARGET;
+	if (target > RX_MAX_TARGET)
+		target = RX_MAX_TARGET;
+
+	spin_lock(&np->rx_lock);
+
+	switch (which_target) {
+	case TARGET_MIN:
+		if (target > np->rx_max_target)
+			np->rx_max_target = target;
+		np->rx_min_target = target;
+		if (target > np->rx_target)
+			np->rx_target = target;
+		break;
+	case TARGET_MAX:
+		if (target < np->rx_min_target)
+			np->rx_min_target = target;
+		np->rx_max_target = target;
+		if (target < np->rx_target)
+			np->rx_target = target;
+		break;
+	case TARGET_CUR:
+		break;
+	}
+
+	network_alloc_rx_buffers(dev);
+
+	spin_unlock(&np->rx_lock);
+
+	return count;
+}
+
+static int xennet_proc_init(void)
+{
+	if (proc_mkdir("xen/net", NULL) == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
+static int xennet_proc_addif(struct net_device *dev)
+{
+	struct proc_dir_entry *dir, *min, *max, *cur;
+	char name[30];
+
+	sprintf(name, "xen/net/%s", dev->name);
+
+	dir = proc_mkdir(name, NULL);
+	if (!dir)
+		goto nomem;
+	dir->owner = THIS_MODULE;
+
+	min = create_proc_entry("rxbuf_min", 0644, dir);
+	max = create_proc_entry("rxbuf_max", 0644, dir);
+	cur = create_proc_entry("rxbuf_cur", 0444, dir);
+	if (!min || !max || !cur)
+		goto nomem;
+
+	min->read_proc  = xennet_proc_read;
+	min->write_proc = xennet_proc_write;
+	min->data       = (void *)((unsigned long)dev | TARGET_MIN);
+	min->owner = THIS_MODULE;
+
+	max->read_proc  = xennet_proc_read;
+	max->write_proc = xennet_proc_write;
+	max->data       = (void *)((unsigned long)dev | TARGET_MAX);
+	max->owner = THIS_MODULE;
+
+	cur->read_proc  = xennet_proc_read;
+	cur->write_proc = xennet_proc_write;
+	cur->data       = (void *)((unsigned long)dev | TARGET_CUR);
+	max->owner = THIS_MODULE;
+
+	return 0;
+
+ nomem:
+	xennet_proc_delif(dev);
+	return -ENOMEM;
+}
+
+static void xennet_proc_delif(struct net_device *dev)
+{
+	char name[30];
+
+	sprintf(name, "xen/net/%s/rxbuf_min", dev->name);
+	remove_proc_entry(name, NULL);
+
+	sprintf(name, "xen/net/%s/rxbuf_max", dev->name);
+	remove_proc_entry(name, NULL);
+
+	sprintf(name, "xen/net/%s/rxbuf_cur", dev->name);
+	remove_proc_entry(name, NULL);
+
+	sprintf(name, "xen/net/%s", dev->name);
+	remove_proc_entry(name, NULL);
+}
+
+#endif
--- /dev/null
+++ linus-2.6/include/xen/net_driver_util.h
@@ -0,0 +1,48 @@
+/*****************************************************************************
+ *
+ * Utility functions for Xen network devices.
+ *
+ * Copyright (c) 2005 XenSource Ltd.
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject
+ * to the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _ASM_XEN_NET_DRIVER_UTIL_H
+#define _ASM_XEN_NET_DRIVER_UTIL_H
+
+
+#include <xen/xenbus.h>
+
+
+/**
+ * Read the 'mac' node at the given device's node in the store, and parse that
+ * as colon-separated octets, placing result the given mac array.  mac must be
+ * a preallocated array of length ETH_ALEN (as declared in linux/if_ether.h).
+ * Return 0 on success, or -errno on error.
+ */
+int xen_net_read_mac(struct xenbus_device *dev, u8 mac[]);
+
+
+#endif /* _ASM_XEN_NET_DRIVER_UTIL_H */

--

^ permalink raw reply

* Re: dBm cutoff at -1dBm is too low
From: Pavel Roskin @ 2006-05-09  4:54 UTC (permalink / raw)
  To: jt; +Cc: NetDev
In-Reply-To: <20060508171711.GA10948@bougret.hpl.hp.com>

On Mon, 2006-05-08 at 10:17 -0700, Jean Tourrilhes wrote:
> > But shouldn't you trust the drivers using IW_QUAL_DBM, whether the value
> > is positive or negative?
> 
> 	You can't remove the test, making the rest pointeless. Old
> style driver never used the flags, new style driver that don't report
> dBm will never use the flags, and there is not way to dinstinguish
> both, apart from the 'sign' of the value.

I used "new style driver" as synonymous to the driver using IW_QUAL_DBM
and "old style driver" as the one that doesn't use IW_QUAL_DBM.

I don't think any drivers need to specify both dBm and non-dBm data for
the same device.

Drivers that give non-dBm data are already well served by wireless
tools.  They don't need to change.

Drivers that give dBm data had to limit the data to avoid its
misinterpretation as non-dBm data.  Now IW_QUAL_DBM is supposed to free
drivers from such checks.  But it doesn't deliver its promise, because
the data can still be misinterpreted, just is a different way.  Namely,
a strong signal (0dBm) can be interpreted as an incredibly weak signal
(-256dBm).  That's what I want to see fixed.

One tricky case would be when the driver sets the max signal e.g. to 30
and reports 35 (i.e. a positive value within the "reasonable" dBm
range).  I would probably prefer to show it as 30/30 rather than 35dBm,
but the driver is nuts anyway, and I'm not really concerned about it to
write an extra line of code.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* I N F O R M   Á T I C A   P R O F I S S I O N A L
From: netdev-owner @ 2006-05-09  1:02 UTC (permalink / raw)
  To: netdev

C U R S O S   E M   C D - R O M   P A R A   A U T O A P R E N D I Z A G E M

Se você precisa aprender ou treinar alguém num dos seguintes programas:

AutoCAD
       Arqui3D
              3DStudio Max
                     Photoshop
                            Excel
                                   Flash
                                          DreamWeaver
                                                 CorelDraw

Acesse nosso site e baixe gratuitamente alguns demonstrativos.

WWW.CURSOSEMCDROM.COM.BR

Você aprende em sua própria casa, de acordo com seu ritmo 
de aprendizagem e com sua disponibilidade de tempo.

^ permalink raw reply

* Re: RFC: new WIP version of au1000_eth.c phylib conversion (was Re: RFC: au1000_etc.c phylib rewrite)
From: Herbert Valerio Riedel @ 2006-05-09  2:04 UTC (permalink / raw)
  To: Andy Fleming
  Cc: Mark Schank, ppopov, sshtylyov, linux-mips, jgarzik, netdev,
	Ralf Baechle, Robin H. Johnson
In-Reply-To: <6BF86C09-0732-4322-A43E-29705849886D@freescale.com>


[-- Attachment #1.1: Type: text/plain, Size: 4436 bytes --]


jfyi, yet another more recent snapshot is attached to this email...

On Mon, 2006-05-08 at 17:24 -0500, Andy Fleming wrote:
> > I've tried to adapt the PHY detection code to allow for dynamic  
> > runtime configuration (with fallback to search for the 2nd MAC PHY on the 1st
> > MAC's MII bus), as well as selectable static PHY configuration through
> > Kconfig (e.g. for supporting PHYs w/o MII connection)
> 
> Comments inline, below:

> [snip]
> 
> Nothing to say so far, except seeing all those "-" signs was quite  
> the thrill, since one of the goals of the phylib was to lead to  
> reduced complexity.  That said, it looked like there were about a  
> dozen PHY-specific code blocks in there.  I saw you submit one PHY  
> driver.  Were there others in there that could be ported?

could well be, although I hope that most would work with the genphy
driver out of the box; but I'm depending on other people here anyway to
test it, since I don't have all kinds of evaluation boards available for
testing myself... :-)

I submitted the SMSC phy driver, because that's a piece of hardware I
had to play with...

> [snip]
> 
> > +static int mdiobus_read(struct mii_bus *bus, int phy_addr, int  
> > regnum)
> > {
> > -	int i, val;
> > +	struct net_device *const dev = bus->priv; /* beware: bus->phy_map 
> > [phy_addr].attached_dev == dev does _NOT_ hold always  */
> > +	enable_mac(dev, 0); /* make sure MAC associated with this mii_bus  
> > is enabled */
> > +	return mdio_read(dev, phy_addr, regnum);
> > +}
> 
> 
> Why is attached_dev not always correct?  I'm not sure if I'm not  
> understanding the hardware (I'm unfamiliar with this NIC), or if  
> you've misinterpreted the meaning of the attached_dev field.  It's  
> supposed to be a connection between the network device and a PHY,  
> mainly used for allowing the PHY to signal state changes back to the  
> ethernet device.  Is it actually the case that there is one MAC being  
> used for two PHYs at the same time? 

exactly; some au1xxxx based boards have so-called DualPHYs which provide
two PHYs, but with only one MII interface...

>  If so, how do you resolve which  
> PHY's state gets used at any given moment?

the attached_dev points to the net_device for which the PHY state is
relevant -- which might not be the net_device whose MII bus the PHY is
attached to...

it's just that the net_device stored in the mii_bus->priv field points
always to the "physical" net_device

and finally, each net_device has a pointer to its "logical" phy_dev

...does this answer your question? :-)

> The same question applies for the code in mdiobus_write()
> 
> [snip]

> > -static int mii_probe (struct net_device * dev)
> > +static int mdiobus_reset(struct mii_bus *bus)
> > {
> > -	struct au1000_private *aup = (struct au1000_private *) dev->priv;
> > -	int phy_addr;
> > -#ifdef CONFIG_MIPS_BOSPORUS
> > -	int phy_found=0;
> > -#endif
> > +	struct net_device *dev = bus->priv;
> > -	/* search for total of 32 possible mii phy addresses */
> > -	for (phy_addr = 0; phy_addr < 32; phy_addr++) {
> > -		u16 mii_status;
> > -		u16 phy_id0, phy_id1;
> > -		int i;
> > +	enable_mac(dev, 0); /* make sure MAC associated with this mii_bus  
> > is enabled */
> >
> 
> Do you need to call enable_mac() every time?  If it needs to be up,  
> wouldn't it be easier to make sure it's up during bus initialization?

it's actually a hack :-/ it's needed for the case where a PHY is
attached to a MAC other than the state-associated one... as MAC can be
powered down, when the eth is down, this allows to keep the MAC down if
it's not needed, and also make sure it get's enabled again, in case
something happened between mdiobus_reset and mdiobus_read/write 

> [snip]
> Might want to change that to be "...not 10/100..." or add a case for  
> 1000.

done...

> [snip]
> 
> > +	spin_unlock_irqrestore(&aup->lock, flags);
> > +	if (status_change) {
> > +		phy_print_status(phydev);
> > +	}
> > }
> 
> 
> Stylistic issue (I've seen it a couple times, at least):  don't use  
> "{" and "}" if your block only has one line.
> ie:
> 	if (status_change)
> 		phy_print_status(phydev);

you seem to have found one of the very few if(){single-line} cases I
introduced... (and that particular one was even a placeholder which got
bloated anyway :-))

regards,
hvr

[-- Attachment #1.2: au1000eth_phylib_conversion.patch --]
[-- Type: text/x-patch, Size: 60187 bytes --]

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 1edcc0c..0f6a41a 100644
Index: gw1550kernel/drivers/net/Kconfig
===================================================================
--- gw1550kernel.orig/drivers/net/Kconfig	2006-05-09 03:01:37.000000000 +0200
+++ gw1550kernel/drivers/net/Kconfig	2006-05-09 03:20:19.000000000 +0200
@@ -455,11 +455,50 @@
 config MIPS_AU1X00_ENET
 	bool "MIPS AU1000 Ethernet support"
 	depends on NET_ETHERNET && SOC_AU1X00
+	select PHYLIB
 	select CRC32
 	help
 	  If you have an Alchemy Semi AU1X00 based system
 	  say Y.  Otherwise, say N.
 
+config MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+	bool "Static PHY configuration"
+	depends on MIPS_AU1X00_ENET
+	default n
+	help
+	  Say Y if you need to set a static PHY configuration. If you say N, the
+	  driver will try to autodetect the PHY configuration.
+
+config MIPS_AU1X00_ENET_ETH0_PHY_ADDR
+	int "1st Ethernet's PHY address {-1,0..31}"
+	depends on MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+	help
+	  Provide the PHY address of the first Dthernet device.
+	  If the PHY has no PHY address, say "-1".
+
+config MIPS_AU1X00_ENET_ETH0_PHY_IRQ
+	int "1st Ethernet's PHY IRQ"
+	depends on MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+
+config MIPS_AU1X00_ENET_ETH1_PHY_ON_MAC0
+	bool "2nd Ethernet's PHY is attached to 1st MAC"
+	depends on MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+	default n
+	help
+	  Say Y here, if the second Ethernet's PHY is attached to
+	  the MII bus of the first MAC.
+
+config MIPS_AU1X00_ENET_ETH1_PHY_ADDR
+	int "2nd Ethernet's PHY address {-1,0..31}"
+	depends on MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+	help
+	  Provide the PHY address of the second Ethernet device.
+	  If the PHY has no PHY address, say "-1".
+
+config MIPS_AU1X00_ENET_ETH1_PHY_IRQ
+	int "2nd Ethernet's PHY IRQ"
+	depends on MIPS_AU1X00_ENET_STATIC_PHY_CONFIG
+
 config SGI_IOC3_ETH
 	bool "SGI IOC3 Ethernet"
 	depends on NET_ETHERNET && PCI && SGI_IP27
Index: gw1550kernel/drivers/net/au1000_eth.c
===================================================================
--- gw1550kernel.orig/drivers/net/au1000_eth.c	2006-05-09 03:01:44.000000000 +0200
+++ gw1550kernel/drivers/net/au1000_eth.c	2006-05-09 03:45:03.000000000 +0200
@@ -9,6 +9,9 @@
  * Update: 2004 Bjoern Riemer, riemer@fokus.fraunhofer.de 
  * or riemer@riemer-nt.de: fixed the link beat detection with 
  * ioctls (SIOCGMIIPHY)
+ * Copyright 2006 Herbert Valerio Riedel <hvr@gnu.org>
+ *  converted to use linux-2.6.x's PHY framework
+ *
  * Author: MontaVista Software, Inc.
  *         	ppopov@mvista.com or source@mvista.com
  *
@@ -53,6 +56,7 @@
 #include <linux/skbuff.h>
 #include <linux/delay.h>
 #include <linux/crc32.h>
+#include <linux/phy.h>
 #include <asm/mipsregs.h>
 #include <asm/irq.h>
 #include <asm/io.h>
@@ -88,17 +92,15 @@
 static int au1000_rx(struct net_device *);
 static irqreturn_t au1000_interrupt(int, void *, struct pt_regs *);
 static void au1000_tx_timeout(struct net_device *);
-static int au1000_set_config(struct net_device *dev, struct ifmap *map);
 static void set_rx_mode(struct net_device *);
 static struct net_device_stats *au1000_get_stats(struct net_device *);
-static void au1000_timer(unsigned long);
 static int au1000_ioctl(struct net_device *, struct ifreq *, int);
 static int mdio_read(struct net_device *, int, int);
 static void mdio_write(struct net_device *, int, int, u16);
-static void dump_mii(struct net_device *dev, int phy_id);
+static void au1000_adjust_link(struct net_device *);
+static void enable_mac(struct net_device *, int);
 
 // externs
-extern  void ack_rise_edge_irq(unsigned int);
 extern int get_ethernet_addr(char *ethernet_addr);
 extern void str2eaddr(unsigned char *ea, unsigned char *str);
 extern char * __init prom_getcmdline(void);
@@ -108,11 +110,11 @@
  *
  * The Au1000 MACs use a simple rx and tx descriptor ring scheme. 
  * There are four receive and four transmit descriptors.  These 
- * descriptors are not in memory; rather, they are just a set of 
+ * descriptors are not in memory; rather, they are just a set of
  * hardware registers.
  *
  * Since the Au1000 has a coherent data cache, the receive and
- * transmit buffers are allocated from the KSEG0 segment. The 
+ * transmit buffers are allocated from the KSEG0 segment. The
  * hardware registers, however, are still mapped at KSEG1 to
  * make sure there's no out-of-order writes, and that all writes
  * complete immediately.
@@ -122,709 +124,23 @@
  * the mac address is, and the mac address is not passed on the
  * command line.
  */
-static unsigned char au1000_mac_addr[6] __devinitdata = { 
+static unsigned char au1000_mac_addr[6] __devinitdata = {
 	0x00, 0x50, 0xc2, 0x0c, 0x30, 0x00
 };
 
-#define nibswap(x) ((((x) >> 4) & 0x0f) | (((x) << 4) & 0xf0))
-#define RUN_AT(x) (jiffies + (x))
-
-// For reading/writing 32-bit words from/to DMA memory
-#define cpu_to_dma32 cpu_to_be32
-#define dma32_to_cpu be32_to_cpu
-
 struct au1000_private *au_macs[NUM_ETH_INTERFACES];
 
-/* FIXME 
- * All of the PHY code really should be detached from the MAC 
- * code.
+/*
+ * MII operations
  */
-
-/* Default advertise */
-#define GENMII_DEFAULT_ADVERTISE \
-	ADVERTISED_10baseT_Half | ADVERTISED_10baseT_Full | \
-	ADVERTISED_100baseT_Half | ADVERTISED_100baseT_Full | \
-	ADVERTISED_Autoneg
-
-#define GENMII_DEFAULT_FEATURES \
-	SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full | \
-	SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full | \
-	SUPPORTED_Autoneg
-
-int bcm_5201_init(struct net_device *dev, int phy_addr)
-{
-	s16 data;
-	
-	/* Stop auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, data & ~MII_CNTL_AUTO);
-
-	/* Set advertisement to 10/100 and Half/Full duplex
-	 * (full capabilities) */
-	data = mdio_read(dev, phy_addr, MII_ANADV);
-	data |= MII_NWAY_TX | MII_NWAY_TX_FDX | MII_NWAY_T_FDX | MII_NWAY_T;
-	mdio_write(dev, phy_addr, MII_ANADV, data);
-	
-	/* Restart auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	data |= MII_CNTL_RST_AUTO | MII_CNTL_AUTO;
-	mdio_write(dev, phy_addr, MII_CONTROL, data);
-
-	if (au1000_debug > 4) 
-		dump_mii(dev, phy_addr);
-	return 0;
-}
-
-int bcm_5201_reset(struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int 
-bcm_5201_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	if (!dev) {
-		printk(KERN_ERR "bcm_5201_status error: NULL dev\n");
-		return -1;
-	}
-	aup = (struct au1000_private *) dev->priv;
-
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, MII_AUX_CNTRL);
-		if (mii_data & MII_AUX_100) {
-			if (mii_data & MII_AUX_FDX) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else  {
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-int lsi_80227_init(struct net_device *dev, int phy_addr)
-{
-	if (au1000_debug > 4)
-		printk("lsi_80227_init\n");
-
-	/* restart auto-negotiation */
-	mdio_write(dev, phy_addr, MII_CONTROL,
-		   MII_CNTL_F100 | MII_CNTL_AUTO | MII_CNTL_RST_AUTO); // | MII_CNTL_FDX);
-	mdelay(1);
-
-	/* set up LEDs to correct display */
-#ifdef CONFIG_MIPS_MTX1
-	mdio_write(dev, phy_addr, 17, 0xff80);
-#else
-	mdio_write(dev, phy_addr, 17, 0xffc0);
-#endif
-
-	if (au1000_debug > 4)
-		dump_mii(dev, phy_addr);
-	return 0;
-}
-
-int lsi_80227_reset(struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-	if (au1000_debug > 4) {
-		printk("lsi_80227_reset\n");
-		dump_mii(dev, phy_addr);
-	}
-
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int
-lsi_80227_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	if (!dev) {
-		printk(KERN_ERR "lsi_80227_status error: NULL dev\n");
-		return -1;
-	}
-	aup = (struct au1000_private *) dev->priv;
-
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, MII_LSI_PHY_STAT);
-		if (mii_data & MII_LSI_PHY_STAT_SPD) {
-			if (mii_data & MII_LSI_PHY_STAT_FDX) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else  {
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-int am79c901_init(struct net_device *dev, int phy_addr)
-{
-	printk("am79c901_init\n");
-	return 0;
-}
-
-int am79c901_reset(struct net_device *dev, int phy_addr)
-{
-	printk("am79c901_reset\n");
-	return 0;
-}
-
-int 
-am79c901_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	return 0;
-}
-
-int am79c874_init(struct net_device *dev, int phy_addr)
-{
-	s16 data;
-
-	/* 79c874 has quit resembled bit assignments to BCM5201 */
-	if (au1000_debug > 4)
-		printk("am79c847_init\n");
-
-	/* Stop auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, data & ~MII_CNTL_AUTO);
-
-	/* Set advertisement to 10/100 and Half/Full duplex
-	 * (full capabilities) */
-	data = mdio_read(dev, phy_addr, MII_ANADV);
-	data |= MII_NWAY_TX | MII_NWAY_TX_FDX | MII_NWAY_T_FDX | MII_NWAY_T;
-	mdio_write(dev, phy_addr, MII_ANADV, data);
-	
-	/* Restart auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	data |= MII_CNTL_RST_AUTO | MII_CNTL_AUTO;
-
-	mdio_write(dev, phy_addr, MII_CONTROL, data);
-
-	if (au1000_debug > 4) dump_mii(dev, phy_addr);
-	return 0;
-}
-
-int am79c874_reset(struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-	if (au1000_debug > 4)
-		printk("am79c874_reset\n");
-
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int 
-am79c874_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	// printk("am79c874_status\n");
-	if (!dev) {
-		printk(KERN_ERR "am79c874_status error: NULL dev\n");
-		return -1;
-	}
-
-	aup = (struct au1000_private *) dev->priv;
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, MII_AMD_PHY_STAT);
-		if (mii_data & MII_AMD_PHY_STAT_SPD) {
-			if (mii_data & MII_AMD_PHY_STAT_FDX) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else {
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-int lxt971a_init(struct net_device *dev, int phy_addr)
-{
-	if (au1000_debug > 4)
-		printk("lxt971a_init\n");
-
-	/* restart auto-negotiation */
-	mdio_write(dev, phy_addr, MII_CONTROL,
-		   MII_CNTL_F100 | MII_CNTL_AUTO | MII_CNTL_RST_AUTO | MII_CNTL_FDX);
-
-	/* set up LEDs to correct display */
-	mdio_write(dev, phy_addr, 20, 0x0422);
-
-	if (au1000_debug > 4)
-		dump_mii(dev, phy_addr);
-	return 0;
-}
-
-int lxt971a_reset(struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-	if (au1000_debug > 4) {
-		printk("lxt971a_reset\n");
-		dump_mii(dev, phy_addr);
-	}
-
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int
-lxt971a_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	if (!dev) {
-		printk(KERN_ERR "lxt971a_status error: NULL dev\n");
-		return -1;
-	}
-	aup = (struct au1000_private *) dev->priv;
-
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, MII_INTEL_PHY_STAT);
-		if (mii_data & MII_INTEL_PHY_STAT_SPD) {
-			if (mii_data & MII_INTEL_PHY_STAT_FDX) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else  {
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-int ks8995m_init(struct net_device *dev, int phy_addr)
-{
-	s16 data;
-	
-//	printk("ks8995m_init\n");
-	/* Stop auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, data & ~MII_CNTL_AUTO);
-
-	/* Set advertisement to 10/100 and Half/Full duplex
-	 * (full capabilities) */
-	data = mdio_read(dev, phy_addr, MII_ANADV);
-	data |= MII_NWAY_TX | MII_NWAY_TX_FDX | MII_NWAY_T_FDX | MII_NWAY_T;
-	mdio_write(dev, phy_addr, MII_ANADV, data);
-	
-	/* Restart auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	data |= MII_CNTL_RST_AUTO | MII_CNTL_AUTO;
-	mdio_write(dev, phy_addr, MII_CONTROL, data);
-
-	if (au1000_debug > 4) dump_mii(dev, phy_addr);
-
-	return 0;
-}
-
-int ks8995m_reset(struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-//	printk("ks8995m_reset\n");
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int ks8995m_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	if (!dev) {
-		printk(KERN_ERR "ks8995m_status error: NULL dev\n");
-		return -1;
-	}
-	aup = (struct au1000_private *) dev->priv;
-
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, MII_AUX_CNTRL);
-		if (mii_data & MII_AUX_100) {
-			if (mii_data & MII_AUX_FDX) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else  {											
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-int
-smsc_83C185_init (struct net_device *dev, int phy_addr)
-{
-	s16 data;
-
-	if (au1000_debug > 4)
-		printk("smsc_83C185_init\n");
-
-	/* Stop auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, data & ~MII_CNTL_AUTO);
-
-	/* Set advertisement to 10/100 and Half/Full duplex
-	 * (full capabilities) */
-	data = mdio_read(dev, phy_addr, MII_ANADV);
-	data |= MII_NWAY_TX | MII_NWAY_TX_FDX | MII_NWAY_T_FDX | MII_NWAY_T;
-	mdio_write(dev, phy_addr, MII_ANADV, data);
-	
-	/* Restart auto-negotiation */
-	data = mdio_read(dev, phy_addr, MII_CONTROL);
-	data |= MII_CNTL_RST_AUTO | MII_CNTL_AUTO;
-
-	mdio_write(dev, phy_addr, MII_CONTROL, data);
-
-	if (au1000_debug > 4) dump_mii(dev, phy_addr);
-	return 0;
-}
-
-int
-smsc_83C185_reset (struct net_device *dev, int phy_addr)
-{
-	s16 mii_control, timeout;
-	
-	if (au1000_debug > 4)
-		printk("smsc_83C185_reset\n");
-
-	mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-	mdio_write(dev, phy_addr, MII_CONTROL, mii_control | MII_CNTL_RESET);
-	mdelay(1);
-	for (timeout = 100; timeout > 0; --timeout) {
-		mii_control = mdio_read(dev, phy_addr, MII_CONTROL);
-		if ((mii_control & MII_CNTL_RESET) == 0)
-			break;
-		mdelay(1);
-	}
-	if (mii_control & MII_CNTL_RESET) {
-		printk(KERN_ERR "%s PHY reset timeout !\n", dev->name);
-		return -1;
-	}
-	return 0;
-}
-
-int 
-smsc_83C185_status (struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	u16 mii_data;
-	struct au1000_private *aup;
-
-	if (!dev) {
-		printk(KERN_ERR "smsc_83C185_status error: NULL dev\n");
-		return -1;
-	}
-
-	aup = (struct au1000_private *) dev->priv;
-	mii_data = mdio_read(dev, aup->phy_addr, MII_STATUS);
-
-	if (mii_data & MII_STAT_LINK) {
-		*link = 1;
-		mii_data = mdio_read(dev, aup->phy_addr, 0x1f);
-		if (mii_data & (1<<3)) {
-			if (mii_data & (1<<4)) {
-				*speed = IF_PORT_100BASEFX;
-				dev->if_port = IF_PORT_100BASEFX;
-			}
-			else {
-				*speed = IF_PORT_100BASETX;
-				dev->if_port = IF_PORT_100BASETX;
-			}
-		}
-		else {
-			*speed = IF_PORT_10BASET;
-			dev->if_port = IF_PORT_10BASET;
-		}
-	}
-	else {
-		*link = 0;
-		*speed = 0;
-		dev->if_port = IF_PORT_UNKNOWN;
-	}
-	return 0;
-}
-
-
-#ifdef CONFIG_MIPS_BOSPORUS
-int stub_init(struct net_device *dev, int phy_addr)
-{
-	//printk("PHY stub_init\n");
-	return 0;
-}
-
-int stub_reset(struct net_device *dev, int phy_addr)
-{
-	//printk("PHY stub_reset\n");
-	return 0;
-}
-
-int 
-stub_status(struct net_device *dev, int phy_addr, u16 *link, u16 *speed)
-{
-	//printk("PHY stub_status\n");
-	*link = 1;
-	/* hmmm, revisit */
-	*speed = IF_PORT_100BASEFX;
-	dev->if_port = IF_PORT_100BASEFX;
-	return 0;
-}
-#endif
-
-struct phy_ops bcm_5201_ops = {
-	bcm_5201_init,
-	bcm_5201_reset,
-	bcm_5201_status,
-};
-
-struct phy_ops am79c874_ops = {
-	am79c874_init,
-	am79c874_reset,
-	am79c874_status,
-};
-
-struct phy_ops am79c901_ops = {
-	am79c901_init,
-	am79c901_reset,
-	am79c901_status,
-};
-
-struct phy_ops lsi_80227_ops = { 
-	lsi_80227_init,
-	lsi_80227_reset,
-	lsi_80227_status,
-};
-
-struct phy_ops lxt971a_ops = { 
-	lxt971a_init,
-	lxt971a_reset,
-	lxt971a_status,
-};
-
-struct phy_ops ks8995m_ops = {
-	ks8995m_init,
-	ks8995m_reset,
-	ks8995m_status,
-};
-
-struct phy_ops smsc_83C185_ops = {
-	smsc_83C185_init,
-	smsc_83C185_reset,
-	smsc_83C185_status,
-};
-
-#ifdef CONFIG_MIPS_BOSPORUS
-struct phy_ops stub_ops = {
-	stub_init,
-	stub_reset,
-	stub_status,
-};
-#endif
-
-static struct mii_chip_info {
-	const char * name;
-	u16 phy_id0;
-	u16 phy_id1;
-	struct phy_ops *phy_ops;	
-	int dual_phy;
-} mii_chip_table[] = {
-	{"Broadcom BCM5201 10/100 BaseT PHY",0x0040,0x6212, &bcm_5201_ops,0},
-	{"Broadcom BCM5221 10/100 BaseT PHY",0x0040,0x61e4, &bcm_5201_ops,0},
-	{"Broadcom BCM5222 10/100 BaseT PHY",0x0040,0x6322, &bcm_5201_ops,1},
-	{"NS DP83847 PHY", 0x2000, 0x5c30, &bcm_5201_ops ,0},
-	{"AMD 79C901 HomePNA PHY",0x0000,0x35c8, &am79c901_ops,0},
-	{"AMD 79C874 10/100 BaseT PHY",0x0022,0x561b, &am79c874_ops,0},
-	{"LSI 80227 10/100 BaseT PHY",0x0016,0xf840, &lsi_80227_ops,0},
-	{"Intel LXT971A Dual Speed PHY",0x0013,0x78e2, &lxt971a_ops,0},
-	{"Kendin KS8995M 10/100 BaseT PHY",0x0022,0x1450, &ks8995m_ops,0},
-	{"SMSC LAN83C185 10/100 BaseT PHY",0x0007,0xc0a3, &smsc_83C185_ops,0},
-#ifdef CONFIG_MIPS_BOSPORUS
-	{"Stub", 0x1234, 0x5678, &stub_ops },
-#endif
-	{0,},
-};
-
-static int mdio_read(struct net_device *dev, int phy_id, int reg)
+static int mdio_read(struct net_device *dev, int phy_addr, int reg)
 {
 	struct au1000_private *aup = (struct au1000_private *) dev->priv;
-	volatile u32 *mii_control_reg;
-	volatile u32 *mii_data_reg;
+	volatile u32 *const mii_control_reg = &aup->mac->mii_control;
+	volatile u32 *const mii_data_reg = &aup->mac->mii_data;
 	u32 timedout = 20;
 	u32 mii_control;
 
-	#ifdef CONFIG_BCM5222_DUAL_PHY
-	/* First time we probe, it's for the mac0 phy.
-	 * Since we haven't determined yet that we have a dual phy,
-	 * aup->mii->mii_control_reg won't be setup and we'll
-	 * default to the else statement.
-	 * By the time we probe for the mac1 phy, the mii_control_reg
-	 * will be setup to be the address of the mac0 phy control since
-	 * both phys are controlled through mac0.
-	 */
-	if (aup->mii && aup->mii->mii_control_reg) {
-		mii_control_reg = aup->mii->mii_control_reg;
-		mii_data_reg = aup->mii->mii_data_reg;
-	}
-	else if (au_macs[0]->mii && au_macs[0]->mii->mii_control_reg) {
-		/* assume both phys are controlled through mac0 */
-		mii_control_reg = au_macs[0]->mii->mii_control_reg;
-		mii_data_reg = au_macs[0]->mii->mii_data_reg;
-	}
-	else 
-	#endif
-	{
-		/* default control and data reg addresses */
-		mii_control_reg = &aup->mac->mii_control;
-		mii_data_reg = &aup->mac->mii_data;
-	}
-
 	while (*mii_control_reg & MAC_MII_BUSY) {
 		mdelay(1);
 		if (--timedout == 0) {
@@ -835,7 +151,7 @@
 	}
 
 	mii_control = MAC_SET_MII_SELECT_REG(reg) | 
-		MAC_SET_MII_SELECT_PHY(phy_id) | MAC_MII_READ;
+		MAC_SET_MII_SELECT_PHY(phy_addr) | MAC_MII_READ;
 
 	*mii_control_reg = mii_control;
 
@@ -851,32 +167,14 @@
 	return (int)*mii_data_reg;
 }
 
-static void mdio_write(struct net_device *dev, int phy_id, int reg, u16 value)
+static void mdio_write(struct net_device *dev, int phy_addr, int reg, u16 value)
 {
 	struct au1000_private *aup = (struct au1000_private *) dev->priv;
-	volatile u32 *mii_control_reg;
-	volatile u32 *mii_data_reg;
+	volatile u32 *const mii_control_reg = &aup->mac->mii_control;
+	volatile u32 *const mii_data_reg = &aup->mac->mii_data;
 	u32 timedout = 20;
 	u32 mii_control;
 
-	#ifdef CONFIG_BCM5222_DUAL_PHY
-	if (aup->mii && aup->mii->mii_control_reg) {
-		mii_control_reg = aup->mii->mii_control_reg;
-		mii_data_reg = aup->mii->mii_data_reg;
-	}
-	else if (au_macs[0]->mii && au_macs[0]->mii->mii_control_reg) {
-		/* assume both phys are controlled through mac0 */
-		mii_control_reg = au_macs[0]->mii->mii_control_reg;
-		mii_data_reg = au_macs[0]->mii->mii_data_reg;
-	}
-	else 
-	#endif
-	{
-		/* default control and data reg addresses */
-		mii_control_reg = &aup->mac->mii_control;
-		mii_data_reg = &aup->mac->mii_data;
-	}
-
 	while (*mii_control_reg & MAC_MII_BUSY) {
 		mdelay(1);
 		if (--timedout == 0) {
@@ -887,165 +185,167 @@
 	}
 
 	mii_control = MAC_SET_MII_SELECT_REG(reg) | 
-		MAC_SET_MII_SELECT_PHY(phy_id) | MAC_MII_WRITE;
+		MAC_SET_MII_SELECT_PHY(phy_addr) | MAC_MII_WRITE;
 
 	*mii_data_reg = value;
 	*mii_control_reg = mii_control;
 }
 
-
-static void dump_mii(struct net_device *dev, int phy_id)
+static int mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 {
-	int i, val;
-
-	for (i = 0; i < 7; i++) {
-		if ((val = mdio_read(dev, phy_id, i)) >= 0)
-			printk("%s: MII Reg %d=%x\n", dev->name, i, val);
+	/* WARNING: bus->phy_map[phy_addr].attached_dev == dev does
+	 * _NOT_ hold (e.g. when PHY is accessed through other MAC's MII bus) */
+	struct net_device *const dev = bus->priv;
+
+	enable_mac(dev, 0); /* make sure the MAC associated with this
+			     * mii_bus is enabled */
+	return mdio_read(dev, phy_addr, regnum);
+}
+
+static int mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
+			 u16 value)
+{
+	struct net_device *const dev = bus->priv;
+
+	enable_mac(dev, 0); /* make sure the MAC associated with this
+			     * mii_bus is enabled */
+	mdio_write(dev, phy_addr, regnum, value);
+	return 0;
+}
+
+static int mdiobus_reset(struct mii_bus *bus)
+{
+	struct net_device *const dev = bus->priv;
+
+	enable_mac(dev, 0); /* make sure the MAC associated with this
+			     * mii_bus is enabled */
+	return 0;
+}
+
+static int mii_probe (struct net_device *dev)
+{
+	struct au1000_private *const aup = (struct au1000_private *) dev->priv;
+	struct phy_device *phydev = NULL;
+
+#if defined(CONFIG_MIPS_AU1X00_ENET_STATIC_PHY_CONFIG)
+	BUG_ON(aup->mac_id < 0 || aup->mac_id > 1);
+
+	if(aup->mac_id == 0) {
+# if CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR==-1
+		printk (KERN_INFO DRV_NAME ":%s: using PHY-less config\n", dev->name);
+		return 0;
+# elif (CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR>=0) && (CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR<PHY_MAX_ADDR)
+		phydev = aup->mii_bus.phy_map[CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR];
+#   if defined(CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ)
+		/* fix up IRQ */
+		aup->mii_bus.irq[CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR] = CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_IRQ;
+		if(phydev)
+			phydev->irq = CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_IRQ;
+#   endif
+# else
+#  error Bad value for CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR given
+# endif
+	} else if (aup->mac_id == 1) {
+# if CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR==-1
+		printk (KERN_INFO DRV_NAME ":%s: using PHY-less config\n", dev->name);
+		return 0;
+# elif (CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR>=0) && (CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR<PHY_MAX_ADDR)
+#  if defined(CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ON_MAC0)
+#   if CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR==CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR
+#    error given same static PHY address, and both PHYs are on the same bus
+#   endif
+		phydev = au_macs[0]->mii_bus.phy_map[CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR];
+#   if defined(CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ)
+		/* fix up IRQ */
+		au_macs[0]->mii_bus.irq[CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR] = CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ;
+		if(phydev)
+			phydev->irq = CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ;
+#   endif
+#  else
+		phydev = aup->mii_bus.phy_map[CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR];
+#   if defined(CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ)
+		/* fix up IRQ */
+		aup->mii_bus.irq[CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_ADDR] = CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ;
+		if(phydev)
+			phydev->irq = CONFIG_MIPS_AU1X00_ENET_ETH1_PHY_IRQ;
+#   endif
+#  endif
+# else
+#  error Bad value for CONFIG_MIPS_AU1X00_ENET_ETH0_PHY_ADDR given
+# endif
 	}
-	for (i = 16; i < 25; i++) {
-		if ((val = mdio_read(dev, phy_id, i)) >= 0)
-			printk("%s: MII Reg %d=%x\n", dev->name, i, val);
-	}
-}
 
-static int mii_probe (struct net_device * dev)
-{
-	struct au1000_private *aup = (struct au1000_private *) dev->priv;
+#else /* runtime detected PHY configuration */
 	int phy_addr;
-#ifdef CONFIG_MIPS_BOSPORUS
-	int phy_found=0;
-#endif
-
-	/* search for total of 32 possible mii phy addresses */
-	for (phy_addr = 0; phy_addr < 32; phy_addr++) {
-		u16 mii_status;
-		u16 phy_id0, phy_id1;
-		int i;
-
-		#ifdef CONFIG_BCM5222_DUAL_PHY
-		/* Mask the already found phy, try next one */
-		if (au_macs[0]->mii && au_macs[0]->mii->mii_control_reg) {
-			if (au_macs[0]->phy_addr == phy_addr)
-				continue;
-		}
-		#endif
 
-		mii_status = mdio_read(dev, phy_addr, MII_STATUS);
-		if (mii_status == 0xffff || mii_status == 0x0000)
-			/* the mii is not accessable, try next one */
-			continue;
-
-		phy_id0 = mdio_read(dev, phy_addr, MII_PHY_ID0);
-		phy_id1 = mdio_read(dev, phy_addr, MII_PHY_ID1);
-
-		/* search our mii table for the current mii */ 
-		for (i = 0; mii_chip_table[i].phy_id1; i++) {
-			if (phy_id0 == mii_chip_table[i].phy_id0 &&
-			    phy_id1 == mii_chip_table[i].phy_id1) {
-				struct mii_phy * mii_phy = aup->mii;
-
-				printk(KERN_INFO "%s: %s at phy address %d\n",
-				       dev->name, mii_chip_table[i].name, 
-				       phy_addr);
-#ifdef CONFIG_MIPS_BOSPORUS
-				phy_found = 1;
-#endif
-				mii_phy->chip_info = mii_chip_table+i;
-				aup->phy_addr = phy_addr;
-				aup->want_autoneg = 1;
-				aup->phy_ops = mii_chip_table[i].phy_ops;
-				aup->phy_ops->phy_init(dev,phy_addr);
-
-				// Check for dual-phy and then store required 
-				// values and set indicators. We need to do 
-				// this now since mdio_{read,write} need the 
-				// control and data register addresses.
-				#ifdef CONFIG_BCM5222_DUAL_PHY
-				if ( mii_chip_table[i].dual_phy) {
-
-					/* assume both phys are controlled 
-					 * through MAC0. Board specific? */
-					
-					/* sanity check */
-					if (!au_macs[0] || !au_macs[0]->mii)
-						return -1;
-					aup->mii->mii_control_reg = (u32 *)
-						&au_macs[0]->mac->mii_control;
-					aup->mii->mii_data_reg = (u32 *)
-						&au_macs[0]->mac->mii_data;
-				}
-				#endif
-				goto found;
-			}
-		}
-	}
-found:
+	/* find the first (lowest address) PHY on the current MAC's MII bus */
+	for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++)
+		if (aup->mii_bus.phy_map[phy_addr]) {
+			phydev = aup->mii_bus.phy_map[phy_addr];
+			break; /* found it */
+		}
+
+	/* try harder to find a PHY */
+	if (!phydev && (aup->mac_id == 1)) {
+		/* no PHY found, maybe we have a dual PHY? */
+		printk (KERN_INFO DRV_NAME ": no PHY found on MAC1, "
+			"let's see if it's attached to MAC0...\n");
+
+		BUG_ON(!au_macs[0] || !au_macs[0]->mii_bus);
+
+		/* find the first (lowest address) non-attached PHY on
+		 * the MAC0 MII bus */
+		for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
+			struct phy_device *const tmp_phydev =
+				aup->au_macs[0]->mii_bus.phy_map[phy_addr];
 
-#ifdef CONFIG_MIPS_BOSPORUS
-	/* This is a workaround for the Micrel/Kendin 5 port switch
-	   The second MAC doesn't see a PHY connected... so we need to
-	   trick it into thinking we have one.
-		
-	   If this kernel is run on another Au1500 development board
-	   the stub will be found as well as the actual PHY. However,
-	   the last found PHY will be used... usually at Addr 31 (Db1500).	
-	*/
-	if ( (!phy_found) )
-	{
-		u16 phy_id0, phy_id1;
-		int i;
+			if (!tmp_phydev)
+				continue; /* no PHY here... */
 
-		phy_id0 = 0x1234;
-		phy_id1 = 0x5678;
+			if (tmp_phydev->attached_dev)
+				continue; /* already claimed by MAC0 */
 
-		/* search our mii table for the current mii */ 
-		for (i = 0; mii_chip_table[i].phy_id1; i++) {
-			if (phy_id0 == mii_chip_table[i].phy_id0 &&
-			    phy_id1 == mii_chip_table[i].phy_id1) {
-				struct mii_phy * mii_phy;
-
-				printk(KERN_INFO "%s: %s at phy address %d\n",
-				       dev->name, mii_chip_table[i].name, 
-				       phy_addr);
-				mii_phy = kmalloc(sizeof(struct mii_phy), 
-						GFP_KERNEL);
-				if (mii_phy) {
-					mii_phy->chip_info = mii_chip_table+i;
-					aup->phy_addr = phy_addr;
-					mii_phy->next = aup->mii;
-					aup->phy_ops = 
-						mii_chip_table[i].phy_ops;
-					aup->mii = mii_phy;
-					aup->phy_ops->phy_init(dev,phy_addr);
-				} else {
-					printk(KERN_ERR "%s: out of memory\n", 
-							dev->name);
-					return -1;
-				}
-				mii_phy->chip_info = mii_chip_table+i;
-				aup->phy_addr = phy_addr;
-				aup->phy_ops = mii_chip_table[i].phy_ops;
-				aup->phy_ops->phy_init(dev,phy_addr);
-				break;
-			}
+			phydev = tmp_phydev;
+			break; /* found it */
 		}
 	}
-	if (aup->mac_id == 0) {
-		/* the Bosporus phy responds to addresses 0-5 but 
-		 * 5 is the correct one.
-		 */
-		aup->phy_addr = 5;
-	}
-#endif
 
-	if (aup->mii->chip_info == NULL) {
-		printk(KERN_ERR "%s: Au1x No known MII transceivers found!\n",
-				dev->name);
+#endif
+	if (!phydev) {
+		printk (KERN_ERR DRV_NAME ":%s: no PHY found\n", dev->name);
 		return -1;
 	}
 
-	printk(KERN_INFO "%s: Using %s as default\n", 
-			dev->name, aup->mii->chip_info->name);
+	/* now we are supposed to have a proper phydev, to attach to... */
+	BUG_ON(!phydev);
+
+	phydev = phy_connect(dev, phydev->dev.bus_id, &au1000_adjust_link, 0);
+
+	if (IS_ERR(phydev)) {
+		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
+		return PTR_ERR(phydev);
+	}
+
+	/* mask with MAC supported features */
+	phydev->supported &= (SUPPORTED_10baseT_Half
+			      | SUPPORTED_10baseT_Full
+			      | SUPPORTED_100baseT_Half
+			      | SUPPORTED_100baseT_Full
+			      | SUPPORTED_Autoneg
+			      /* | SUPPORTED_Pause | SUPPORTED_Asym_Pause */
+			      | SUPPORTED_MII
+			      | SUPPORTED_TP);
+
+	phydev->advertising = phydev->supported;
+
+	aup->old_link = 0;
+	aup->old_speed = 0;
+	aup->old_duplex = -1;
+	aup->phy_dev = phydev;
+
+	printk(KERN_INFO "%s: attached PHY driver [%s] "
+	       "(mii_bus:phy_addr=%s, irq=%d)\n",
+	       dev->name, phydev->drv->name, phydev->dev.bus_id, phydev->irq);
 
 	return 0;
 }
@@ -1097,35 +397,38 @@
 	au_sync_delay(10);
 }
 
-
-static void reset_mac(struct net_device *dev)
+static void enable_mac(struct net_device *dev, int force_reset)
 {
-	int i;
-	u32 flags;
+	unsigned long flags;
 	struct au1000_private *aup = (struct au1000_private *) dev->priv;
 
-	if (au1000_debug > 4) 
-		printk(KERN_INFO "%s: reset mac, aup %x\n", 
-				dev->name, (unsigned)aup);
-
 	spin_lock_irqsave(&aup->lock, flags);
-	if (aup->timer.function == &au1000_timer) {/* check if timer initted */
-		del_timer(&aup->timer);
-	}
 
-	hard_stop(dev);
-	#ifdef CONFIG_BCM5222_DUAL_PHY
-	if (aup->mac_id != 0) {
-	#endif
-		/* If BCM5222, we can't leave MAC0 in reset because then 
-		 * we can't access the dual phy for ETH1 */
+	if(force_reset || (!aup->mac_enabled)) {
 		*aup->enable = MAC_EN_CLOCK_ENABLE;
 		au_sync_delay(2);
-		*aup->enable = 0;
+		*aup->enable = (MAC_EN_RESET0 | MAC_EN_RESET1 | MAC_EN_RESET2
+				| MAC_EN_CLOCK_ENABLE);
 		au_sync_delay(2);
-	#ifdef CONFIG_BCM5222_DUAL_PHY
+
+		aup->mac_enabled = 1;
 	}
-	#endif
+
+	spin_unlock_irqrestore(&aup->lock, flags);
+}
+
+static void reset_mac_unlocked(struct net_device *dev)
+{
+	struct au1000_private *const aup = (struct au1000_private *) dev->priv;
+	int i;
+
+	hard_stop(dev);
+
+	*aup->enable = MAC_EN_CLOCK_ENABLE;
+	au_sync_delay(2);
+	*aup->enable = 0;
+	au_sync_delay(2);
+
 	aup->tx_full = 0;
 	for (i = 0; i < NUM_RX_DMA; i++) {
 		/* reset control bits */
@@ -1135,9 +438,26 @@
 		/* reset control bits */
 		aup->tx_dma_ring[i]->buff_stat &= ~0xf;
 	}
-	spin_unlock_irqrestore(&aup->lock, flags);
+
+	aup->mac_enabled = 0;
+
 }
 
+static void reset_mac(struct net_device *dev)
+{
+	struct au1000_private *const aup = (struct au1000_private *) dev->priv;
+	unsigned long flags;
+
+	if (au1000_debug > 4)
+		printk(KERN_INFO "%s: reset mac, aup %x\n",
+		       dev->name, (unsigned)aup);
+
+	spin_lock_irqsave(&aup->lock, flags);
+
+	reset_mac_unlocked (dev);
+
+	spin_unlock_irqrestore(&aup->lock, flags);
+}
 
 /* 
  * Setup the receive and transmit "rings".  These pointers are the addresses
@@ -1237,178 +557,31 @@
 	return 0;
 }
 
-static int au1000_setup_aneg(struct net_device *dev, u32 advertise)
-{
-	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	u16 ctl, adv;
-
-	/* Setup standard advertise */
-	adv = mdio_read(dev, aup->phy_addr, MII_ADVERTISE);
-	adv &= ~(ADVERTISE_ALL | ADVERTISE_100BASE4);
-	if (advertise & ADVERTISED_10baseT_Half)
-		adv |= ADVERTISE_10HALF;
-	if (advertise & ADVERTISED_10baseT_Full)
-		adv |= ADVERTISE_10FULL;
-	if (advertise & ADVERTISED_100baseT_Half)
-		adv |= ADVERTISE_100HALF;
-	if (advertise & ADVERTISED_100baseT_Full)
-		adv |= ADVERTISE_100FULL;
-	mdio_write(dev, aup->phy_addr, MII_ADVERTISE, adv);
-
-	/* Start/Restart aneg */
-	ctl = mdio_read(dev, aup->phy_addr, MII_BMCR);
-	ctl |= (BMCR_ANENABLE | BMCR_ANRESTART);
-	mdio_write(dev, aup->phy_addr, MII_BMCR, ctl);
-
-	return 0;
-}
-
-static int au1000_setup_forced(struct net_device *dev, int speed, int fd)
-{
-	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	u16 ctl;
-
-	ctl = mdio_read(dev, aup->phy_addr, MII_BMCR);
-	ctl &= ~(BMCR_FULLDPLX | BMCR_SPEED100 | BMCR_ANENABLE);
-
-	/* First reset the PHY */
-	mdio_write(dev, aup->phy_addr, MII_BMCR, ctl | BMCR_RESET);
-
-	/* Select speed & duplex */
-	switch (speed) {
-		case SPEED_10:
-			break;
-		case SPEED_100:
-			ctl |= BMCR_SPEED100;
-			break;
-		case SPEED_1000:
-		default:
-			return -EINVAL;
-	}
-	if (fd == DUPLEX_FULL)
-		ctl |= BMCR_FULLDPLX;
-	mdio_write(dev, aup->phy_addr, MII_BMCR, ctl);
-
-	return 0;
-}
-
-
-static void
-au1000_start_link(struct net_device *dev, struct ethtool_cmd *cmd)
-{
-	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	u32 advertise;
-	int autoneg;
-	int forced_speed;
-	int forced_duplex;
-
-	/* Default advertise */
-	advertise = GENMII_DEFAULT_ADVERTISE;
-	autoneg = aup->want_autoneg;
-	forced_speed = SPEED_100;
-	forced_duplex = DUPLEX_FULL;
-
-	/* Setup link parameters */
-	if (cmd) {
-		if (cmd->autoneg == AUTONEG_ENABLE) {
-			advertise = cmd->advertising;
-			autoneg = 1;
-		} else {
-			autoneg = 0;
-
-			forced_speed = cmd->speed;
-			forced_duplex = cmd->duplex;
-		}
-	}
-
-	/* Configure PHY & start aneg */
-	aup->want_autoneg = autoneg;
-	if (autoneg)
-		au1000_setup_aneg(dev, advertise);
-	else
-		au1000_setup_forced(dev, forced_speed, forced_duplex);
-	mod_timer(&aup->timer, jiffies + HZ);
-}
+/*
+ * ethtool operations
+ */
 
 static int au1000_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
 	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	u16 link, speed;
 
-	cmd->supported = GENMII_DEFAULT_FEATURES;
-	cmd->advertising = GENMII_DEFAULT_ADVERTISE;
-	cmd->port = PORT_MII;
-	cmd->transceiver = XCVR_EXTERNAL;
-	cmd->phy_address = aup->phy_addr;
-	spin_lock_irq(&aup->lock);
-	cmd->autoneg = aup->want_autoneg;
-	aup->phy_ops->phy_status(dev, aup->phy_addr, &link, &speed);
-	if ((speed == IF_PORT_100BASETX) || (speed == IF_PORT_100BASEFX))
-		cmd->speed = SPEED_100;
-	else if (speed == IF_PORT_10BASET)
-		cmd->speed = SPEED_10;
-	if (link && (dev->if_port == IF_PORT_100BASEFX))
-		cmd->duplex = DUPLEX_FULL;
-	else
-		cmd->duplex = DUPLEX_HALF;
-	spin_unlock_irq(&aup->lock);
-	return 0;
+	if (aup->phy_dev)
+		return phy_ethtool_gset(aup->phy_dev, cmd);
+
+	return -EINVAL;
 }
 
 static int au1000_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
-	 struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	  unsigned long features = GENMII_DEFAULT_FEATURES;
+	struct au1000_private *aup = (struct au1000_private *)dev->priv;
 
-	 if (!capable(CAP_NET_ADMIN))
-		 return -EPERM;
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
 
-	 if (cmd->autoneg != AUTONEG_ENABLE && cmd->autoneg != AUTONEG_DISABLE)
-		 return -EINVAL;
-	 if (cmd->autoneg == AUTONEG_ENABLE && cmd->advertising == 0)
-		 return -EINVAL;
-	 if (cmd->duplex != DUPLEX_HALF && cmd->duplex != DUPLEX_FULL)
-		 return -EINVAL;
-	 if (cmd->autoneg == AUTONEG_DISABLE)
-		 switch (cmd->speed) {
-		 case SPEED_10:
-			 if (cmd->duplex == DUPLEX_HALF &&
-				 (features & SUPPORTED_10baseT_Half) == 0)
-				 return -EINVAL;
-			 if (cmd->duplex == DUPLEX_FULL &&
-				 (features & SUPPORTED_10baseT_Full) == 0)
-				 return -EINVAL;
-			 break;
-		 case SPEED_100:
-			 if (cmd->duplex == DUPLEX_HALF &&
-				 (features & SUPPORTED_100baseT_Half) == 0)
-				 return -EINVAL;
-			 if (cmd->duplex == DUPLEX_FULL &&
-				 (features & SUPPORTED_100baseT_Full) == 0)
-				 return -EINVAL;
-			 break;
-		 default:
-			 return -EINVAL;
-		 }
-	 else if ((features & SUPPORTED_Autoneg) == 0)
-		 return -EINVAL;
-
-	 spin_lock_irq(&aup->lock);
-	 au1000_start_link(dev, cmd);
-	 spin_unlock_irq(&aup->lock);
-	 return 0;
-}
+	if (aup->phy_dev)
+		return phy_ethtool_sset(aup->phy_dev, cmd);
 
-static int au1000_nway_reset(struct net_device *dev)
-{
-	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-
-	if (!aup->want_autoneg)
-		return -EINVAL;
-	spin_lock_irq(&aup->lock);
-	au1000_start_link(dev, NULL);
-	spin_unlock_irq(&aup->lock);
-	return 0;
+	return -EINVAL;
 }
 
 static void
@@ -1423,19 +596,15 @@
 	info->regdump_len = 0;
 }
 
-static u32 au1000_get_link(struct net_device *dev)
-{
-	return netif_carrier_ok(dev);
-}
-
 static struct ethtool_ops au1000_ethtool_ops = {
 	.get_settings = au1000_get_settings,
 	.set_settings = au1000_set_settings,
 	.get_drvinfo = au1000_get_drvinfo,
-	.nway_reset = au1000_nway_reset,
-	.get_link = au1000_get_link
+	.get_link = ethtool_op_get_link,
 };
 
+
+
 static struct net_device *
 au1000_probe(u32 ioaddr, int irq, int port_num)
 {
@@ -1527,24 +696,22 @@
 		printk(KERN_ERR "%s: bad ioaddr\n", dev->name);
 	}
 
-	/* bring the device out of reset, otherwise probing the mii
-	 * will hang */
-	*aup->enable = MAC_EN_CLOCK_ENABLE;
-	au_sync_delay(2);
-	*aup->enable = MAC_EN_RESET0 | MAC_EN_RESET1 | 
-		MAC_EN_RESET2 | MAC_EN_CLOCK_ENABLE;
-	au_sync_delay(2);
+	*aup->enable = 0;
+	aup->mac_enabled = 0;
 
-	aup->mii = kmalloc(sizeof(struct mii_phy), GFP_KERNEL);
-	if (!aup->mii) {
-		printk(KERN_ERR "%s: out of memory\n", dev->name);
-		goto err_out;
+	aup->mii_bus.priv = dev;
+	aup->mii_bus.read = mdiobus_read;
+	aup->mii_bus.write = mdiobus_write;
+	aup->mii_bus.reset = mdiobus_reset;
+	aup->mii_bus.name = "au1000_eth_mii";
+	aup->mii_bus.id = aup->mac_id;
+	aup->mii_bus.irq = kmalloc(sizeof(int)*PHY_MAX_ADDR, GFP_KERNEL);
+	for(i = 0; i < PHY_MAX_ADDR; ++i) {
+		aup->mii_bus.phy_map[i] = NULL; /* paranoia */
+		aup->mii_bus.irq[i] = PHY_POLL;
 	}
-	aup->mii->next = NULL;
-	aup->mii->chip_info = NULL;
-	aup->mii->status = 0;
-	aup->mii->mii_control_reg = 0;
-	aup->mii->mii_data_reg = 0;
+
+	mdiobus_register(&aup->mii_bus);
 
 	if (mii_probe(dev) != 0) {
 		goto err_out;
@@ -1590,7 +757,6 @@
 	dev->set_multicast_list = &set_rx_mode;
 	dev->do_ioctl = &au1000_ioctl;
 	SET_ETHTOOL_OPS(dev, &au1000_ethtool_ops);
-	dev->set_config = &au1000_set_config;
 	dev->tx_timeout = au1000_tx_timeout;
 	dev->watchdog_timeo = ETH_TX_TIMEOUT;
 
@@ -1606,7 +772,7 @@
 	/* here we should have a valid dev plus aup-> register addresses
 	 * so we can reset the mac properly.*/
 	reset_mac(dev);
-	kfree(aup->mii);
+
 	for (i = 0; i < NUM_RX_DMA; i++) {
 		if (aup->rx_db_inuse[i])
 			ReleaseDB(aup, aup->rx_db_inuse[i]);
@@ -1640,19 +806,14 @@
 	u32 flags;
 	int i;
 	u32 control;
-	u16 link, speed;
 
 	if (au1000_debug > 4) 
 		printk("%s: au1000_init\n", dev->name);
 
-	spin_lock_irqsave(&aup->lock, flags);
-
 	/* bring the device out of reset */
-	*aup->enable = MAC_EN_CLOCK_ENABLE;
-        au_sync_delay(2);
-	*aup->enable = MAC_EN_RESET0 | MAC_EN_RESET1 | 
-		MAC_EN_RESET2 | MAC_EN_CLOCK_ENABLE;
-	au_sync_delay(20);
+	enable_mac(dev, 1);
+
+	spin_lock_irqsave(&aup->lock, flags);
 
 	aup->mac->control = 0;
 	aup->tx_head = (aup->tx_dma_ring[0]->buff_stat & 0xC) >> 2;
@@ -1668,12 +829,16 @@
 	}
 	au_sync();
 
-	aup->phy_ops->phy_status(dev, aup->phy_addr, &link, &speed);
-	control = MAC_DISABLE_RX_OWN | MAC_RX_ENABLE | MAC_TX_ENABLE;
+	control = MAC_RX_ENABLE | MAC_TX_ENABLE;
 #ifndef CONFIG_CPU_LITTLE_ENDIAN
 	control |= MAC_BIG_ENDIAN;
 #endif
-	if (link && (dev->if_port == IF_PORT_100BASEFX)) {
+	if (aup->phy_dev) {
+		if (aup->phy_dev->link && (DUPLEX_FULL == aup->phy_dev->duplex))
+			control |= MAC_FULL_DUPLEX;
+		else
+			control |= MAC_DISABLE_RX_OWN;
+	} else { /* PHY-less op, assume full-duplex */
 		control |= MAC_FULL_DUPLEX;
 	}
 
@@ -1685,57 +850,84 @@
 	return 0;
 }
 
-static void au1000_timer(unsigned long data)
+static void
+au1000_adjust_link(struct net_device *dev)
 {
-	struct net_device *dev = (struct net_device *)data;
 	struct au1000_private *aup = (struct au1000_private *) dev->priv;
-	unsigned char if_port;
-	u16 link, speed;
+	struct phy_device *phydev = aup->phy_dev;
+	unsigned long flags;
 
-	if (!dev) {
-		/* fatal error, don't restart the timer */
-		printk(KERN_ERR "au1000_timer error: NULL dev\n");
-		return;
-	}
+	int status_change = 0;
 
-	if_port = dev->if_port;
-	if (aup->phy_ops->phy_status(dev, aup->phy_addr, &link, &speed) == 0) {
-		if (link) {
-			if (!netif_carrier_ok(dev)) {
-				netif_carrier_on(dev);
-				printk(KERN_INFO "%s: link up\n", dev->name);
-			}
-		}
-		else {
-			if (netif_carrier_ok(dev)) {
-				netif_carrier_off(dev);
-				dev->if_port = 0;
-				printk(KERN_INFO "%s: link down\n", dev->name);
-			}
+	BUG_ON(!aup->phy_dev);
+
+	spin_lock_irqsave(&aup->lock, flags);
+
+	if (phydev->link && (aup->old_speed != phydev->speed)) {
+		// speed changed
+
+		switch(phydev->speed) {
+		case SPEED_10:
+		case SPEED_100:
+			break;
+		default:
+			printk(KERN_WARNING
+			       "%s: Speed (%d) is not 10/100 ???\n",
+			       dev->name, phydev->speed);
+			break;
 		}
+
+		aup->old_speed = phydev->speed;
+
+		status_change = 1;
 	}
 
-	if (link && (dev->if_port != if_port) && 
-			(dev->if_port != IF_PORT_UNKNOWN)) {
+	if (phydev->link && (aup->old_duplex != phydev->duplex)) {
+		// duplex mode changed
+
+		/* switching duplex mode requires to disable rx and tx! */
 		hard_stop(dev);
-		if (dev->if_port == IF_PORT_100BASEFX) {
-			printk(KERN_INFO "%s: going to full duplex\n", 
-					dev->name);
-			aup->mac->control |= MAC_FULL_DUPLEX;
-			au_sync_delay(1);
-		}
-		else {
-			aup->mac->control &= ~MAC_FULL_DUPLEX;
-			au_sync_delay(1);
-		}
+
+		if (DUPLEX_FULL == phydev->duplex)
+			aup->mac->control = ((aup->mac->control
+					     | MAC_FULL_DUPLEX)
+					     & ~MAC_DISABLE_RX_OWN);
+		else
+			aup->mac->control = ((aup->mac->control
+					      & ~MAC_FULL_DUPLEX)
+					     | MAC_DISABLE_RX_OWN);
+		au_sync_delay(1);
+
 		enable_rx_tx(dev);
+		aup->old_duplex = phydev->duplex;
+
+		status_change = 1;
+	}
+
+	if(phydev->link != aup->old_link) {
+		// link state changed
+
+		if (phydev->link) // link went up
+			netif_schedule(dev);
+		else { // link went down
+			aup->old_speed = 0;
+			aup->old_duplex = -1;
+		}
+
+		aup->old_link = phydev->link;
+		status_change = 1;
 	}
 
-	aup->timer.expires = RUN_AT((1*HZ)); 
-	aup->timer.data = (unsigned long)dev;
-	aup->timer.function = &au1000_timer; /* timer handler */
-	add_timer(&aup->timer);
+	spin_unlock_irqrestore(&aup->lock, flags);
 
+	if (status_change) {
+		if (phydev->link)
+			printk(KERN_INFO "%s: link up (%d/%s)\n",
+			       dev->name, phydev->speed,
+			       DUPLEX_FULL == phydev->duplex ? "Full" : "Half");
+		else
+			printk(KERN_INFO "%s: link down\n", dev->name);
+	}
 }
 
 static int au1000_open(struct net_device *dev)
@@ -1746,25 +938,26 @@
 	if (au1000_debug > 4)
 		printk("%s: open: dev=%p\n", dev->name, dev);
 
+	if ((retval = request_irq(dev->irq, &au1000_interrupt, 0,
+					dev->name, dev))) {
+		printk(KERN_ERR "%s: unable to get IRQ %d\n",
+				dev->name, dev->irq);
+		return retval;
+	}
+
 	if ((retval = au1000_init(dev))) {
 		printk(KERN_ERR "%s: error in au1000_init\n", dev->name);
 		free_irq(dev->irq, dev);
 		return retval;
 	}
-	netif_start_queue(dev);
 
-	if ((retval = request_irq(dev->irq, &au1000_interrupt, 0, 
-					dev->name, dev))) {
-		printk(KERN_ERR "%s: unable to get IRQ %d\n", 
-				dev->name, dev->irq);
-		return retval;
+	if (aup->phy_dev) {
+		/* cause the PHY state machine to schedule a link state check */
+		aup->phy_dev->state = PHY_CHANGELINK;
+		phy_start(aup->phy_dev);
 	}
 
-	init_timer(&aup->timer); /* used in ioctl() */
-	aup->timer.expires = RUN_AT((3*HZ)); 
-	aup->timer.data = (unsigned long)dev;
-	aup->timer.function = &au1000_timer; /* timer handler */
-	add_timer(&aup->timer);
+	netif_start_queue(dev);
 
 	if (au1000_debug > 4)
 		printk("%s: open: Initialization done.\n", dev->name);
@@ -1774,16 +967,19 @@
 
 static int au1000_close(struct net_device *dev)
 {
-	u32 flags;
-	struct au1000_private *aup = (struct au1000_private *) dev->priv;
+	unsigned long flags;
+	struct au1000_private *const aup = (struct au1000_private *) dev->priv;
 
 	if (au1000_debug > 4)
 		printk("%s: close: dev=%p\n", dev->name, dev);
 
-	reset_mac(dev);
+	if (aup->phy_dev)
+		phy_stop(aup->phy_dev);
 
 	spin_lock_irqsave(&aup->lock, flags);
-	
+
+	reset_mac_unlocked (dev);
+
 	/* stop the device */
 	netif_stop_queue(dev);
 
@@ -1805,7 +1001,7 @@
 		if (dev) {
 			aup = (struct au1000_private *) dev->priv;
 			unregister_netdev(dev);
-			kfree(aup->mii);
+
 			for (j = 0; j < NUM_RX_DMA; j++) {
 				if (aup->rx_db_inuse[j])
 					ReleaseDB(aup, aup->rx_db_inuse[j]);
@@ -1830,7 +1026,7 @@
 	struct net_device_stats *ps = &aup->stats;
 
 	if (status & TX_FRAME_ABORTED) {
-		if (dev->if_port == IF_PORT_100BASEFX) {
+		if (!aup->phy_dev || (DUPLEX_FULL == aup->phy_dev->duplex)) {
 			if (status & (TX_JAB_TIMEOUT | TX_UNDERRUN)) {
 				/* any other tx errors are only valid
 				 * in half duplex mode */
@@ -2104,126 +1300,15 @@
 	}
 }
 
-
 static int au1000_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
 	struct au1000_private *aup = (struct au1000_private *)dev->priv;
-	u16 *data = (u16 *)&rq->ifr_ifru;
 
-	switch(cmd) { 
-		case SIOCDEVPRIVATE:	/* Get the address of the PHY in use. */
-		case SIOCGMIIPHY:
-		        if (!netif_running(dev)) return -EINVAL;
-			data[0] = aup->phy_addr;
-		case SIOCDEVPRIVATE+1:	/* Read the specified MII register. */
-		case SIOCGMIIREG:
-			data[3] =  mdio_read(dev, data[0], data[1]); 
-			return 0;
-		case SIOCDEVPRIVATE+2:	/* Write the specified MII register */
-		case SIOCSMIIREG: 
-			if (!capable(CAP_NET_ADMIN))
-				return -EPERM;
-			mdio_write(dev, data[0], data[1],data[2]);
-			return 0;
-		default:
-			return -EOPNOTSUPP;
-	}
+	if (!netif_running(dev)) return -EINVAL;
 
-}
+	if (!aup->phy_dev) return -EINVAL; // PHY not controllable
 
-
-static int au1000_set_config(struct net_device *dev, struct ifmap *map)
-{
-	struct au1000_private *aup = (struct au1000_private *) dev->priv;
-	u16 control;
-
-	if (au1000_debug > 4)  {
-		printk("%s: set_config called: dev->if_port %d map->port %x\n", 
-				dev->name, dev->if_port, map->port);
-	}
-
-	switch(map->port){
-		case IF_PORT_UNKNOWN: /* use auto here */   
-			printk(KERN_INFO "%s: config phy for aneg\n", 
-					dev->name);
-			dev->if_port = map->port;
-			/* Link Down: the timer will bring it up */
-			netif_carrier_off(dev);
-	
-			/* read current control */
-			control = mdio_read(dev, aup->phy_addr, MII_CONTROL);
-			control &= ~(MII_CNTL_FDX | MII_CNTL_F100);
-
-			/* enable auto negotiation and reset the negotiation */
-			mdio_write(dev, aup->phy_addr, MII_CONTROL, 
-					control | MII_CNTL_AUTO | 
-					MII_CNTL_RST_AUTO);
-
-			break;
-    
-		case IF_PORT_10BASET: /* 10BaseT */         
-			printk(KERN_INFO "%s: config phy for 10BaseT\n", 
-					dev->name);
-			dev->if_port = map->port;
-	
-			/* Link Down: the timer will bring it up */
-			netif_carrier_off(dev);
-
-			/* set Speed to 10Mbps, Half Duplex */
-			control = mdio_read(dev, aup->phy_addr, MII_CONTROL);
-			control &= ~(MII_CNTL_F100 | MII_CNTL_AUTO | 
-					MII_CNTL_FDX);
-	
-			/* disable auto negotiation and force 10M/HD mode*/
-			mdio_write(dev, aup->phy_addr, MII_CONTROL, control);
-			break;
-    
-		case IF_PORT_100BASET: /* 100BaseT */
-		case IF_PORT_100BASETX: /* 100BaseTx */ 
-			printk(KERN_INFO "%s: config phy for 100BaseTX\n", 
-					dev->name);
-			dev->if_port = map->port;
-	
-			/* Link Down: the timer will bring it up */
-			netif_carrier_off(dev);
-	
-			/* set Speed to 100Mbps, Half Duplex */
-			/* disable auto negotiation and enable 100MBit Mode */
-			control = mdio_read(dev, aup->phy_addr, MII_CONTROL);
-			control &= ~(MII_CNTL_AUTO | MII_CNTL_FDX);
-			control |= MII_CNTL_F100;
-			mdio_write(dev, aup->phy_addr, MII_CONTROL, control);
-			break;
-    
-		case IF_PORT_100BASEFX: /* 100BaseFx */
-			printk(KERN_INFO "%s: config phy for 100BaseFX\n", 
-					dev->name);
-			dev->if_port = map->port;
-	
-			/* Link Down: the timer will bring it up */
-			netif_carrier_off(dev);
-	
-			/* set Speed to 100Mbps, Full Duplex */
-			/* disable auto negotiation and enable 100MBit Mode */
-			control = mdio_read(dev, aup->phy_addr, MII_CONTROL);
-			control &= ~MII_CNTL_AUTO;
-			control |=  MII_CNTL_F100 | MII_CNTL_FDX;
-			mdio_write(dev, aup->phy_addr, MII_CONTROL, control);
-			break;
-		case IF_PORT_10BASE2: /* 10Base2 */
-		case IF_PORT_AUI: /* AUI */
-		/* These Modes are not supported (are they?)*/
-			printk(KERN_ERR "%s: 10Base2/AUI not supported", 
-					dev->name);
-			return -EOPNOTSUPP;
-			break;
-    
-		default:
-			printk(KERN_ERR "%s: Invalid media selected", 
-					dev->name);
-			return -EINVAL;
-	}
-	return 0;
+	return phy_mii_ioctl(aup->phy_dev, if_mii(rq), cmd);
 }
 
 static struct net_device_stats *au1000_get_stats(struct net_device *dev)
Index: gw1550kernel/drivers/net/au1000_eth.h
===================================================================
--- gw1550kernel.orig/drivers/net/au1000_eth.h	2006-05-09 03:01:45.000000000 +0200
+++ gw1550kernel/drivers/net/au1000_eth.h	2006-05-09 03:21:19.000000000 +0200
@@ -40,120 +40,6 @@
 
 #define MULTICAST_FILTER_LIMIT 64
 
-/* FIXME 
- * The PHY defines should be in a separate file.
- */
-
-/* MII register offsets */
-#define	MII_CONTROL 0x0000
-#define MII_STATUS  0x0001
-#define MII_PHY_ID0 0x0002
-#define	MII_PHY_ID1 0x0003
-#define MII_ANADV   0x0004
-#define MII_ANLPAR  0x0005
-#define MII_AEXP    0x0006
-#define MII_ANEXT   0x0007
-#define MII_LSI_PHY_CONFIG 0x0011
-/* Status register */
-#define MII_LSI_PHY_STAT   0x0012
-#define MII_AMD_PHY_STAT   MII_LSI_PHY_STAT
-#define MII_INTEL_PHY_STAT 0x0011
-
-#define MII_AUX_CNTRL  0x0018
-/* mii registers specific to AMD 79C901 */
-#define	MII_STATUS_SUMMARY = 0x0018
-
-/* MII Control register bit definitions. */
-#define	MII_CNTL_FDX      0x0100
-#define MII_CNTL_RST_AUTO 0x0200
-#define	MII_CNTL_ISOLATE  0x0400
-#define MII_CNTL_PWRDWN   0x0800
-#define	MII_CNTL_AUTO     0x1000
-#define MII_CNTL_F100     0x2000
-#define	MII_CNTL_LPBK     0x4000
-#define MII_CNTL_RESET    0x8000
-
-/* MII Status register bit  */
-#define	MII_STAT_EXT        0x0001 
-#define MII_STAT_JAB        0x0002
-#define	MII_STAT_LINK       0x0004
-#define MII_STAT_CAN_AUTO   0x0008
-#define	MII_STAT_FAULT      0x0010 
-#define MII_STAT_AUTO_DONE  0x0020
-#define	MII_STAT_CAN_T      0x0800
-#define MII_STAT_CAN_T_FDX  0x1000
-#define	MII_STAT_CAN_TX     0x2000 
-#define MII_STAT_CAN_TX_FDX 0x4000
-#define	MII_STAT_CAN_T4     0x8000
-
-
-#define		MII_ID1_OUI_LO		0xFC00	/* low bits of OUI mask */
-#define		MII_ID1_MODEL		0x03F0	/* model number */
-#define		MII_ID1_REV		0x000F	/* model number */
-
-/* MII NWAY Register Bits ...
-   valid for the ANAR (Auto-Negotiation Advertisement) and
-   ANLPAR (Auto-Negotiation Link Partner) registers */
-#define	MII_NWAY_NODE_SEL 0x001f
-#define MII_NWAY_CSMA_CD  0x0001
-#define	MII_NWAY_T	  0x0020
-#define MII_NWAY_T_FDX    0x0040
-#define	MII_NWAY_TX       0x0080
-#define MII_NWAY_TX_FDX   0x0100
-#define	MII_NWAY_T4       0x0200 
-#define MII_NWAY_PAUSE    0x0400 
-#define	MII_NWAY_RF       0x2000 /* Remote Fault */
-#define MII_NWAY_ACK      0x4000 /* Remote Acknowledge */
-#define	MII_NWAY_NP       0x8000 /* Next Page (Enable) */
-
-/* mii stsout register bits */
-#define	MII_STSOUT_LINK_FAIL 0x4000
-#define	MII_STSOUT_SPD       0x0080
-#define MII_STSOUT_DPLX      0x0040
-
-/* mii stsics register bits */
-#define	MII_STSICS_SPD       0x8000
-#define MII_STSICS_DPLX      0x4000
-#define	MII_STSICS_LINKSTS   0x0001
-
-/* mii stssum register bits */
-#define	MII_STSSUM_LINK  0x0008
-#define MII_STSSUM_DPLX  0x0004
-#define	MII_STSSUM_AUTO  0x0002
-#define MII_STSSUM_SPD   0x0001
-
-/* lsi phy status register */
-#define MII_LSI_PHY_STAT_FDX	0x0040
-#define MII_LSI_PHY_STAT_SPD	0x0080
-
-/* amd phy status register */
-#define MII_AMD_PHY_STAT_FDX	0x0800
-#define MII_AMD_PHY_STAT_SPD	0x0400
-
-/* intel phy status register */
-#define MII_INTEL_PHY_STAT_FDX	0x0200
-#define MII_INTEL_PHY_STAT_SPD	0x4000
-
-/* Auxilliary Control/Status Register */
-#define MII_AUX_FDX      0x0001
-#define MII_AUX_100      0x0002
-#define MII_AUX_F100     0x0004
-#define MII_AUX_ANEG     0x0008
-
-typedef struct mii_phy {
-	struct mii_phy * next;
-	struct mii_chip_info * chip_info;
-	u16 status;
-	u32 *mii_control_reg;
-	u32 *mii_data_reg;
-} mii_phy_t;
-
-struct phy_ops {
-	int (*phy_init) (struct net_device *, int);
-	int (*phy_reset) (struct net_device *, int);
-	int (*phy_status) (struct net_device *, int, u16 *, u16 *);
-};
-
 /* 
  * Data Buffer Descriptor. Data buffers must be aligned on 32 byte 
  * boundary for both, receive and transmit.
@@ -200,7 +86,6 @@
 
 
 struct au1000_private {
-	
 	db_dest_t *pDBfree;
 	db_dest_t db[NUM_RX_BUFFS+NUM_TX_BUFFS];
 	volatile rx_dma_t *rx_dma_ring[NUM_RX_DMA];
@@ -213,9 +98,16 @@
 	u32 tx_full;
 
 	int mac_id;
-	mii_phy_t *mii;
-	struct mii_if_info mii_if;
-	struct phy_ops *phy_ops;
+
+	int mac_enabled;       /* whether MAC is currently enabled and running (req. for mdio) */
+
+	int old_link;          /* used by au1000_adjust_link */
+	int old_speed;
+	int old_duplex;
+
+	int phy_addr;          /* phy address */
+	struct phy_device *phy_dev;
+	struct mii_bus mii_bus;
 	
 	/* These variables are just for quick access to certain regs addresses. */
 	volatile mac_reg_t *mac;  /* mac registers                      */   
@@ -227,11 +119,9 @@
 	u8 *hash_table;
 	u32 hash_mode;
 	u32 intr_work_done; /* number of Rx and Tx pkts processed in the isr */
-	int phy_addr;          /* phy address */
 	u32 options;           /* User-settable misc. driver options. */
 	u32 drv_flags;
-	int want_autoneg;
+
 	struct net_device_stats stats;
-	struct timer_list timer;
 	spinlock_t lock;       /* Serialise access to device */
 };

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

^ permalink raw reply

* r8169: only works in 10mbit/half duplex
From: Johannes Berg @ 2006-05-09  1:12 UTC (permalink / raw)
  To: romieu; +Cc: netdev

Hi,

New system, new trouble -- I got a system with a r8169 integrated (AOpen
MiniPC) and this card only works if I force it to 10mbit half duplex by
using media=1 while loading it.

Even ethtool appears to have no effect, and under windows the device
also is only used in 10mbit mode (only installed that to find out!).

At 100mbit, sending works (I can see the dhcp packets for example) but
receiving doesn't, and the tx/rx led blinks frantically when I enable it
(via auto negotiation or manually). Haven't tried gbit yet.

I'm thinking this is a hardware failure, is there any reason to believe
otherwise?

Thanks,
johannes

^ permalink raw reply

* Please pull updated network drivers
From: Stephen Hemminger @ 2006-05-08 23:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev

These fixes are for 2.6.17, please excuse my git learning curve.
I have about had it for today.

git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/netdev-2.6.git upstream

Daniel Drake:
      softmac: don't reassociate if user asked for deauthentication
      softmac: make non-operational after being stopped

David Woodhouse:
      bcm43xx: Fix access to non-existent PHY registers

Herbert Valerio Riedel:
      au1000_eth.c: use ether_crc() from <linux/crc32.h>

Jean Delvare:
      ieee80211: Fix A band channel count (resent)

Jens Osterkamp:
      spidernet: introduce new setting
      spidernet: enable support for bcm5461 ethernet phy

Michael Buesch:
      bcm43xx: fix iwmode crash when down
      bcm43xx: Fix array overrun in bcm43xx_geo_init

Sergei Shtylyov:
      Fix RTL8019AS init for Toshiba RBTX49xx boards

Stefano Brivio:
      bcm43xx: check for valid MAC address in SPROM

Stephen Hemminger:
      sky2: backout NAPI reschedule
      sky2: status irq hang fix
      sky2: tx ring index mask fix
      sky2: use mask instead of modulo operation
      sky2: edge triggered workaround enhancement
      sky2: dont write status ring
      sky2: synchronize irq on remove
      Add more support for the Yukon Ultra chip found in dual core
centino laptops. sky2: version 1.3
      Merge branch 'upstream-fixes' of
git://git.kernel.org/.../linville/wireless-2.6

^ permalink raw reply

* iproute2 git repository
From: Stephen Hemminger @ 2006-05-08 22:34 UTC (permalink / raw)
  To: netdev

I moved iproute2 out of CVS. New home is:
	git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Will keep CVS tree up to date until the next release, after that it is will
rest in peace.

^ permalink raw reply

* Re: RFC: new WIP version of au1000_eth.c phylib conversion (was Re: RFC: au1000_etc.c phylib rewrite)
From: Andy Fleming @ 2006-05-08 22:24 UTC (permalink / raw)
  To: Herbert Valerio Riedel
  Cc: Mark Schank, ppopov, sshtylyov, linux-mips, jgarzik, netdev,
	Ralf Baechle, Robin H. Johnson
In-Reply-To: <1146734223.31241.44.camel@localhost.localdomain>


On May 4, 2006, at 04:17, Herbert Valerio Riedel wrote:

> Hello,
>
> I've tried to adapt the PHY detection code to allow for dynamic  
> runtime
> configuration (with fallback to search for the 2nd MAC PHY on the 1st
> MAC's MII bus), as well as selectable static PHY configuration through
> Kconfig (e.g. for supporting PHYs w/o MII connection)

Comments inline, below:

> diff --git a/drivers/net/au1000_eth.c b/drivers/net/au1000_eth.c
> index 0823cb8..8c0b26f 100644
> --- a/drivers/net/au1000_eth.c
> +++ b/drivers/net/au1000_eth.c
> @@ -9,6 +9,9 @@

[snip]

> -/* FIXME
> - * All of the PHY code really should be detached from the MAC
> - * code.
> - */
> -

[snip]

Nothing to say so far, except seeing all those "-" signs was quite  
the thrill, since one of the goals of the phylib was to lead to  
reduced complexity.  That said, it looked like there were about a  
dozen PHY-specific code blocks in there.  I saw you submit one PHY  
driver.  Were there others in there that could be ported?

[snip]

> +static int mdiobus_read(struct mii_bus *bus, int phy_addr, int  
> regnum)
> {
> -	int i, val;
> +	struct net_device *const dev = bus->priv; /* beware: bus->phy_map 
> [phy_addr].attached_dev == dev does _NOT_ hold always  */
> +	enable_mac(dev, 0); /* make sure MAC associated with this mii_bus  
> is enabled */
> +	return mdio_read(dev, phy_addr, regnum);
> +}


Why is attached_dev not always correct?  I'm not sure if I'm not  
understanding the hardware (I'm unfamiliar with this NIC), or if  
you've misinterpreted the meaning of the attached_dev field.  It's  
supposed to be a connection between the network device and a PHY,  
mainly used for allowing the PHY to signal state changes back to the  
ethernet device.  Is it actually the case that there is one MAC being  
used for two PHYs at the same time?  If so, how do you resolve which  
PHY's state gets used at any given moment?

The same question applies for the code in mdiobus_write()

[snip]

> -static int mii_probe (struct net_device * dev)
> +static int mdiobus_reset(struct mii_bus *bus)
> {
> -	struct au1000_private *aup = (struct au1000_private *) dev->priv;
> -	int phy_addr;
> -#ifdef CONFIG_MIPS_BOSPORUS
> -	int phy_found=0;
> -#endif
> +	struct net_device *dev = bus->priv;
> -	/* search for total of 32 possible mii phy addresses */
> -	for (phy_addr = 0; phy_addr < 32; phy_addr++) {
> -		u16 mii_status;
> -		u16 phy_id0, phy_id1;
> -		int i;
> +	enable_mac(dev, 0); /* make sure MAC associated with this mii_bus  
> is enabled */
>

Do you need to call enable_mac() every time?  If it needs to be up,  
wouldn't it be easier to make sure it's up during bus initialization?

[snip]

> 	aup->mac->control = control;
> @@ -1685,57 +808,75 @@ static int au1000_init(struct net_device
>

[snip]

> +
> +	if (phydev->link && (aup->old_speed != phydev->speed)) {
> +		// speed changed
> +
> +		switch(phydev->speed) {
> +		case 10:
> +		case 100:
> +			break;
> +		default:
> +			printk(KERN_WARNING
> +			       "%s: Speed (%d) is not 10/100/1000 ??\n",
> +			       dev->name, phydev->speed);
> +			break;
> 		}


Might want to change that to be "...not 10/100..." or add a case for  
1000.

[snip]

> +	spin_unlock_irqrestore(&aup->lock, flags);
> +	if (status_change) {
> +		phy_print_status(phydev);
> +	}
> }


Stylistic issue (I've seen it a couple times, at least):  don't use  
"{" and "}" if your block only has one line.
ie:
	if (status_change)
		phy_print_status(phydev);




^ permalink raw reply

* [patch 7/9] sky2: synchronize irq on remove
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-sync-irq.patch --]
[-- Type: text/plain, Size: 453 bytes --]

Need to make sure interrupt is not racing with unregister of
network device.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:49:47.000000000 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 10:02:32.000000000 -0700
@@ -3327,6 +3327,8 @@
 	del_timer_sync(&hw->idle_timer);
 
 	sky2_write32(hw, B0_IMSK, 0);
+	synchronize_irq(hw->pdev->irq);
+
 	dev0 = hw->dev[0];
 	dev1 = hw->dev[1];
 	if (dev1)

--


^ permalink raw reply

* [patch 9/9] sky2: version 1.3
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-1.4.patch --]
[-- Type: text/plain, Size: 319 bytes --]

Update version number, to track changes.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -51,7 +51,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.2"
+#define DRV_VERSION		"1.3"
 #define PFX			DRV_NAME " "
 
 /*

--


^ permalink raw reply

* [patch 5/9] sky2: edge triggered workaround enhancement
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-timer-tweak.patch --]
[-- Type: text/plain, Size: 2111 bytes --]

Need to make the edge-triggered workaround timer faster to get marginally
better peformance. The test_and_set_bit in schedule_prep() acts as a barrier
already. Make it a module parameter so that laptops who are concerned
about power can set it to 0; and user's stuck with broken BIOS's
can turn the driver into pure polling.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:49:37.000000000 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:49:38.000000000 -0700
@@ -98,6 +98,10 @@
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
+static int idle_timeout = 100;
+module_param(idle_timeout, int, 0);
+MODULE_PARM_DESC(idle_timeout, "Idle timeout workaround for lost interrupts (ms)");
+
 static const struct pci_device_id sky2_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
@@ -2092,12 +2096,13 @@
  */
 static void sky2_idle(unsigned long arg)
 {
-	struct net_device *dev = (struct net_device *) arg;
+	struct sky2_hw *hw = (struct sky2_hw *) arg;
+	struct net_device *dev = hw->dev[0];
 
-	local_irq_disable();
 	if (__netif_rx_schedule_prep(dev))
 		__netif_rx_schedule(dev);
-	local_irq_enable();
+
+	mod_timer(&hw->idle_timer, jiffies + msecs_to_jiffies(idle_timeout));
 }
 
 
@@ -2145,8 +2150,6 @@
 	if (work_done >= work_limit)
 		return 1;
 
-	mod_timer(&hw->idle_timer, jiffies + HZ);
-
 	netif_rx_complete(dev0);
 
 	status = sky2_read32(hw, B0_Y2_SP_LISR);
@@ -2167,8 +2170,6 @@
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
-	else
-		printk(KERN_DEBUG PFX "irq race detected\n");
 
 	return IRQ_HANDLED;
 }
@@ -3290,7 +3291,10 @@
 
 	sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
 
-	setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) dev);
+	setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) hw);
+	if (idle_timeout > 0)
+		mod_timer(&hw->idle_timer,
+			  jiffies + msecs_to_jiffies(idle_timeout));
 
 	pci_set_drvdata(pdev, hw);
 

--


^ permalink raw reply

* [patch 8/9] Add more support for the Yukon Ultra chip found in dual core centino laptops.
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-ultra-phy.patch --]
[-- Type: text/plain, Size: 6601 bytes --]

The newest Yukon Ultra chipset's require more special tweaks.
They seem to be like the Yukon XL chipsets. This code is transliterated
from the latest SysKonnect driver; I don't have any Ultra hardware.

Signed-off-by: Stephe Hemminger <shemminger@osdl.org>


--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -304,7 +304,8 @@ static void sky2_phy_init(struct sky2_hw
 	struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
 	u16 ctrl, ct1000, adv, pg, ledctrl, ledover;
 
-	if (sky2->autoneg == AUTONEG_ENABLE && hw->chip_id != CHIP_ID_YUKON_XL) {
+	if (sky2->autoneg == AUTONEG_ENABLE &&
+	    (hw->chip_id != CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)) {
 		u16 ectrl = gm_phy_read(hw, port, PHY_MARV_EXT_CTRL);
 
 		ectrl &= ~(PHY_M_EC_M_DSC_MSK | PHY_M_EC_S_DSC_MSK |
@@ -332,7 +333,7 @@ static void sky2_phy_init(struct sky2_hw
 			ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO);
 
 			if (sky2->autoneg == AUTONEG_ENABLE &&
-			    hw->chip_id == CHIP_ID_YUKON_XL) {
+			    (hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)) {
 				ctrl &= ~PHY_M_PC_DSC_MSK;
 				ctrl |= PHY_M_PC_DSC(2) | PHY_M_PC_DOWN_S_ENA;
 			}
@@ -448,10 +449,11 @@ static void sky2_phy_init(struct sky2_hw
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 3);
 
 		/* set LED Function Control register */
-		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
-							   PHY_M_LEDC_INIT_CTRL(7) |	/* 10 Mbps */
-							   PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
-							   PHY_M_LEDC_STA0_CTRL(7)));	/* 1000 Mbps */
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL,
+			     (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
+			      PHY_M_LEDC_INIT_CTRL(7) |	/* 10 Mbps */
+			      PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
+			      PHY_M_LEDC_STA0_CTRL(7)));	/* 1000 Mbps */
 
 		/* set Polarity Control register */
 		gm_phy_write(hw, port, PHY_MARV_PHY_STAT,
@@ -465,6 +467,25 @@ static void sky2_phy_init(struct sky2_hw
 		/* restore page register */
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 		break;
+	case CHIP_ID_YUKON_EC_U:
+		pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+
+		/* select page 3 to access LED control register */
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 3);
+
+		/* set LED Function Control register */
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL,
+			     (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
+			      PHY_M_LEDC_INIT_CTRL(8) |	/* 10 Mbps */
+			      PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
+			      PHY_M_LEDC_STA0_CTRL(7)));/* 1000 Mbps */
+
+		/* set Blink Rate in LED Timer Control Register */
+		gm_phy_write(hw, port, PHY_MARV_INT_MASK,
+			     ledctrl | PHY_M_LED_BLINK_RT(BLINK_84MS));
+		/* restore page register */
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
+		break;
 
 	default:
 		/* set Tx LED (LED_TX) to blink mode on Rx OR Tx activity */
@@ -473,19 +494,21 @@ static void sky2_phy_init(struct sky2_hw
 		ledover |= PHY_M_LED_MO_RX(MO_LED_OFF);
 	}
 
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev >= 2) {
+	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev == CHIP_REV_YU_EC_A1) {
 		/* apply fixes in PHY AFE */
-		gm_phy_write(hw, port, 22, 255);
+		pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 255);
+
 		/* increase differential signal amplitude in 10BASE-T */
-		gm_phy_write(hw, port, 24, 0xaa99);
-		gm_phy_write(hw, port, 23, 0x2011);
+		gm_phy_write(hw, port, 0x18, 0xaa99);
+		gm_phy_write(hw, port, 0x17, 0x2011);
 
 		/* fix for IEEE A/B Symmetry failure in 1000BASE-T */
-		gm_phy_write(hw, port, 24, 0xa204);
-		gm_phy_write(hw, port, 23, 0x2002);
+		gm_phy_write(hw, port, 0x18, 0xa204);
+		gm_phy_write(hw, port, 0x17, 0x2002);
 
 		/* set page register to 0 */
-		gm_phy_write(hw, port, 22, 0);
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 	} else {
 		gm_phy_write(hw, port, PHY_MARV_LED_CTRL, ledctrl);
 
@@ -559,6 +582,11 @@ static void sky2_mac_init(struct sky2_hw
 
 		if (sky2->duplex == DUPLEX_FULL)
 			reg |= GM_GPCR_DUP_FULL;
+
+		/* turn off pause in 10/100mbps half duplex */
+		else if (sky2->speed != SPEED_1000 &&
+			 hw->chip_id != CHIP_ID_YUKON_EC_U)
+			sky2->tx_pause = sky2->rx_pause = 0;
 	} else
 		reg = GM_GPCR_SPEED_1000 | GM_GPCR_SPEED_100 | GM_GPCR_DUP_FULL;
 
@@ -1504,17 +1532,26 @@ static void sky2_link_up(struct sky2_por
 	sky2_write8(hw, SK_REG(port, LNK_LED_REG),
 		    LINKLED_ON | LINKLED_BLINK_OFF | LINKLED_LINKSYNC_OFF);
 
-	if (hw->chip_id == CHIP_ID_YUKON_XL) {
+	if (hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U) {
 		u16 pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+		u16 led = PHY_M_LEDC_LOS_CTRL(1);	/* link active */
+
+		switch(sky2->speed) {
+		case SPEED_10:
+			led |= PHY_M_LEDC_INIT_CTRL(7);
+			break;
+
+		case SPEED_100:
+			led |= PHY_M_LEDC_STA1_CTRL(7);
+			break;
+
+		case SPEED_1000:
+			led |= PHY_M_LEDC_STA0_CTRL(7);
+			break;
+		}
 
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 3);
-		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
-			     PHY_M_LEDC_INIT_CTRL(sky2->speed ==
-						  SPEED_10 ? 7 : 0) |
-			     PHY_M_LEDC_STA1_CTRL(sky2->speed ==
-						  SPEED_100 ? 7 : 0) |
-			     PHY_M_LEDC_STA0_CTRL(sky2->speed ==
-						  SPEED_1000 ? 7 : 0));
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, led);
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 	}
 
@@ -1589,7 +1626,7 @@ static int sky2_autoneg_done(struct sky2
 	sky2->speed = sky2_phy_speed(hw, aux);
 
 	/* Pause bits are offset (9..8) */
-	if (hw->chip_id == CHIP_ID_YUKON_XL)
+	if (hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)
 		aux >>= 6;
 
 	sky2->rx_pause = (aux & PHY_M_PS_RX_P_EN) != 0;
@@ -2226,13 +2263,6 @@ static int __devinit sky2_reset(struct s
 		return -EOPNOTSUPP;
 	}
 
-	/* This chip is new and not tested yet */
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U) {
-		pr_info(PFX "%s: is a version of Yukon 2 chipset that has not been tested yet.\n",
-			pci_name(hw->pdev));
-		pr_info("Please report success/failure to maintainer <shemminger@osdl.org>\n");
-	}
-
 	/* disable ASF */
 	if (hw->chip_id <= CHIP_ID_YUKON_EC) {
 		sky2_write8(hw, B28_Y2_ASF_STAT_CMD, Y2_ASF_RESET);
--- sky2.orig/drivers/net/sky2.h
+++ sky2/drivers/net/sky2.h
@@ -378,6 +378,9 @@ enum {
 	CHIP_REV_YU_EC_A1    = 0,  /* Chip Rev. for Yukon-EC A1/A0 */
 	CHIP_REV_YU_EC_A2    = 1,  /* Chip Rev. for Yukon-EC A2 */
 	CHIP_REV_YU_EC_A3    = 2,  /* Chip Rev. for Yukon-EC A3 */
+
+	CHIP_REV_YU_EC_U_A0  = 0,
+	CHIP_REV_YU_EC_U_A1  = 1,
 };
 
 /*	B2_Y2_CLK_GATE	 8 bit	Clock Gating (Yukon-2 only) */

--


^ permalink raw reply

* [patch 0/9] sky2 update
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev

Bug fixes for sky2 driver:
 * fix NAPI related race that caused hangs
 * possible fixes for Yukon Ultra PHY support
 * performance improvement of ring management
 * fix race with irq on module removal

--


^ permalink raw reply

* [patch 6/9] sky2: dont write status ring
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-status-ring.patch --]
[-- Type: text/plain, Size: 1428 bytes --]

It is more efficient not to write the status ring from the
processor and just read the active portion.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:49:38.000000000 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:49:42.000000000 -0700
@@ -1865,35 +1865,28 @@
 static int sky2_status_intr(struct sky2_hw *hw, int to_do)
 {
 	int work_done = 0;
+	u16 hwidx = sky2_read16(hw, STAT_PUT_IDX);
 
 	rmb();
 
-	for(;;) {
+	while (hw->st_idx != hwidx) {
 		struct sky2_status_le *le  = hw->st_le + hw->st_idx;
 		struct net_device *dev;
 		struct sky2_port *sky2;
 		struct sk_buff *skb;
 		u32 status;
 		u16 length;
-		u8  link, opcode;
-
-		opcode = le->opcode;
-		if (!opcode)
-			break;
-		opcode &= ~HW_OWNER;
 
 		hw->st_idx = RING_NEXT(hw->st_idx, STATUS_RING_SIZE);
-		le->opcode = 0;
 
-		link = le->link;
-		BUG_ON(link >= 2);
-		dev = hw->dev[link];
+		BUG_ON(le->link >= 2);
+		dev = hw->dev[le->link];
 
 		sky2 = netdev_priv(dev);
 		length = le->length;
 		status = le->status;
 
-		switch (opcode) {
+		switch (le->opcode & ~HW_OWNER) {
 		case OP_RXSTAT:
 			skb = sky2_receive(sky2, length, status);
 			if (!skb)
@@ -1944,8 +1937,8 @@
 		default:
 			if (net_ratelimit())
 				printk(KERN_WARNING PFX
-				       "unknown status opcode 0x%x\n", opcode);
-			break;
+				       "unknown status opcode 0x%x\n", le->opcode);
+			goto exit_loop;
 		}
 	}
 

--


^ permalink raw reply

* [patch 1/9] sky2: backout NAPI reschedule
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-backout-resched.patch --]
[-- Type: text/plain, Size: 2584 bytes --]

This is a backout of earlier patch.

The whole rescheduling hack was a bad idea. It doesn't really solve
the problem and it makes the code more complicated for no good reason.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -2105,7 +2105,6 @@ static int sky2_poll(struct net_device *
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
- restart_poll:
 	if (unlikely(status & ~Y2_IS_STAT_BMU)) {
 		if (status & Y2_IS_HW_ERR)
 			sky2_hw_intr(hw);
@@ -2136,7 +2135,7 @@ static int sky2_poll(struct net_device *
 	}
 
 	if (status & Y2_IS_STAT_BMU) {
-		work_done += sky2_status_intr(hw, work_limit - work_done);
+		work_done = sky2_status_intr(hw, work_limit);
 		*budget -= work_done;
 		dev0->quota -= work_done;
 
@@ -2148,22 +2147,9 @@ static int sky2_poll(struct net_device *
 
 	mod_timer(&hw->idle_timer, jiffies + HZ);
 
-	local_irq_disable();
-	__netif_rx_complete(dev0);
+	netif_rx_complete(dev0);
 
 	status = sky2_read32(hw, B0_Y2_SP_LISR);
-
-	if (unlikely(status)) {
-		/* More work pending, try and keep going */
-		if (__netif_rx_schedule_prep(dev0)) {
-			__netif_rx_reschedule(dev0, work_done);
-			status = sky2_read32(hw, B0_Y2_SP_EISR);
-			local_irq_enable();
-			goto restart_poll;
-		}
-	}
-
-	local_irq_enable();
 	return 0;
 }
 
@@ -2181,6 +2167,8 @@ static irqreturn_t sky2_intr(int irq, vo
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
+	else
+		printk(KERN_DEBUG PFX "irq race detected\n");
 
 	return IRQ_HANDLED;
 }
--- sky2.orig/include/linux/netdevice.h
+++ sky2/include/linux/netdevice.h
@@ -829,21 +829,19 @@ static inline void netif_rx_schedule(str
 		__netif_rx_schedule(dev);
 }
 
-
-static inline void  __netif_rx_reschedule(struct net_device *dev, int undo)
-{
-	dev->quota += undo;
-	list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-}
-
-/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete(). */
+/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete().
+ * Do not inline this?
+ */
 static inline int netif_rx_reschedule(struct net_device *dev, int undo)
 {
 	if (netif_rx_schedule_prep(dev)) {
 		unsigned long flags;
+
+		dev->quota += undo;
+
 		local_irq_save(flags);
-		__netif_rx_reschedule(dev, undo);
+		list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 		local_irq_restore(flags);
 		return 1;
 	}

--


^ permalink raw reply

* [patch 2/9] sky2: status irq hang fix
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-status-irq-race.patch --]
[-- Type: text/plain, Size: 2417 bytes --]

The status interrupt flag should be cleared before processing,
not afterwards to avoid race. Need to process in poll routine
even if no new interrupt status. This is a normal occurrence when
more than 64 frames (NAPI weight) are processed in one poll routine.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:42:18.000000000 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:46:39.000000000 -0700
@@ -2105,45 +2105,42 @@
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
-	if (unlikely(status & ~Y2_IS_STAT_BMU)) {
-		if (status & Y2_IS_HW_ERR)
-			sky2_hw_intr(hw);
+	if (status & Y2_IS_HW_ERR)
+		sky2_hw_intr(hw);
 
-		if (status & Y2_IS_IRQ_PHY1)
-			sky2_phy_intr(hw, 0);
+	if (status & Y2_IS_IRQ_PHY1)
+		sky2_phy_intr(hw, 0);
 
-		if (status & Y2_IS_IRQ_PHY2)
-			sky2_phy_intr(hw, 1);
+	if (status & Y2_IS_IRQ_PHY2)
+		sky2_phy_intr(hw, 1);
 
-		if (status & Y2_IS_IRQ_MAC1)
-			sky2_mac_intr(hw, 0);
+	if (status & Y2_IS_IRQ_MAC1)
+		sky2_mac_intr(hw, 0);
 
-		if (status & Y2_IS_IRQ_MAC2)
-			sky2_mac_intr(hw, 1);
+	if (status & Y2_IS_IRQ_MAC2)
+		sky2_mac_intr(hw, 1);
 
-		if (status & Y2_IS_CHK_RX1)
-			sky2_descriptor_error(hw, 0, "receive", Y2_IS_CHK_RX1);
+	if (status & Y2_IS_CHK_RX1)
+		sky2_descriptor_error(hw, 0, "receive", Y2_IS_CHK_RX1);
 
-		if (status & Y2_IS_CHK_RX2)
-			sky2_descriptor_error(hw, 1, "receive", Y2_IS_CHK_RX2);
+	if (status & Y2_IS_CHK_RX2)
+		sky2_descriptor_error(hw, 1, "receive", Y2_IS_CHK_RX2);
 
-		if (status & Y2_IS_CHK_TXA1)
-			sky2_descriptor_error(hw, 0, "transmit", Y2_IS_CHK_TXA1);
+	if (status & Y2_IS_CHK_TXA1)
+		sky2_descriptor_error(hw, 0, "transmit", Y2_IS_CHK_TXA1);
 
-		if (status & Y2_IS_CHK_TXA2)
-			sky2_descriptor_error(hw, 1, "transmit", Y2_IS_CHK_TXA2);
-	}
+	if (status & Y2_IS_CHK_TXA2)
+		sky2_descriptor_error(hw, 1, "transmit", Y2_IS_CHK_TXA2);
 
-	if (status & Y2_IS_STAT_BMU) {
-		work_done = sky2_status_intr(hw, work_limit);
-		*budget -= work_done;
-		dev0->quota -= work_done;
+	if (status & Y2_IS_STAT_BMU)
+		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 
-		if (work_done >= work_limit)
-			return 1;
+	work_done = sky2_status_intr(hw, work_limit);
+	*budget -= work_done;
+	dev0->quota -= work_done;
 
-		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
-	}
+	if (work_done >= work_limit)
+		return 1;
 
 	mod_timer(&hw->idle_timer, jiffies + HZ);
 

--


^ permalink raw reply

* [patch 3/9] sky2: tx ring index mask fix
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-tx-index.patch --]
[-- Type: text/plain, Size: 648 bytes --]

Mask for transmit ring status was picking up bits from the
unused sync ring.  They were always zero, so far...
Also, make sure to remind self not to make tx ring too big.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -1927,7 +1927,8 @@ static int sky2_status_intr(struct sky2_
 
 		case OP_TXINDEXLE:
 			/* TX index reports status for both ports */
-			sky2_tx_done(hw->dev[0], status & 0xffff);
+			BUILD_BUG_ON(TX_RING_SIZE > 0x1000);
+			sky2_tx_done(hw->dev[0], status & 0xfff);
 			if (hw->dev[1])
 				sky2_tx_done(hw->dev[1],
 				     ((status >> 24) & 0xff)

--


^ permalink raw reply

* [patch 4/9] sky2: use mask instead of modulo operation
From: Stephen Hemminger @ 2006-05-08 22:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060508221125.177816000@localhost.localdomain>

[-- Attachment #1: sky2-ring-math.patch --]
[-- Type: text/plain, Size: 2294 bytes --]

Gcc isn't smart enough to know that it can do a modulo
operation with power of 2 constant by doing a mask.
So add macro to do it for us.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -79,6 +79,8 @@
 #define NAPI_WEIGHT		64
 #define PHY_RETRIES		1000
 
+#define RING_NEXT(x,s)	(((x)+1) & ((s)-1))
+
 static const u32 default_msg =
     NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK
     | NETIF_MSG_TIMER | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR
@@ -719,7 +721,7 @@ static inline struct sky2_tx_le *get_tx_
 {
 	struct sky2_tx_le *le = sky2->tx_le + sky2->tx_prod;
 
-	sky2->tx_prod = (sky2->tx_prod + 1) % TX_RING_SIZE;
+	sky2->tx_prod = RING_NEXT(sky2->tx_prod, TX_RING_SIZE);
 	return le;
 }
 
@@ -735,7 +737,7 @@ static inline void sky2_put_idx(struct s
 static inline struct sky2_rx_le *sky2_next_rx(struct sky2_port *sky2)
 {
 	struct sky2_rx_le *le = sky2->rx_le + sky2->rx_put;
-	sky2->rx_put = (sky2->rx_put + 1) % RX_LE_SIZE;
+	sky2->rx_put = RING_NEXT(sky2->rx_put, RX_LE_SIZE);
 	return le;
 }
 
@@ -1078,7 +1080,7 @@ err_out:
 /* Modular subtraction in ring */
 static inline int tx_dist(unsigned tail, unsigned head)
 {
-	return (head - tail) % TX_RING_SIZE;
+	return (head - tail) & (TX_RING_SIZE - 1);
 }
 
 /* Number of list elements available for next tx */
@@ -1255,7 +1257,7 @@ static int sky2_xmit_frame(struct sk_buf
 		le->opcode = OP_BUFFER | HW_OWNER;
 
 		fre = sky2->tx_ring
-		    + ((re - sky2->tx_ring) + i + 1) % TX_RING_SIZE;
+		    + RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
 		pci_unmap_addr_set(fre, mapaddr, mapping);
 	}
 
@@ -1315,7 +1317,7 @@ static void sky2_tx_complete(struct sky2
 
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			struct tx_ring_info *fre;
-			fre = sky2->tx_ring + (put + i + 1) % TX_RING_SIZE;
+			fre = sky2->tx_ring + RING_NEXT(put + i, TX_RING_SIZE);
 			pci_unmap_page(pdev, pci_unmap_addr(fre, mapaddr),
 				       skb_shinfo(skb)->frags[i].size,
 				       PCI_DMA_TODEVICE);
@@ -1876,7 +1878,7 @@ static int sky2_status_intr(struct sky2_
 			break;
 		opcode &= ~HW_OWNER;
 
-		hw->st_idx = (hw->st_idx + 1) % STATUS_RING_SIZE;
+		hw->st_idx = RING_NEXT(hw->st_idx, STATUS_RING_SIZE);
 		le->opcode = 0;
 
 		link = le->link;

--


^ permalink raw reply

* [PATCH 9/9] [I/OAT] TCP recv offload to I/OAT
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 net/ipv4/tcp.c       |  103 ++++++++++++++++++++++++++++++++++++++++++++------
 net/ipv4/tcp_input.c |   74 +++++++++++++++++++++++++++++++++---
 net/ipv4/tcp_ipv4.c  |   18 ++++++++-
 net/ipv6/tcp_ipv6.c  |   12 +++++-
 4 files changed, 185 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4e067d2..ff6ccda 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -263,7 +263,7 @@
 #include <net/tcp.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
-
+#include <net/netdma.h>
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
@@ -1110,6 +1110,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 	int target;		/* Read at least this many bytes */
 	long timeo;
 	struct task_struct *user_recv = NULL;
+	int copied_early = 0;
 
 	lock_sock(sk);
 
@@ -1133,6 +1134,17 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
+#ifdef CONFIG_NET_DMA
+	tp->ucopy.dma_chan = NULL;
+	preempt_disable();
+	if ((len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
+	    !sysctl_tcp_low_latency && __get_cpu_var(softnet_data.net_dma)) {
+		preempt_enable_no_resched();
+		tp->ucopy.pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
+	} else
+		preempt_enable_no_resched();
+#endif
+
 	do {
 		struct sk_buff *skb;
 		u32 offset;
@@ -1274,6 +1286,10 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 		} else
 			sk_wait_data(sk, &timeo);
 
+#ifdef CONFIG_NET_DMA
+		tp->ucopy.wakeup = 0;
+#endif
+
 		if (user_recv) {
 			int chunk;
 
@@ -1329,13 +1345,39 @@ do_prequeue:
 		}
 
 		if (!(flags & MSG_TRUNC)) {
-			err = skb_copy_datagram_iovec(skb, offset,
-						      msg->msg_iov, used);
-			if (err) {
-				/* Exception. Bailout! */
-				if (!copied)
-					copied = -EFAULT;
-				break;
+#ifdef CONFIG_NET_DMA
+			if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
+				tp->ucopy.dma_chan = get_softnet_dma();
+
+			if (tp->ucopy.dma_chan) {
+				tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
+					tp->ucopy.dma_chan, skb, offset,
+					msg->msg_iov, used,
+					tp->ucopy.pinned_list);
+
+				if (tp->ucopy.dma_cookie < 0) {
+
+					printk(KERN_ALERT "dma_cookie < 0\n");
+
+					/* Exception. Bailout! */
+					if (!copied)
+						copied = -EFAULT;
+					break;
+				}
+				if ((offset + used) == skb->len)
+					copied_early = 1;
+
+			} else
+#endif
+			{
+				err = skb_copy_datagram_iovec(skb, offset,
+						msg->msg_iov, used);
+				if (err) {
+					/* Exception. Bailout! */
+					if (!copied)
+						copied = -EFAULT;
+					break;
+				}
 			}
 		}
 
@@ -1355,15 +1397,19 @@ skip_copy:
 
 		if (skb->h.th->fin)
 			goto found_fin_ok;
-		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb, 0);
+		if (!(flags & MSG_PEEK)) {
+			sk_eat_skb(sk, skb, copied_early);
+			copied_early = 0;
+		}
 		continue;
 
 	found_fin_ok:
 		/* Process the FIN. */
 		++*seq;
-		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb, 0);
+		if (!(flags & MSG_PEEK)) {
+			sk_eat_skb(sk, skb, copied_early);
+			copied_early = 0;
+		}
 		break;
 	} while (len > 0);
 
@@ -1386,6 +1432,36 @@ skip_copy:
 		tp->ucopy.len = 0;
 	}
 
+#ifdef CONFIG_NET_DMA
+	if (tp->ucopy.dma_chan) {
+		struct sk_buff *skb;
+		dma_cookie_t done, used;
+
+		dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+
+		while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+		                                 tp->ucopy.dma_cookie, &done,
+		                                 &used) == DMA_IN_PROGRESS) {
+			/* do partial cleanup of sk_async_wait_queue */
+			while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
+			       (dma_async_is_complete(skb->dma_cookie, done,
+			                              used) == DMA_SUCCESS)) {
+				__skb_dequeue(&sk->sk_async_wait_queue);
+				kfree_skb(skb);
+			}
+		}
+
+		/* Safe to free early-copied skbs now */
+		__skb_queue_purge(&sk->sk_async_wait_queue);
+		dma_chan_put(tp->ucopy.dma_chan);
+		tp->ucopy.dma_chan = NULL;
+	}
+	if (tp->ucopy.pinned_list) {
+		dma_unpin_iovec_pages(tp->ucopy.pinned_list);
+		tp->ucopy.pinned_list = NULL;
+	}
+#endif
+
 	/* According to UNIX98, msg_name/msg_namelen are ignored
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
@@ -1658,6 +1734,9 @@ int tcp_disconnect(struct sock *sk, int 
 	__skb_queue_purge(&sk->sk_receive_queue);
 	sk_stream_writequeue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
+#ifdef CONFIG_NET_DMA
+	__skb_queue_purge(&sk->sk_async_wait_queue);
+#endif
 
 	inet->dport = 0;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9f0cca4..05c4c15 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -71,6 +71,7 @@
 #include <net/inet_common.h>
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
+#include <net/netdma.h>
 
 int sysctl_tcp_timestamps = 1;
 int sysctl_tcp_window_scaling = 1;
@@ -3785,6 +3786,50 @@ static inline int tcp_checksum_complete_
 		__tcp_checksum_complete_user(sk, skb);
 }
 
+#ifdef CONFIG_NET_DMA
+static int tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb, int hlen)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	int chunk = skb->len - hlen;
+	int dma_cookie;
+	int copied_early = 0;
+
+	if (tp->ucopy.wakeup)
+          	return 0;
+
+	if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
+		tp->ucopy.dma_chan = get_softnet_dma();
+
+	if (tp->ucopy.dma_chan && skb->ip_summed == CHECKSUM_UNNECESSARY) {
+
+		dma_cookie = dma_skb_copy_datagram_iovec(tp->ucopy.dma_chan,
+			skb, hlen, tp->ucopy.iov, chunk, tp->ucopy.pinned_list);
+
+		if (dma_cookie < 0)
+			goto out;
+
+		tp->ucopy.dma_cookie = dma_cookie;
+		copied_early = 1;
+
+		tp->ucopy.len -= chunk;
+		tp->copied_seq += chunk;
+		tcp_rcv_space_adjust(sk);
+
+		if ((tp->ucopy.len == 0) ||
+		    (tcp_flag_word(skb->h.th) & TCP_FLAG_PSH) ||
+		    (atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1))) {
+			tp->ucopy.wakeup = 1;
+			sk->sk_data_ready(sk, 0);
+		}
+	} else if (chunk > 0) {
+		tp->ucopy.wakeup = 1;
+		sk->sk_data_ready(sk, 0);
+	}
+out:
+	return copied_early;
+}
+#endif /* CONFIG_NET_DMA */
+
 /*
  *	TCP receive function for the ESTABLISHED state. 
  *
@@ -3901,14 +3946,23 @@ int tcp_rcv_established(struct sock *sk,
 			}
 		} else {
 			int eaten = 0;
+			int copied_early = 0;
 
-			if (tp->ucopy.task == current &&
-			    tp->copied_seq == tp->rcv_nxt &&
-			    len - tcp_header_len <= tp->ucopy.len &&
-			    sock_owned_by_user(sk)) {
-				__set_current_state(TASK_RUNNING);
+			if (tp->copied_seq == tp->rcv_nxt &&
+			    len - tcp_header_len <= tp->ucopy.len) {
+#ifdef CONFIG_NET_DMA
+				if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
+					copied_early = 1;
+					eaten = 1;
+				}
+#endif
+				if (tp->ucopy.task == current && sock_owned_by_user(sk) && !copied_early) {
+					__set_current_state(TASK_RUNNING);
 
-				if (!tcp_copy_to_iovec(sk, skb, tcp_header_len)) {
+					if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
+						eaten = 1;
+				}
+				if (eaten) {
 					/* Predicted packet is in window by definition.
 					 * seq == rcv_nxt and rcv_wup <= rcv_nxt.
 					 * Hence, check seq<=rcv_wup reduces to:
@@ -3924,8 +3978,9 @@ int tcp_rcv_established(struct sock *sk,
 					__skb_pull(skb, tcp_header_len);
 					tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
 					NET_INC_STATS_BH(LINUX_MIB_TCPHPHITSTOUSER);
-					eaten = 1;
 				}
+				if (copied_early)
+					tcp_cleanup_rbuf(sk, skb->len);
 			}
 			if (!eaten) {
 				if (tcp_checksum_complete_user(sk, skb))
@@ -3966,6 +4021,11 @@ int tcp_rcv_established(struct sock *sk,
 
 			__tcp_ack_snd_check(sk, 0);
 no_ack:
+#ifdef CONFIG_NET_DMA
+			if (copied_early)
+				__skb_queue_tail(&sk->sk_async_wait_queue, skb);
+			else
+#endif
 			if (eaten)
 				__kfree_skb(skb);
 			else
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 672950e..25ecc6e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -71,6 +71,7 @@
 #include <net/inet_common.h>
 #include <net/timewait_sock.h>
 #include <net/xfrm.h>
+#include <net/netdma.h>
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
@@ -1091,8 +1092,18 @@ process:
 	bh_lock_sock(sk);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
-		if (!tcp_prequeue(sk, skb))
+#ifdef CONFIG_NET_DMA
+		struct tcp_sock *tp = tcp_sk(sk);
+		if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
+			tp->ucopy.dma_chan = get_softnet_dma();
+		if (tp->ucopy.dma_chan)
 			ret = tcp_v4_do_rcv(sk, skb);
+		else
+#endif
+		{
+			if (!tcp_prequeue(sk, skb))
+			ret = tcp_v4_do_rcv(sk, skb);
+		}
 	} else
 		sk_add_backlog(sk, skb);
 	bh_unlock_sock(sk);
@@ -1296,6 +1307,11 @@ int tcp_v4_destroy_sock(struct sock *sk)
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
   	__skb_queue_purge(&tp->out_of_order_queue);
 
+#ifdef CONFIG_NET_DMA
+	/* Cleans up our sk_async_wait_queue */
+  	__skb_queue_purge(&sk->sk_async_wait_queue);
+#endif
+
 	/* Clean prequeue, it must be empty really */
 	__skb_queue_purge(&tp->ucopy.prequeue);
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 301eee7..a50eb30 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1218,8 +1218,16 @@ process:
 	bh_lock_sock(sk);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
-		if (!tcp_prequeue(sk, skb))
-			ret = tcp_v6_do_rcv(sk, skb);
+#ifdef CONFIG_NET_DMA
+                struct tcp_sock *tp = tcp_sk(sk);
+                if (tp->ucopy.dma_chan)
+                        ret = tcp_v6_do_rcv(sk, skb);
+                else
+#endif
+		{
+			if (!tcp_prequeue(sk, skb))
+				ret = tcp_v6_do_rcv(sk, skb);
+		}
 	} else
 		sk_add_backlog(sk, skb);
 	bh_unlock_sock(sk);

^ permalink raw reply related

* [PATCH 7/9] [I/OAT] make sk_eat_skb I/OAT aware
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Add an extra argument to sk_eat_skb, and make it move early copied packets
to the async_wait_queue instead of freeing them.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 include/net/sock.h |   13 ++++++++++++-
 net/dccp/proto.c   |    4 ++--
 net/ipv4/tcp.c     |    8 ++++----
 net/llc/af_llc.c   |    2 +-
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 90c65cb..75b0e97 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1273,11 +1273,22 @@ sock_recv_timestamp(struct msghdr *msg, 
  * This routine must be called with interrupts disabled or with the socket
  * locked so that the sk_buff queue operation is ok.
 */
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
+#ifdef CONFIG_NET_DMA
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, int copied_early)
+{
+	__skb_unlink(skb, &sk->sk_receive_queue);
+	if (!copied_early)
+		__kfree_skb(skb);
+	else
+		__skb_queue_tail(&sk->sk_async_wait_queue, skb);
+}
+#else
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, int copied_early)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
 	__kfree_skb(skb);
 }
+#endif
 
 extern void sock_enable_timestamp(struct sock *sk);
 extern int sock_get_timestamp(struct sock *, struct timeval __user *);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 2e0ee83..5317fd3 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -719,7 +719,7 @@ int dccp_recvmsg(struct kiocb *iocb, str
 		}
 		dccp_pr_debug("packet_type=%s\n",
 			      dccp_packet_name(dh->dccph_type));
-		sk_eat_skb(sk, skb);
+		sk_eat_skb(sk, skb, 0);
 verify_sock_status:
 		if (sock_flag(sk, SOCK_DONE)) {
 			len = 0;
@@ -773,7 +773,7 @@ verify_sock_status:
 		}
 	found_fin_ok:
 		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb);
+			sk_eat_skb(sk, skb, 0);
 		break;
 	} while (1);
 out:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1c0cfd7..4e067d2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1072,11 +1072,11 @@ int tcp_read_sock(struct sock *sk, read_
 				break;
 		}
 		if (skb->h.th->fin) {
-			sk_eat_skb(sk, skb);
+			sk_eat_skb(sk, skb, 0);
 			++seq;
 			break;
 		}
-		sk_eat_skb(sk, skb);
+		sk_eat_skb(sk, skb, 0);
 		if (!desc->count)
 			break;
 	}
@@ -1356,14 +1356,14 @@ skip_copy:
 		if (skb->h.th->fin)
 			goto found_fin_ok;
 		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb);
+			sk_eat_skb(sk, skb, 0);
 		continue;
 
 	found_fin_ok:
 		/* Process the FIN. */
 		++*seq;
 		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb);
+			sk_eat_skb(sk, skb, 0);
 		break;
 	} while (len > 0);
 
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 5a04db7..7465170 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -789,7 +789,7 @@ static int llc_ui_recvmsg(struct kiocb *
 			continue;
 
 		if (!(flags & MSG_PEEK)) {
-			sk_eat_skb(sk, skb);
+			sk_eat_skb(sk, skb, 0);
 			*seq = 0;
 		}
 	} while (len > 0);

^ permalink raw reply related

* [PATCH 1/9] [I/OAT] DMA memcpy subsystem
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Provides an API for offloading memory copies to DMA devices

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 drivers/Kconfig           |    2 
 drivers/Makefile          |    1 
 drivers/dma/Kconfig       |   13 +
 drivers/dma/Makefile      |    1 
 drivers/dma/dmaengine.c   |  408 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |  337 +++++++++++++++++++++++++++++++++++++
 6 files changed, 762 insertions(+), 0 deletions(-)

diff --git a/drivers/Kconfig b/drivers/Kconfig
index aeb5ab2..8b11ceb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -72,4 +72,6 @@ source "drivers/edac/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 447d8e6..3c51703 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -74,3 +74,4 @@ obj-$(CONFIG_SGI_SN)		+= sn/
 obj-y				+= firmware/
 obj-$(CONFIG_CRYPTO)		+= crypto/
 obj-$(CONFIG_SUPERH)		+= sh/
+obj-$(CONFIG_DMA_ENGINE)	+= dma/
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
new file mode 100644
index 0000000..f9ac4bc
--- /dev/null
+++ b/drivers/dma/Kconfig
@@ -0,0 +1,13 @@
+#
+# DMA engine configuration
+#
+
+menu "DMA Engine support"
+
+config DMA_ENGINE
+	bool "Support for DMA engines"
+	---help---
+	  DMA engines offload copy operations from the CPU to dedicated
+	  hardware, allowing the copies to happen asynchronously.
+
+endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
new file mode 100644
index 0000000..10b7391
--- /dev/null
+++ b/drivers/dma/Makefile
@@ -0,0 +1 @@
+obj-y += dmaengine.o
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
new file mode 100644
index 0000000..473c47b
--- /dev/null
+++ b/drivers/dma/dmaengine.c
@@ -0,0 +1,408 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This code implements the DMA subsystem. It provides a HW-neutral interface
+ * for other kernel code to use asynchronous memory copy capabilities,
+ * if present, and allows different HW DMA drivers to register as providing
+ * this capability.
+ *
+ * Due to the fact we are accelerating what is already a relatively fast
+ * operation, the code goes to great lengths to avoid additional overhead,
+ * such as locking.
+ *
+ * LOCKING:
+ *
+ * The subsystem keeps two global lists, dma_device_list and dma_client_list.
+ * Both of these are protected by a mutex, dma_list_mutex.
+ *
+ * Each device has a channels list, which runs unlocked but is never modified
+ * once the device is registered, it's just setup by the driver.
+ *
+ * Each client has a channels list, it's only modified under the client->lock
+ * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ *
+ * Each device has a kref, which is initialized to 1 when the device is
+ * registered. A kref_put is done for each class_device registered.  When the
+ * class_device is released, the coresponding kref_put is done in the release
+ * method. Every time one of the device's channels is allocated to a client,
+ * a kref_get occurs.  When the channel is freed, the coresponding kref_put
+ * happens. The device's release function does a completion, so
+ * unregister_device does a remove event, class_device_unregister, a kref_put
+ * for the first reference, then waits on the completion for all other
+ * references to finish.
+ *
+ * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
+ * with a kref and a per_cpu local_t.  A single reference is set when on an
+ * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
+ * extra reference per outstanding transaction.  The relase function does a
+ * kref_put on the device. -ChrisL
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/hardirq.h>
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+#include <linux/rcupdate.h>
+#include <linux/mutex.h>
+
+static DEFINE_MUTEX(dma_list_mutex);
+static LIST_HEAD(dma_device_list);
+static LIST_HEAD(dma_client_list);
+
+/* --- sysfs implementation --- */
+
+static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	unsigned long count = 0;
+	int i;
+
+	for_each_cpu(i)
+		count += per_cpu_ptr(chan->local, i)->memcpy_count;
+
+	return sprintf(buf, "%lu\n", count);
+}
+
+static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	unsigned long count = 0;
+	int i;
+
+	for_each_cpu(i)
+		count += per_cpu_ptr(chan->local, i)->bytes_transferred;
+
+	return sprintf(buf, "%lu\n", count);
+}
+
+static ssize_t show_in_use(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+
+	return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+}
+
+static struct class_device_attribute dma_class_attrs[] = {
+	__ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+	__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
+	__ATTR(in_use, S_IRUGO, show_in_use, NULL),
+	__ATTR_NULL
+};
+
+static void dma_async_device_cleanup(struct kref *kref);
+
+static void dma_class_dev_release(struct class_device *cd)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	kref_put(&chan->device->refcount, dma_async_device_cleanup);
+}
+
+static struct class dma_devclass = {
+	.name            = "dma",
+	.class_dev_attrs = dma_class_attrs,
+	.release = dma_class_dev_release,
+};
+
+/* --- client and device registration --- */
+
+/**
+ * dma_client_chan_alloc - try to allocate a channel to a client
+ * @client: &dma_client
+ *
+ * Called with dma_list_mutex held.
+ */
+static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+{
+	struct dma_device *device;
+	struct dma_chan *chan;
+	unsigned long flags;
+	int desc;	/* allocated descriptor count */
+
+	/* Find a channel, any DMA engine will do */
+	list_for_each_entry(device, &dma_device_list, global_node) {
+		list_for_each_entry(chan, &device->channels, device_node) {
+			if (chan->client)
+				continue;
+
+			desc = chan->device->device_alloc_chan_resources(chan);
+			if (desc >= 0) {
+				kref_get(&device->refcount);
+				kref_init(&chan->refcount);
+				chan->slow_ref = 0;
+				INIT_RCU_HEAD(&chan->rcu);
+				chan->client = client;
+				spin_lock_irqsave(&client->lock, flags);
+				list_add_tail_rcu(&chan->client_node,
+				                  &client->channels);
+				spin_unlock_irqrestore(&client->lock, flags);
+				return chan;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * dma_client_chan_free - release a DMA channel
+ * @chan: &dma_chan
+ */
+void dma_chan_cleanup(struct kref *kref)
+{
+	struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
+	chan->device->device_free_chan_resources(chan);
+	chan->client = NULL;
+	kref_put(&chan->device->refcount, dma_async_device_cleanup);
+}
+
+static void dma_chan_free_rcu(struct rcu_head *rcu)
+{
+	struct dma_chan *chan = container_of(rcu, struct dma_chan, rcu);
+	int bias = 0x7FFFFFFF;
+	int i;
+	for_each_cpu(i)
+		bias -= local_read(&per_cpu_ptr(chan->local, i)->refcount);
+	atomic_sub(bias, &chan->refcount.refcount);
+	kref_put(&chan->refcount, dma_chan_cleanup);
+}
+
+static void dma_client_chan_free(struct dma_chan *chan)
+{
+	atomic_add(0x7FFFFFFF, &chan->refcount.refcount);
+	chan->slow_ref = 1;
+	call_rcu(&chan->rcu, dma_chan_free_rcu);
+}
+
+/**
+ * dma_chans_rebalance - reallocate channels to clients
+ *
+ * When the number of DMA channel in the system changes,
+ * channels need to be rebalanced among clients
+ */
+static void dma_chans_rebalance(void)
+{
+	struct dma_client *client;
+	struct dma_chan *chan;
+	unsigned long flags;
+
+	mutex_lock(&dma_list_mutex);
+
+	list_for_each_entry(client, &dma_client_list, global_node) {
+		while (client->chans_desired > client->chan_count) {
+			chan = dma_client_chan_alloc(client);
+			if (!chan)
+				break;
+			client->chan_count++;
+			client->event_callback(client,
+	                                       chan,
+	                                       DMA_RESOURCE_ADDED);
+		}
+		while (client->chans_desired < client->chan_count) {
+			spin_lock_irqsave(&client->lock, flags);
+			chan = list_entry(client->channels.next,
+			                  struct dma_chan,
+			                  client_node);
+			list_del_rcu(&chan->client_node);
+			spin_unlock_irqrestore(&client->lock, flags);
+			client->chan_count--;
+			client->event_callback(client,
+			                       chan,
+			                       DMA_RESOURCE_REMOVED);
+			dma_client_chan_free(chan);
+		}
+	}
+
+	mutex_unlock(&dma_list_mutex);
+}
+
+/**
+ * dma_async_client_register - allocate and register a &dma_client
+ * @event_callback: callback for notification of channel addition/removal
+ */
+struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+{
+	struct dma_client *client;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return NULL;
+
+	INIT_LIST_HEAD(&client->channels);
+	spin_lock_init(&client->lock);
+	client->chans_desired = 0;
+	client->chan_count = 0;
+	client->event_callback = event_callback;
+
+	mutex_lock(&dma_list_mutex);
+	list_add_tail(&client->global_node, &dma_client_list);
+	mutex_unlock(&dma_list_mutex);
+
+	return client;
+}
+
+/**
+ * dma_async_client_unregister - unregister a client and free the &dma_client
+ * @client:
+ *
+ * Force frees any allocated DMA channels, frees the &dma_client memory
+ */
+void dma_async_client_unregister(struct dma_client *client)
+{
+	struct dma_chan *chan;
+
+	if (!client)
+		return;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(chan, &client->channels, client_node)
+		dma_client_chan_free(chan);
+	rcu_read_unlock();
+
+	mutex_lock(&dma_list_mutex);
+	list_del(&client->global_node);
+	mutex_unlock(&dma_list_mutex);
+
+	kfree(client);
+	dma_chans_rebalance();
+}
+
+/**
+ * dma_async_client_chan_request - request DMA channels
+ * @client: &dma_client
+ * @number: count of DMA channels requested
+ *
+ * Clients call dma_async_client_chan_request() to specify how many
+ * DMA channels they need, 0 to free all currently allocated.
+ * The resulting allocations/frees are indicated to the client via the
+ * event callback.
+ */
+void dma_async_client_chan_request(struct dma_client *client,
+			unsigned int number)
+{
+	client->chans_desired = number;
+	dma_chans_rebalance();
+}
+
+/**
+ * dma_async_device_register -
+ * @device: &dma_device
+ */
+int dma_async_device_register(struct dma_device *device)
+{
+	static int id;
+	int chancnt = 0;
+	struct dma_chan* chan;
+
+	if (!device)
+		return -ENODEV;
+
+	init_completion(&device->done);
+	kref_init(&device->refcount);
+	device->dev_id = id++;
+
+	/* represent channels in sysfs. Probably want devs too */
+	list_for_each_entry(chan, &device->channels, device_node) {
+		chan->local = alloc_percpu(typeof(*chan->local));
+		if (chan->local == NULL)
+			continue;
+
+		chan->chan_id = chancnt++;
+		chan->class_dev.class = &dma_devclass;
+		chan->class_dev.dev = NULL;
+		snprintf(chan->class_dev.class_id, BUS_ID_SIZE, "dma%dchan%d",
+		         device->dev_id, chan->chan_id);
+
+		kref_get(&device->refcount);
+		class_device_register(&chan->class_dev);
+	}
+
+	mutex_lock(&dma_list_mutex);
+	list_add_tail(&device->global_node, &dma_device_list);
+	mutex_unlock(&dma_list_mutex);
+
+	dma_chans_rebalance();
+
+	return 0;
+}
+
+/**
+ * dma_async_device_unregister -
+ * @device: &dma_device
+ */
+static void dma_async_device_cleanup(struct kref *kref)
+{
+	struct dma_device *device;
+
+	device = container_of(kref, struct dma_device, refcount);
+	complete(&device->done);
+}
+
+void dma_async_device_unregister(struct dma_device* device)
+{
+	struct dma_chan *chan;
+	unsigned long flags;
+
+	mutex_lock(&dma_list_mutex);
+	list_del(&device->global_node);
+	mutex_unlock(&dma_list_mutex);
+
+	list_for_each_entry(chan, &device->channels, device_node) {
+		if (chan->client) {
+			spin_lock_irqsave(&chan->client->lock, flags);
+			list_del(&chan->client_node);
+			chan->client->chan_count--;
+			spin_unlock_irqrestore(&chan->client->lock, flags);
+			chan->client->event_callback(chan->client,
+			                             chan,
+			                             DMA_RESOURCE_REMOVED);
+			dma_client_chan_free(chan);
+		}
+		class_device_unregister(&chan->class_dev);
+	}
+	dma_chans_rebalance();
+
+	kref_put(&device->refcount, dma_async_device_cleanup);
+	wait_for_completion(&device->done);
+}
+
+static int __init dma_bus_init(void)
+{
+	mutex_init(&dma_list_mutex);
+	return class_register(&dma_devclass);
+}
+
+subsys_initcall(dma_bus_init);
+
+EXPORT_SYMBOL(dma_async_client_register);
+EXPORT_SYMBOL(dma_async_client_unregister);
+EXPORT_SYMBOL(dma_async_client_chan_request);
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
+EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
+EXPORT_SYMBOL(dma_async_memcpy_complete);
+EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_device_register);
+EXPORT_SYMBOL(dma_async_device_unregister);
+EXPORT_SYMBOL(dma_chan_cleanup);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
new file mode 100644
index 0000000..3078154
--- /dev/null
+++ b/include/linux/dmaengine.h
@@ -0,0 +1,337 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef DMAENGINE_H
+#define DMAENGINE_H
+#include <linux/config.h>
+#ifdef CONFIG_DMA_ENGINE
+
+#include <linux/device.h>
+#include <linux/uio.h>
+#include <linux/kref.h>
+#include <linux/completion.h>
+#include <linux/rcupdate.h>
+
+/**
+ * enum dma_event - resource PNP/power managment events
+ * @DMA_RESOURCE_SUSPEND: DMA device going into low power state
+ * @DMA_RESOURCE_RESUME: DMA device returning to full power
+ * @DMA_RESOURCE_ADDED: DMA device added to the system
+ * @DMA_RESOURCE_REMOVED: DMA device removed from the system
+ */
+enum dma_event {
+	DMA_RESOURCE_SUSPEND,
+	DMA_RESOURCE_RESUME,
+	DMA_RESOURCE_ADDED,
+	DMA_RESOURCE_REMOVED,
+};
+
+/**
+ * typedef dma_cookie_t
+ *
+ * if dma_cookie_t is >0 it's a DMA request cookie, <0 it's an error code
+ */
+typedef s32 dma_cookie_t;
+
+#define dma_submit_error(cookie) ((cookie) < 0 ? 1 : 0)
+
+/**
+ * enum dma_status - DMA transaction status
+ * @DMA_SUCCESS: transaction completed successfully
+ * @DMA_IN_PROGRESS: transaction not yet processed
+ * @DMA_ERROR: transaction failed
+ */
+enum dma_status {
+	DMA_SUCCESS,
+	DMA_IN_PROGRESS,
+	DMA_ERROR,
+};
+
+/**
+ * struct dma_chan_percpu - the per-CPU part of struct dma_chan
+ * @refcount: local_t used for open-coded "bigref" counting
+ * @memcpy_count: transaction counter
+ * @bytes_transferred: byte counter
+ */
+
+struct dma_chan_percpu {
+	local_t refcount;
+	/* stats */
+	unsigned long memcpy_count;
+	unsigned long bytes_transferred;
+};
+
+/**
+ * struct dma_chan - devices supply DMA channels, clients use them
+ * @client: ptr to the client user of this chan, will be NULL when unused
+ * @device: ptr to the dma device who supplies this channel, always !NULL
+ * @cookie: last cookie value returned to client
+ * @chan_id:
+ * @class_dev:
+ * @refcount: kref, used in "bigref" slow-mode
+ * @slow_ref:
+ * @rcu:
+ * @client_node: used to add this to the client chan list
+ * @device_node: used to add this to the device chan list
+ * @local: per-cpu pointer to a struct dma_chan_percpu
+ */
+struct dma_chan {
+	struct dma_client *client;
+	struct dma_device *device;
+	dma_cookie_t cookie;
+
+	/* sysfs */
+	int chan_id;
+	struct class_device class_dev;
+
+	struct kref refcount;
+	int slow_ref;
+	struct rcu_head rcu;
+
+	struct list_head client_node;
+	struct list_head device_node;
+	struct dma_chan_percpu *local;
+};
+
+void dma_chan_cleanup(struct kref *kref);
+
+static inline void dma_chan_get(struct dma_chan *chan)
+{
+	if (unlikely(chan->slow_ref))
+		kref_get(&chan->refcount);
+	else {
+		local_inc(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		put_cpu();
+	}
+}
+
+static inline void dma_chan_put(struct dma_chan *chan)
+{
+	if (unlikely(chan->slow_ref))
+		kref_put(&chan->refcount, dma_chan_cleanup);
+	else {
+		local_dec(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		put_cpu();
+	}
+}
+
+/*
+ * typedef dma_event_callback - function pointer to a DMA event callback
+ */
+typedef void (*dma_event_callback) (struct dma_client *client,
+		struct dma_chan *chan, enum dma_event event);
+
+/**
+ * struct dma_client - info on the entity making use of DMA services
+ * @event_callback: func ptr to call when something happens
+ * @chan_count: number of chans allocated
+ * @chans_desired: number of chans requested. Can be +/- chan_count
+ * @lock: protects access to the channels list
+ * @channels: the list of DMA channels allocated
+ * @global_node: list_head for global dma_client_list
+ */
+struct dma_client {
+	dma_event_callback	event_callback;
+	unsigned int		chan_count;
+	unsigned int		chans_desired;
+
+	spinlock_t		lock;
+	struct list_head	channels;
+	struct list_head	global_node;
+};
+
+/**
+ * struct dma_device - info on the entity supplying DMA services
+ * @chancnt: how many DMA channels are supported
+ * @channels: the list of struct dma_chan
+ * @global_node: list_head for global dma_device_list
+ * @refcount:
+ * @done:
+ * @dev_id:
+ * Other func ptrs: used to make use of this device's capabilities
+ */
+struct dma_device {
+
+	unsigned int chancnt;
+	struct list_head channels;
+	struct list_head global_node;
+
+	struct kref refcount;
+	struct completion done;
+
+	int dev_id;
+
+	int (*device_alloc_chan_resources)(struct dma_chan *chan);
+	void (*device_free_chan_resources)(struct dma_chan *chan);
+	dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
+			void *dest, void *src, size_t len);
+	dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
+			struct page *page, unsigned int offset, void *kdata,
+			size_t len);
+	dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
+			struct page *dest_pg, unsigned int dest_off,
+			struct page *src_pg, unsigned int src_off, size_t len);
+	enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
+			dma_cookie_t cookie, dma_cookie_t *last,
+			dma_cookie_t *used);
+	void (*device_memcpy_issue_pending)(struct dma_chan *chan);
+};
+
+/* --- public DMA engine API --- */
+
+struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+void dma_async_client_unregister(struct dma_client *client);
+void dma_async_client_chan_request(struct dma_client *client,
+		unsigned int number);
+
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * @len: length
+ *
+ * Both @dest and @src must be mappable to a bus address according to the
+ * DMA mapping API rules for streaming mappings.
+ * Both @dest and @src must stay memory resident (kernel memory or locked
+ * user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+	void *dest, void *src, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+}
+
+/**
+ * dma_async_memcpy_buf_to_pg - offloaded copy
+ * @chan: DMA channel to offload copy to
+ * @page: destination page
+ * @offset: offset in page to copy to
+ * @kdata: source address (virtual)
+ * @len: length
+ *
+ * Both @page/@offset and @kdata must be mappable to a bus address according
+ * to the DMA mapping API rules for streaming mappings.
+ * Both @page/@offset and @kdata must stay memory resident (kernel memory or
+ * locked user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
+	struct page *page, unsigned int offset, void *kdata, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
+	                                             kdata, len);
+}
+
+/**
+ * dma_async_memcpy_buf_to_pg - offloaded copy
+ * @chan: DMA channel to offload copy to
+ * @dest_page: destination page
+ * @dest_off: offset in page to copy to
+ * @src_page: source page
+ * @src_off: offset in page to copy from
+ * @len: length
+ *
+ * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
+ * address according to the DMA mapping API rules for streaming mappings.
+ * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
+ * (kernel memory or locked user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
+	struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
+	unsigned int src_off, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
+	                                            src_pg, src_off, len);
+}
+
+/**
+ * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * @chan:
+ *
+ * This allows drivers to push copies to HW in batches,
+ * reducing MMIO writes where possible.
+ */
+static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+{
+	return chan->device->device_memcpy_issue_pending(chan);
+}
+
+/**
+ * dma_async_memcpy_complete - poll for transaction completion
+ * @chan: DMA channel
+ * @cookie: transaction identifier to check status of
+ * @last: returns last completed cookie, can be NULL
+ * @used: returns last issued cookie, can be NULL
+ *
+ * If @last and @used are passed in, upon return they reflect the driver
+ * internal state and can be used with dma_async_is_complete() to check
+ * the status of multiple cookies without re-checking hardware state.
+ */
+static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
+	dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+{
+	return chan->device->device_memcpy_complete(chan, cookie, last, used);
+}
+
+/**
+ * dma_async_is_complete - test a cookie against chan state
+ * @cookie: transaction identifier to test status of
+ * @last_complete: last know completed transaction
+ * @last_used: last cookie value handed out
+ *
+ * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * the test logic is seperated for lightweight testing of multiple cookies
+ */
+static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
+			dma_cookie_t last_complete, dma_cookie_t last_used)
+{
+	if (last_complete <= last_used) {
+		if ((cookie <= last_complete) || (cookie > last_used))
+			return DMA_SUCCESS;
+	} else {
+		if ((cookie <= last_complete) && (cookie > last_used))
+			return DMA_SUCCESS;
+	}
+	return DMA_IN_PROGRESS;
+}
+
+
+/* --- DMA device --- */
+
+int dma_async_device_register(struct dma_device *device);
+void dma_async_device_unregister(struct dma_device *device);
+
+#endif /* CONFIG_DMA_ENGINE */
+#endif /* DMAENGINE_H */

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox