Netdev List

* [PATCH net-next-2.6 1/2] niu: Enable GRO by default.
From: David Miller @ 2010-04-21  2:08 UTC
  To: netdev


This was merely an oversight when I added the napi_gro_receive()
calls.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/niu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index ef94022..493e25c 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -9838,7 +9838,7 @@ static int __devinit niu_pci_init_one(struct pci_dev *pdev,
 		}
 	}
 
-	dev->features |= (NETIF_F_SG | NETIF_F_HW_CSUM);
+	dev->features |= (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GRO);
 
 	np->regs = pci_ioremap_bar(pdev, 0);
 	if (!np->regs) {
-- 
1.7.0.4
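
The fix is a single extra capability bit OR-ed into dev->features. As we understand the core code of this era, the GRO receive path checks NETIF_F_GRO on the device. Without that bit, the driver's napi_gro_receive() calls just take the normal receive path. Below is a minimal userspace sketch of the bitmask pattern. The DEMO_F_* values and struct demo_netdev are hypothetical stand-ins, not the real NETIF_F_* constants from include/linux/netdevice.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NETIF_F_* bits; values are illustrative only. */
enum {
	DEMO_F_SG      = 1 << 0,
	DEMO_F_HW_CSUM = 1 << 1,
	DEMO_F_GRO     = 1 << 2,
};

struct demo_netdev {
	unsigned long features;	/* capability bitmask, like dev->features */
};

/* Mirror of the one-line change: advertise GRO next to SG and checksum offload. */
static void demo_set_default_features(struct demo_netdev *dev)
{
	dev->features |= (DEMO_F_SG | DEMO_F_HW_CSUM | DEMO_F_GRO);
}

/* True only if every bit in @f is set, the way a feature test gates a code path. */
static bool demo_has_feature(const struct demo_netdev *dev, unsigned long f)
{
	return (dev->features & f) == f;
}
```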



* [PATCH net-next-2.6 2/2] tg3: Enable GRO by default.
From: David Miller @ 2010-04-21  2:08 UTC
  To: netdev


This was merely an oversight when I added the *_gro_receive()
calls.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/tg3.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 0fea685..7724d7e 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12993,6 +12993,7 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 		tp->dev->features |= NETIF_F_IP_CSUM | NETIF_F_SG;
 		if (tp->tg3_flags3 & TG3_FLG3_5755_PLUS)
 			tp->dev->features |= NETIF_F_IPV6_CSUM;
+		tp->dev->features |= NETIF_F_GRO;
 	}
 
 	/* Determine TSO capabilities */
-- 
1.7.0.4



* Re: [PATCH 1/2] ehea: error handling improvement
From: David Miller @ 2010-04-21  2:16 UTC
  To: osstklei; +Cc: raisch, themann, linux-kernel, linuxppc-dev, netdev
In-Reply-To: <4BCC47AB.2090600@de.ibm.com>

From: Thomas Klein <osstklei@de.ibm.com>
Date: Mon, 19 Apr 2010 14:08:11 +0200

> Reset a port's resources only if they're actually in an error state
> 
> Signed-off-by: Thomas Klein <tklein@de.ibm.com>
> ---
> 
> Patch created against 2.6.34-rc4

There are several problems with these patches:

1) They are corrupted by your email client: unchanged lines
   begin with one space character instead of two.  Therefore
   even 'patch' wouldn't accept these changes.

2) The double slashes in the patch file paths make git reject
   the change.  Please don't put double slashes in your patch
   paths, as that canonically means "/".

3) These are not appropriate for net-2.6 as we are deep in
   the -rcX series at this point and only the most diabolical
   bug fixes are appropriate.  Therefore, please generate these
   against net-next-2.6, thanks.
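
Points 1 and 2 are both mechanical. Here is a rough sketch of the two checks. The helper names context_line_matches() and header_has_double_slash() are hypothetical, and real tools such as patch(1) and git apply validate far more than this. In a unified diff, each context line is the original source line with exactly one ' ' marker in front of it. If a mail client eats a leading space, the line no longer matches the file being patched.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * A unified-diff context line is the source line with exactly one ' '
 * marker prepended. Dropping a space anywhere in that prefix breaks the match.
 */
static bool context_line_matches(const char *diff_line, const char *src_line)
{
	return diff_line[0] == ' ' && strcmp(diff_line + 1, src_line) == 0;
}

/*
 * Flag "--- " / "+++ " file headers whose path contains "//",
 * e.g. "--- a/drivers/net//foo.c" (hypothetical path).
 */
static bool header_has_double_slash(const char *line)
{
	if (strncmp(line, "--- ", 4) != 0 && strncmp(line, "+++ ", 4) != 0)
		return false;
	return strstr(line + 4, "//") != NULL;
}
```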


* Re: [PATCH net-2.6] cleanup: remove two unnecessary exports in skbuff.c.
From: David Miller @ 2010-04-21  2:26 UTC
  To: ramirose; +Cc: netdev
In-Reply-To: <l2geb3ff54b1004190250qfe606c28i80dab5f156128f4c@mail.gmail.com>

From: Rami Rosen <ramirose@gmail.com>
Date: Mon, 19 Apr 2010 12:50:19 +0300

> There is no need to export skb_under_panic() and skb_over_panic() in
> skbuff.c, since these methods are used only in
> skbuff.c; this patch removes these two exports.
> 
> Signed-off-by: Rami Rosen <ramirose@gmail.com>

If this is indeed the case, you should also mark these functions
'static' and remove the extern declarations of them from
include/linux/skbuff.h.

Please make those changes and resubmit your patch (against
net-next-2.6 since that's where it belongs at this point).

Thanks.


* Re: [PATCH RFC: linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: David Miller @ 2010-04-21  2:28 UTC
  To: peter.p.waskiewicz.jr; +Cc: tglx, arjan, netdev, linux-kernel
In-Reply-To: <20100419045741.30276.23233.stgit@ppwaskie-hc2.jf.intel.com>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date: Sun, 18 Apr 2010 21:57:41 -0700

> This patch adds a callback function pointer to the irq_desc
> structure, along with a registration function and a read-only
> proc entry for each interrupt.
> 
> This affinity_hint handle for each interrupt can be used by
> underlying drivers that need a better mechanism to control
> interrupt affinity.  The underlying driver can register a
> callback for the interrupt, which will allow the driver to
> provide the CPU mask for the interrupt to anything that
> requests it.  The intent is to extend the userspace daemon,
> irqbalance, so that it can use this hint as the preferred CPU
> mask when balancing the interrupt.
> 
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>

I'll leave it to the IRQ layer experts to decide whether this is
appropriate or not; it doesn't look too bad to me.
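
To make the proposed mechanism concrete, here is a small userspace model of a per-IRQ hint callback. A driver registers a function plus a cookie, and a reader gets back a CPU mask. The reader stands in for the read-only proc entry that irqbalance would consult. All names are hypothetical, and the mask is modeled as an unsigned long bitmask rather than the kernel's struct cpumask. This is not the RFC's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS 16

/* Hypothetical callback type: return a bitmask of preferred CPUs for @irq. */
typedef unsigned long (*demo_affinity_hint_fn)(unsigned int irq, void *data);

struct demo_irq_desc {
	demo_affinity_hint_fn affinity_hint;	/* NULL when no driver registered */
	void *hint_data;			/* driver cookie passed back to the callback */
};

static struct demo_irq_desc demo_irq_descs[DEMO_NR_IRQS];

/* Driver side: register (or clear, with fn == NULL) a hint callback. */
static int demo_irq_register_affinity_hint(unsigned int irq,
					   demo_affinity_hint_fn fn, void *data)
{
	if (irq >= DEMO_NR_IRQS)
		return -1;
	demo_irq_descs[irq].affinity_hint = fn;
	demo_irq_descs[irq].hint_data = data;
	return 0;
}

/* Reader side: an IRQ with no callback reports an empty mask ("no preference"). */
static unsigned long demo_irq_read_affinity_hint(unsigned int irq)
{
	const struct demo_irq_desc *desc;

	if (irq >= DEMO_NR_IRQS)
		return 0;
	desc = &demo_irq_descs[irq];
	if (!desc->affinity_hint)
		return 0;
	return desc->affinity_hint(irq, desc->hint_data);
}

/* Example driver callback: pin queue N's interrupt to CPU N. */
static unsigned long demo_queue_hint(unsigned int irq, void *data)
{
	const unsigned int *queue = data;

	(void)irq;
	return 1UL << *queue;
}
```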


* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: Kumar Gala @ 2010-04-21  4:22 UTC
  To: David Miller; +Cc: timur.tabi, afleming, netdev
In-Reply-To: <20100420.180646.216759318.davem@davemloft.net>


On Apr 20, 2010, at 8:06 PM, David Miller wrote:

> From: Timur Tabi <timur.tabi@gmail.com>
> Date: Tue, 20 Apr 2010 10:01:48 -0500
> 
>> On Mon, Apr 19, 2010 at 11:43 PM, Kumar Gala <galak@kernel.crashing.org> wrote:
>> 
>>> spin_event_timeout doesn't make sense for this.  The patch is fine.
>> 
>> Can you please elaborate on that?  I don't understand why you think
>> that.  spin_event_timeout() takes an expression and a timeout, and
>> loops over the expression calling cpu_relax(), just like this loop
>> does.
> 
> Indeed it does; Kumar, this request seems reasonable.

Are we saying that cpu_relax() is useless and should be removed if we are spinning on a HW register?

It's fatally buggy HW if the bits never clear or get set in the few places where cpu_relax() is being used.

- k
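
The disagreement is about whether to replace the open-coded poll loop with a bounded helper. Here is a sketch of what such a helper does, under stated assumptions: demo_spin_until() and demo_cpu_relax() are hypothetical, and the timeout is counted in poll iterations rather than the time-based budget a kernel helper would use. The helper returns the final value of the condition, so the caller can tell "hardware finished" apart from "gave up". An unbounded loop cannot report that second case if the bits never change.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for cpu_relax(); in the kernel it hints that we are busy-waiting. */
static inline void demo_cpu_relax(void)
{
}

/*
 * Poll @cond(@arg) until it returns true or @max_polls attempts have been
 * made. Returns the final value of the condition: true on success,
 * false on timeout.
 */
static bool demo_spin_until(bool (*cond)(void *), void *arg, long max_polls)
{
	for (long i = 0; i < max_polls; i++) {
		if (cond(arg))
			return true;
		demo_cpu_relax();
	}
	return cond(arg);	/* one last check after the budget is spent */
}

/* Hypothetical "hardware" status: the busy bits clear after a few reads. */
struct demo_status_reg {
	int reads_until_idle;
};

static bool demo_rx_tx_stopped(void *arg)
{
	struct demo_status_reg *reg = arg;

	if (reg->reads_until_idle > 0)
		reg->reads_until_idle--;
	return reg->reads_until_idle == 0;
}
```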


* Re: 2.6.34-rc5: Reported regressions from 2.6.33
From: Rafael J. Wysocki @ 2010-04-21  5:14 UTC
  To: Ben Gamari
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Linus Torvalds, Kernel Testers List, Network Development,
	Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List,
	DRI
In-Reply-To: <4bce5cc3.966bdc0a.2ef7.fffff5de@mx.google.com>

On Wednesday 21 April 2010, Ben Gamari wrote:
> On Tue, 20 Apr 2010 05:15:57 +0200 (CEST), "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > This message contains a list of some regressions from 2.6.33,
> > for which there are no fixes in the mainline known to the tracking team.
> > If any of them have been fixed already, please let us know.
> > 
> > If you know of any other unresolved regressions from 2.6.33, please let us
> > know as well and we'll add them to the list.  Also, please let us know
> > if any of the entries below are invalid.
> > 
> 
> I have recently reported this suspend regression on my Dell laptop hardware.
> 
> References: http://lkml.org/lkml/2010/4/18/20
> Bug-report: https://bugzilla.kernel.org/show_bug.cgi?id=15820

This has been added to the list now.  Please check my comment in the Bugzilla
entry.

Rafael


* Re: 2.6.34-rc5: Reported regressions from 2.6.33
From: Rafael J. Wysocki @ 2010-04-21  5:15 UTC
  To: Nick Bowler
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Andrew Morton,
	Linus Torvalds, Kernel Testers List, Network Development,
	Linux ACPI, Linux PM List, Linux SCSI List, Linux Wireless List,
	DRI
In-Reply-To: <20100420135636.GA10674-7BP4RkwGw0uXmMXjJBpWqg@public.gmane.org>

On Tuesday 20 April 2010, Nick Bowler wrote:
> On 05:15 Tue 20 Apr     , Rafael J. Wysocki wrote:
> > If you know of any other unresolved regressions from 2.6.33, please let us
> > know as well and we'll add them to the list.  Also, please let us know
> > if any of the entries below are invalid.
> 
> Please list these two similar regressions from 2.6.33 in the r600 DRM:
> 
>  * r600 CS checker rejects GL_DEPTH_TEST w/o depth buffer:
>            https://bugs.freedesktop.org/show_bug.cgi?id=27571
> 
>  * r600 CS checker rejects narrow FBO renderbuffers:
>            https://bugs.freedesktop.org/show_bug.cgi?id=27609

Do you want me to add them as one entry or as two separate bugs?

Rafael


* [PATCH net-next-2.6] cleanup: remove two unnecessary exports (skbuff).
From: Rami Rosen @ 2010-04-21  5:20 UTC
  To: davem, netdev

[-- Attachment #1: Type: text/plain, Size: 357 bytes --]

Hi,

There is no need to export skb_under_panic() and skb_over_panic() in
skbuff.c, since these functions are used only in skbuff.c; this patch
removes these two exports. It also marks these functions as 'static' and
removes the extern declarations of them from include/linux/skbuff.h.


Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 2167 bytes --]

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 38501d2..82f5116 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -470,10 +470,6 @@ extern int	       skb_cow_data(struct sk_buff *skb, int tailbits,
 				    struct sk_buff **trailer);
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	consume_skb(a)
-extern void	      skb_over_panic(struct sk_buff *skb, int len,
-				     void *here);
-extern void	      skb_under_panic(struct sk_buff *skb, int len,
-				      void *here);
 
 extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 			int getfrag(void *from, char *to, int offset,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bdea0ef..4218ff4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -117,7 +117,7 @@ static const struct pipe_buf_operations sock_pipe_buf_ops = {
  *
  *	Out of line support code for skb_put(). Not user callable.
  */
-void skb_over_panic(struct sk_buff *skb, int sz, void *here)
+static void skb_over_panic(struct sk_buff *skb, int sz, void *here)
 {
 	printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
 			  "data:%p tail:%#lx end:%#lx dev:%s\n",
@@ -126,7 +126,6 @@ void skb_over_panic(struct sk_buff *skb, int sz, void *here)
 	       skb->dev ? skb->dev->name : "<NULL>");
 	BUG();
 }
-EXPORT_SYMBOL(skb_over_panic);
 
 /**
  *	skb_under_panic	- 	private function
@@ -137,7 +136,7 @@ EXPORT_SYMBOL(skb_over_panic);
  *	Out of line support code for skb_push(). Not user callable.
  */
 
-void skb_under_panic(struct sk_buff *skb, int sz, void *here)
+static void skb_under_panic(struct sk_buff *skb, int sz, void *here)
 {
 	printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
 			  "data:%p tail:%#lx end:%#lx dev:%s\n",
@@ -146,7 +145,6 @@ void skb_under_panic(struct sk_buff *skb, int sz, void *here)
 	       skb->dev ? skb->dev->name : "<NULL>");
 	BUG();
 }
-EXPORT_SYMBOL(skb_under_panic);
 
 /* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
  *	'private' fields and also do memory statistics to find all the


* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Greg Ungerer @ 2010-04-21  5:42 UTC
  To: Bryan Wu
  Cc: s.hauer, gerg, amit.kucheria, netdev, kernel-team, linux-kernel,
	linux-arm-kernel, w.sang
In-Reply-To: <1269597052-10104-1-git-send-email-bryan.wu@canonical.com>


Hi Bryan,

Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/457878
> 
>  - remove old MII PHY control code
>  - add phylib support
>  - add ethtool interface to make user-space NetworkManager work
> 
> Tested on Freescale i.MX51 Babbage board.
> 
> This patch is based on a patch from Frederic Rodo <fred.rodo@gmail.com>
> 
> Cc: Frederic Rodo <fred.rodo@gmail.com>
> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>

I tested this on a ColdFire-based M5208EVB board (which has an internal
FEC interface), and it seemed to work OK. Not all of the PHYs the
old driver code handled internally are supported by phylib (though I
think most will fall back to the generic PHY driver and be OK).

So I am ok with it:

Acked-by: Greg Ungerer <gerg@uclinux.org>

Regards
Greg
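
One quick way to sanity-check the new FEC_MMFR_* macros in the quoted patch is to compare them with the constants used by the mk_mii_read()/mk_mii_write() helpers it removes. ST | OP_READ | TA gives 0x60020000, and ST | OP_WRITE | TA gives 0x50020000. Both schemes place the PHY address at bit 23 and the register at bit 18. The sketch below copies the macros, adds parentheses around the macro arguments, and wraps them in hypothetical helpers. It also makes one detail visible. The quoted fec_enet_mdio_write() builds its frame with FEC_MMFR_OP_READ, under a "start a read op" comment. A write frame would presumably use FEC_MMFR_OP_WRITE, which is the opcode behind the old 0x50020000 write encoding.

```c
#include <assert.h>
#include <stdint.h>

/* MMFR field layout from the quoted patch, with macro arguments parenthesized. */
#define FEC_MMFR_ST		(1 << 30)
#define FEC_MMFR_OP_READ	(2 << 28)
#define FEC_MMFR_OP_WRITE	(1 << 28)
#define FEC_MMFR_PA(v)		(((v) & 0x1f) << 23)
#define FEC_MMFR_RA(v)		(((v) & 0x1f) << 18)
#define FEC_MMFR_TA		(2 << 16)
#define FEC_MMFR_DATA(v)	((v) & 0xffff)

/* Hypothetical helper: frame written to FEC_MII_DATA to start a read. */
static uint32_t demo_mmfr_read(int phy, int reg)
{
	return FEC_MMFR_ST | FEC_MMFR_OP_READ | FEC_MMFR_PA(phy) |
	       FEC_MMFR_RA(reg) | FEC_MMFR_TA;
}

/* Hypothetical helper: write frame, WRITE opcode plus data in bits 15:0. */
static uint32_t demo_mmfr_write(int phy, int reg, uint16_t value)
{
	return FEC_MMFR_ST | FEC_MMFR_OP_WRITE | FEC_MMFR_PA(phy) |
	       FEC_MMFR_RA(reg) | FEC_MMFR_TA | FEC_MMFR_DATA(value);
}
```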



> ---
>  drivers/net/Kconfig |    1 +
>  drivers/net/fec.c   | 1125 ++++++++++++---------------------------------------
>  2 files changed, 253 insertions(+), 873 deletions(-)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0ba5b8e..41f6a70 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -1916,6 +1916,7 @@ config FEC
>  	bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
>  	depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
>  		MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
> +	select PHYLIB
>  	help
>  	  Say Y here if you want to use the built-in 10/100 Fast ethernet
>  	  controller on some Motorola ColdFire and Freescale i.MX processors.
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 9f98c1c..fca1f66 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -40,6 +40,7 @@
>  #include <linux/irq.h>
>  #include <linux/clk.h>
>  #include <linux/platform_device.h>
> +#include <linux/phy.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -61,7 +62,6 @@
>   * Define the fixed address of the FEC hardware.
>   */
>  #if defined(CONFIG_M5272)
> -#define HAVE_mii_link_interrupt
>  
>  static unsigned char	fec_mac_default[] = {
>  	0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -86,23 +86,6 @@ static unsigned char	fec_mac_default[] = {
>  #endif
>  #endif /* CONFIG_M5272 */
>  
> -/* Forward declarations of some structures to support different PHYs */
> -
> -typedef struct {
> -	uint mii_data;
> -	void (*funct)(uint mii_reg, struct net_device *dev);
> -} phy_cmd_t;
> -
> -typedef struct {
> -	uint id;
> -	char *name;
> -
> -	const phy_cmd_t *config;
> -	const phy_cmd_t *startup;
> -	const phy_cmd_t *ack_int;
> -	const phy_cmd_t *shutdown;
> -} phy_info_t;
> -
>  /* The number of Tx and Rx buffers.  These are allocated from the page
>   * pool.  The code may assume these are power of two, so it it best
>   * to keep them that size.
> @@ -189,29 +172,21 @@ struct fec_enet_private {
>  	uint	tx_full;
>  	/* hold while accessing the HW like ringbuffer for tx/rx but not MAC */
>  	spinlock_t hw_lock;
> -	/* hold while accessing the mii_list_t() elements */
> -	spinlock_t mii_lock;
> -
> -	uint	phy_id;
> -	uint	phy_id_done;
> -	uint	phy_status;
> -	uint	phy_speed;
> -	phy_info_t const	*phy;
> -	struct work_struct phy_task;
>  
> -	uint	sequence_done;
> -	uint	mii_phy_task_queued;
> +	struct  platform_device *pdev;
>  
> -	uint	phy_addr;
> +	int	opened;
>  
> +	/* Phylib and MDIO interface */
> +	struct  mii_bus *mii_bus;
> +	struct  phy_device *phy_dev;
> +	int     mii_timeout;
> +	uint    phy_speed;
>  	int	index;
> -	int	opened;
>  	int	link;
> -	int	old_link;
>  	int	full_duplex;
>  };
>  
> -static void fec_enet_mii(struct net_device *dev);
>  static irqreturn_t fec_enet_interrupt(int irq, void * dev_id);
>  static void fec_enet_tx(struct net_device *dev);
>  static void fec_enet_rx(struct net_device *dev);
> @@ -219,67 +194,20 @@ static int fec_enet_close(struct net_device *dev);
>  static void fec_restart(struct net_device *dev, int duplex);
>  static void fec_stop(struct net_device *dev);
>  
> +/* FEC MII MMFR bits definition */
> +#define FEC_MMFR_ST		(1 << 30)
> +#define FEC_MMFR_OP_READ	(2 << 28)
> +#define FEC_MMFR_OP_WRITE	(1 << 28)
> +#define FEC_MMFR_PA(v)		((v & 0x1f) << 23)
> +#define FEC_MMFR_RA(v)		((v & 0x1f) << 18)
> +#define FEC_MMFR_TA		(2 << 16)
> +#define FEC_MMFR_DATA(v)	(v & 0xffff)
>  
> -/* MII processing.  We keep this as simple as possible.  Requests are
> - * placed on the list (if there is room).  When the request is finished
> - * by the MII, an optional function may be called.
> - */
> -typedef struct mii_list {
> -	uint	mii_regval;
> -	void	(*mii_func)(uint val, struct net_device *dev);
> -	struct	mii_list *mii_next;
> -} mii_list_t;
> -
> -#define		NMII	20
> -static mii_list_t	mii_cmds[NMII];
> -static mii_list_t	*mii_free;
> -static mii_list_t	*mii_head;
> -static mii_list_t	*mii_tail;
> -
> -static int	mii_queue(struct net_device *dev, int request,
> -				void (*func)(uint, struct net_device *));
> -
> -/* Make MII read/write commands for the FEC */
> -#define mk_mii_read(REG)	(0x60020000 | ((REG & 0x1f) << 18))
> -#define mk_mii_write(REG, VAL)	(0x50020000 | ((REG & 0x1f) << 18) | \
> -						(VAL & 0xffff))
> -#define mk_mii_end	0
> +#define FEC_MII_TIMEOUT		10000
>  
>  /* Transmitter timeout */
>  #define TX_TIMEOUT (2 * HZ)
>  
> -/* Register definitions for the PHY */
> -
> -#define MII_REG_CR          0  /* Control Register                         */
> -#define MII_REG_SR          1  /* Status Register                          */
> -#define MII_REG_PHYIR1      2  /* PHY Identification Register 1            */
> -#define MII_REG_PHYIR2      3  /* PHY Identification Register 2            */
> -#define MII_REG_ANAR        4  /* A-N Advertisement Register               */
> -#define MII_REG_ANLPAR      5  /* A-N Link Partner Ability Register        */
> -#define MII_REG_ANER        6  /* A-N Expansion Register                   */
> -#define MII_REG_ANNPTR      7  /* A-N Next Page Transmit Register          */
> -#define MII_REG_ANLPRNPR    8  /* A-N Link Partner Received Next Page Reg. */
> -
> -/* values for phy_status */
> -
> -#define PHY_CONF_ANE	0x0001  /* 1 auto-negotiation enabled */
> -#define PHY_CONF_LOOP	0x0002  /* 1 loopback mode enabled */
> -#define PHY_CONF_SPMASK	0x00f0  /* mask for speed */
> -#define PHY_CONF_10HDX	0x0010  /* 10 Mbit half duplex supported */
> -#define PHY_CONF_10FDX	0x0020  /* 10 Mbit full duplex supported */
> -#define PHY_CONF_100HDX	0x0040  /* 100 Mbit half duplex supported */
> -#define PHY_CONF_100FDX	0x0080  /* 100 Mbit full duplex supported */
> -
> -#define PHY_STAT_LINK	0x0100  /* 1 up - 0 down */
> -#define PHY_STAT_FAULT	0x0200  /* 1 remote fault */
> -#define PHY_STAT_ANC	0x0400  /* 1 auto-negotiation complete	*/
> -#define PHY_STAT_SPMASK	0xf000  /* mask for speed */
> -#define PHY_STAT_10HDX	0x1000  /* 10 Mbit half duplex selected	*/
> -#define PHY_STAT_10FDX	0x2000  /* 10 Mbit full duplex selected	*/
> -#define PHY_STAT_100HDX	0x4000  /* 100 Mbit half duplex selected */
> -#define PHY_STAT_100FDX	0x8000  /* 100 Mbit full duplex selected */
> -
> -
>  static int
>  fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
> @@ -406,12 +334,6 @@ fec_enet_interrupt(int irq, void * dev_id)
>  			ret = IRQ_HANDLED;
>  			fec_enet_tx(dev);
>  		}
> -
> -		if (int_events & FEC_ENET_MII) {
> -			ret = IRQ_HANDLED;
> -			fec_enet_mii(dev);
> -		}
> -
>  	} while (int_events);
>  
>  	return ret;
> @@ -607,827 +529,312 @@ rx_processing_done:
>  	spin_unlock(&fep->hw_lock);
>  }
>  
> -/* called from interrupt context */
> -static void
> -fec_enet_mii(struct net_device *dev)
> -{
> -	struct	fec_enet_private *fep;
> -	mii_list_t	*mip;
> -
> -	fep = netdev_priv(dev);
> -	spin_lock(&fep->mii_lock);
> -
> -	if ((mip = mii_head) == NULL) {
> -		printk("MII and no head!\n");
> -		goto unlock;
> -	}
> -
> -	if (mip->mii_func != NULL)
> -		(*(mip->mii_func))(readl(fep->hwp + FEC_MII_DATA), dev);
> -
> -	mii_head = mip->mii_next;
> -	mip->mii_next = mii_free;
> -	mii_free = mip;
> -
> -	if ((mip = mii_head) != NULL)
> -		writel(mip->mii_regval, fep->hwp + FEC_MII_DATA);
> -
> -unlock:
> -	spin_unlock(&fep->mii_lock);
> -}
> -
> -static int
> -mii_queue_unlocked(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> +/* ------------------------------------------------------------------------- */
> +#ifdef CONFIG_M5272
> +static void __inline__ fec_get_mac(struct net_device *dev)
>  {
> -	struct fec_enet_private *fep;
> -	mii_list_t	*mip;
> -	int		retval;
> -
> -	/* Add PHY address to register command */
> -	fep = netdev_priv(dev);
> +	struct fec_enet_private *fep = netdev_priv(dev);
> +	unsigned char *iap, tmpaddr[ETH_ALEN];
>  
> -	regval |= fep->phy_addr << 23;
> -	retval = 0;
> -
> -	if ((mip = mii_free) != NULL) {
> -		mii_free = mip->mii_next;
> -		mip->mii_regval = regval;
> -		mip->mii_func = func;
> -		mip->mii_next = NULL;
> -		if (mii_head) {
> -			mii_tail->mii_next = mip;
> -			mii_tail = mip;
> -		} else {
> -			mii_head = mii_tail = mip;
> -			writel(regval, fep->hwp + FEC_MII_DATA);
> -		}
> +	if (FEC_FLASHMAC) {
> +		/*
> +		 * Get MAC address from FLASH.
> +		 * If it is all 1's or 0's, use the default.
> +		 */
> +		iap = (unsigned char *)FEC_FLASHMAC;
> +		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> +		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> +			iap = fec_mac_default;
> +		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> +		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> +			iap = fec_mac_default;
>  	} else {
> -		retval = 1;
> +		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> +		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> +		iap = &tmpaddr[0];
>  	}
>  
> -	return retval;
> -}
> -
> -static int
> -mii_queue(struct net_device *dev, int regval,
> -		void (*func)(uint, struct net_device *))
> -{
> -	struct fec_enet_private *fep;
> -	unsigned long   flags;
> -	int             retval;
> -	fep = netdev_priv(dev);
> -	spin_lock_irqsave(&fep->mii_lock, flags);
> -	retval = mii_queue_unlocked(dev, regval, func);
> -	spin_unlock_irqrestore(&fep->mii_lock, flags);
> -	return retval;
> -}
> -
> -static void mii_do_cmd(struct net_device *dev, const phy_cmd_t *c)
> -{
> -	if(!c)
> -		return;
> +	memcpy(dev->dev_addr, iap, ETH_ALEN);
>  
> -	for (; c->mii_data != mk_mii_end; c++)
> -		mii_queue(dev, c->mii_data, c->funct);
> +	/* Adjust MAC if using default MAC address */
> +	if (iap == fec_mac_default)
> +		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
>  }
> +#endif
>  
> -static void mii_parse_sr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_LINK | PHY_STAT_FAULT | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0004)
> -		status |= PHY_STAT_LINK;
> -	if (mii_reg & 0x0010)
> -		status |= PHY_STAT_FAULT;
> -	if (mii_reg & 0x0020)
> -		status |= PHY_STAT_ANC;
> -	*s = status;
> -}
> +/* ------------------------------------------------------------------------- */
>  
> -static void mii_parse_cr(uint mii_reg, struct net_device *dev)
> +/*
> + * Phy section
> + */
> +static void fec_enet_adjust_link(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_ANE | PHY_CONF_LOOP);
> -
> -	if (mii_reg & 0x1000)
> -		status |= PHY_CONF_ANE;
> -	if (mii_reg & 0x4000)
> -		status |= PHY_CONF_LOOP;
> -	*s = status;
> -}
> +	struct phy_device *phy_dev = fep->phy_dev;
> +	unsigned long flags;
>  
> -static void mii_parse_anar(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_CONF_SPMASK);
> -
> -	if (mii_reg & 0x0020)
> -		status |= PHY_CONF_10HDX;
> -	if (mii_reg & 0x0040)
> -		status |= PHY_CONF_10FDX;
> -	if (mii_reg & 0x0080)
> -		status |= PHY_CONF_100HDX;
> -	if (mii_reg & 0x00100)
> -		status |= PHY_CONF_100FDX;
> -	*s = status;
> -}
> +	int status_change = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT970 is used by many boards				     */
> +	spin_lock_irqsave(&fep->hw_lock, flags);
>  
> -#define MII_LXT970_MIRROR    16  /* Mirror register           */
> -#define MII_LXT970_IER       17  /* Interrupt Enable Register */
> -#define MII_LXT970_ISR       18  /* Interrupt Status Register */
> -#define MII_LXT970_CONFIG    19  /* Configuration Register    */
> -#define MII_LXT970_CSR       20  /* Chip Status Register      */
> +	/* Prevent a state halted on mii error */
> +	if (fep->mii_timeout && phy_dev->state == PHY_HALTED) {
> +		phy_dev->state = PHY_RESUMING;
> +		goto spin_unlock;
> +	}
>  
> -static void mii_parse_lxt970_csr(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	/* Duplex link change */
> +	if (phy_dev->link) {
> +		if (fep->full_duplex != phy_dev->duplex) {
> +			fec_restart(dev, phy_dev->duplex);
> +			status_change = 1;
> +		}
> +	}
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> -	if (mii_reg & 0x0800) {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_100FDX;
> +	/* Link on or off change */
> +	if (phy_dev->link != fep->link) {
> +		fep->link = phy_dev->link;
> +		if (phy_dev->link)
> +			fec_restart(dev, phy_dev->duplex);
>  		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x1000)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +			fec_stop(dev);
> +		status_change = 1;
>  	}
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_lxt970_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_startup[] = { /* enable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0002), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_ack_int[] = {
> -		/* read SR and ISR to acknowledge */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT970_ISR), NULL },
> -
> -		/* find out the current status */
> -		{ mk_mii_read(MII_LXT970_CSR), mii_parse_lxt970_csr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt970_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT970_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt970 = {
> -	.id = 0x07810000,
> -	.name = "LXT970",
> -	.config = phy_cmd_lxt970_config,
> -	.startup = phy_cmd_lxt970_startup,
> -	.ack_int = phy_cmd_lxt970_ack_int,
> -	.shutdown = phy_cmd_lxt970_shutdown
> -};
>  
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT971 is used on some of my custom boards                  */
> -
> -/* register definitions for the 971 */
> +spin_unlock:
> +	spin_unlock_irqrestore(&fep->hw_lock, flags);
>  
> -#define MII_LXT971_PCR       16  /* Port Control Register     */
> -#define MII_LXT971_SR2       17  /* Status Register 2         */
> -#define MII_LXT971_IER       18  /* Interrupt Enable Register */
> -#define MII_LXT971_ISR       19  /* Interrupt Status Register */
> -#define MII_LXT971_LCR       20  /* LED Control Register      */
> -#define MII_LXT971_TCR       30  /* Transmit Control Register */
> +	if (status_change)
> +		phy_print_status(phy_dev);
> +}
>  
>  /*
> - * I had some nice ideas of running the MDIO faster...
> - * The 971 should support 8MHz and I tried it, but things acted really
> - * weird, so 2.5 MHz ought to be enough for anyone...
> + * NOTE: a MII transaction is during around 25 us, so polling it...
>   */
> -
> -static void mii_parse_lxt971_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> +	fep->mii_timeout = 0;
>  
> -	if (mii_reg & 0x0400) {
> -		fep->link = 1;
> -		status |= PHY_STAT_LINK;
> -	} else {
> -		fep->link = 0;
> -	}
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x4000) {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_100FDX;
> -		else
> -			status |= PHY_STAT_100HDX;
> -	} else {
> -		if (mii_reg & 0x0200)
> -			status |= PHY_STAT_10FDX;
> -		else
> -			status |= PHY_STAT_10HDX;
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
> +
> +	/* start a read op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO read timeout\n");
> +			return -ETIMEDOUT;
> +		}
>  	}
> -	if (mii_reg & 0x0008)
> -		status |= PHY_STAT_FAULT;
>  
> -	*s = status;
> +	/* return value */
> +	return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
>  }
>  
> -static phy_cmd_t const phy_cmd_lxt971_config[] = {
> -		/* limit to 10MBit because my prototype board
> -		 * doesn't work with 100. */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x00f2), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_write(MII_LXT971_LCR, 0xd422), NULL }, /* LED config */
> -		/* Somehow does the 971 tell me that the link is down
> -		 * the first read after power-up.
> -		 * read here to get a valid value in ack_int */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_ack_int[] = {
> -		/* acknowledge the int before reading status ! */
> -		{ mk_mii_read(MII_LXT971_ISR), NULL },
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_lxt971_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_LXT971_IER, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_lxt971 = {
> -	.id = 0x0001378e,
> -	.name = "LXT971",
> -	.config = phy_cmd_lxt971_config,
> -	.startup = phy_cmd_lxt971_startup,
> -	.ack_int = phy_cmd_lxt971_ack_int,
> -	.shutdown = phy_cmd_lxt971_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* The Quality Semiconductor QS6612 is used on the RPX CLLF                  */
> -
> -/* register definitions */
> -
> -#define MII_QS6612_MCR       17  /* Mode Control Register      */
> -#define MII_QS6612_FTR       27  /* Factory Test Register      */
> -#define MII_QS6612_MCO       28  /* Misc. Control Register     */
> -#define MII_QS6612_ISR       29  /* Interrupt Source Register  */
> -#define MII_QS6612_IMR       30  /* Interrupt Mask Register    */
> -#define MII_QS6612_PCR       31  /* 100BaseTx PHY Control Reg. */
> -
> -static void mii_parse_qs6612_pcr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> +			   u16 value)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> +	struct fec_enet_private *fep = bus->priv;
> +	int timeout = FEC_MII_TIMEOUT;
>  
> -	status = *s & ~(PHY_STAT_SPMASK);
> +	fep->mii_timeout = 0;
>  
> -	switch((mii_reg >> 2) & 7) {
> -	case 1: status |= PHY_STAT_10HDX; break;
> -	case 2: status |= PHY_STAT_100HDX; break;
> -	case 5: status |= PHY_STAT_10FDX; break;
> -	case 6: status |= PHY_STAT_100FDX; break;
> -}
> -
> -	*s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_qs6612_config[] = {
> -		/* The PHY powers up isolated on the RPX,
> -		 * so send a command to allow operation.
> -		 */
> -		{ mk_mii_write(MII_QS6612_PCR, 0x0dc0), NULL },
> -
> -		/* parse cr and anar to get some info */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x003a), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_ack_int[] = {
> -		/* we need to read ISR, SR and ANER to acknowledge */
> -		{ mk_mii_read(MII_QS6612_ISR), NULL },
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_REG_ANER), NULL },
> -
> -		/* read pcr to get info */
> -		{ mk_mii_read(MII_QS6612_PCR), mii_parse_qs6612_pcr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_qs6612_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_QS6612_IMR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_qs6612 = {
> -	.id = 0x00181440,
> -	.name = "QS6612",
> -	.config = phy_cmd_qs6612_config,
> -	.startup = phy_cmd_qs6612_startup,
> -	.ack_int = phy_cmd_qs6612_ack_int,
> -	.shutdown = phy_cmd_qs6612_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* AMD AM79C874 phy                                                          */
> +	/* clear MII end of transfer bit*/
> +	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -/* register definitions for the 874 */
> +	/* start a write op */
> +	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> +		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> +		FEC_MMFR_TA | FEC_MMFR_DATA(value),
> +		fep->hwp + FEC_MII_DATA);
> +
> +	/* wait for end of transfer */
> +	while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> +		cpu_relax();
> +		if (timeout-- < 0) {
> +			fep->mii_timeout = 1;
> +			printk(KERN_ERR "FEC: MDIO write timeout\n");
> +			return -ETIMEDOUT;
> +		}
> +	}
>  
> -#define MII_AM79C874_MFR       16  /* Miscellaneous Feature Register */
> -#define MII_AM79C874_ICSR      17  /* Interrupt/Status Register      */
> -#define MII_AM79C874_DR        18  /* Diagnostic Register            */
> -#define MII_AM79C874_PMLR      19  /* Power and Loopback Register    */
> -#define MII_AM79C874_MCR       21  /* ModeControl Register           */
> -#define MII_AM79C874_DC        23  /* Disconnect Counter             */
> -#define MII_AM79C874_REC       24  /* Recieve Error Counter          */
> +	return 0;
> +}
>  
> -static void mii_parse_am79c874_dr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_reset(struct mii_bus *bus)
>  {
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -	uint status;
> -
> -	status = *s & ~(PHY_STAT_SPMASK | PHY_STAT_ANC);
> -
> -	if (mii_reg & 0x0080)
> -		status |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0400)
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_100FDX : PHY_STAT_100HDX);
> -	else
> -		status |= ((mii_reg & 0x0800) ? PHY_STAT_10FDX : PHY_STAT_10HDX);
> -
> -	*s = status;
> +	return 0;
>  }
>  
> -static phy_cmd_t const phy_cmd_am79c874_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_AM79C874_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_am79c874_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_AM79C874_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_am79c874 = {
> -	.id = 0x00022561,
> -	.name = "AM79C874",
> -	.config = phy_cmd_am79c874_config,
> -	.startup = phy_cmd_am79c874_startup,
> -	.ack_int = phy_cmd_am79c874_ack_int,
> -	.shutdown = phy_cmd_am79c874_shutdown
> -};
> -
> -
> -/* ------------------------------------------------------------------------- */
> -/* Kendin KS8721BL phy                                                       */
> -
> -/* register definitions for the 8721 */
> -
> -#define MII_KS8721BL_RXERCR	21
> -#define MII_KS8721BL_ICSR	27
> -#define	MII_KS8721BL_PHYCR	31
> -
> -static phy_cmd_t const phy_cmd_ks8721bl_config[] = {
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_startup[] = {  /* enable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0xff00), NULL },
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_ack_int[] = {
> -		/* find out the current status */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		/* we only need to read ISR to acknowledge */
> -		{ mk_mii_read(MII_KS8721BL_ICSR), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_cmd_t const phy_cmd_ks8721bl_shutdown[] = { /* disable interrupts */
> -		{ mk_mii_write(MII_KS8721BL_ICSR, 0x0000), NULL },
> -		{ mk_mii_end, }
> -	};
> -static phy_info_t const phy_info_ks8721bl = {
> -	.id = 0x00022161,
> -	.name = "KS8721BL",
> -	.config = phy_cmd_ks8721bl_config,
> -	.startup = phy_cmd_ks8721bl_startup,
> -	.ack_int = phy_cmd_ks8721bl_ack_int,
> -	.shutdown = phy_cmd_ks8721bl_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* register definitions for the DP83848 */
> -
> -#define MII_DP8384X_PHYSTST    16  /* PHY Status Register */
> -
> -static void mii_parse_dp8384x_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mii_probe(struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> -
> -	*s &= ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> -
> -	/* Link up */
> -	if (mii_reg & 0x0001) {
> -		fep->link = 1;
> -		*s |= PHY_STAT_LINK;
> -	} else
> -		fep->link = 0;
> -	/* Status of link */
> -	if (mii_reg & 0x0010)   /* Autonegotioation complete */
> -		*s |= PHY_STAT_ANC;
> -	if (mii_reg & 0x0002) {   /* 10MBps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_10FDX;
> -		else
> -			*s |= PHY_STAT_10HDX;
> -	} else {                  /* 100 Mbps? */
> -		if (mii_reg & 0x0004)   /* Full Duplex? */
> -			*s |= PHY_STAT_100FDX;
> -		else
> -			*s |= PHY_STAT_100HDX;
> -	}
> -	if (mii_reg & 0x0008)
> -		*s |= PHY_STAT_FAULT;
> -}
> -
> -static phy_info_t phy_info_dp83848= {
> -	0x020005c9,
> -	"DP83848",
> +	struct phy_device *phy_dev = NULL;
> +	int phy_addr;
>  
> -	(const phy_cmd_t []) {  /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_read(MII_DP8384X_PHYSTST), mii_parse_dp8384x_sr2 },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* startup - enable interrupts */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* ack_int - never happens, no interrupt */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) {  /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> +	/* find the first phy */
> +	for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
> +		if (fep->mii_bus->phy_map[phy_addr]) {
> +			phy_dev = fep->mii_bus->phy_map[phy_addr];
> +			break;
> +		}
> +	}
>  
> -static phy_info_t phy_info_lan8700 = {
> -	0x0007C0C,
> -	"LAN8700",
> -	(const phy_cmd_t []) { /* config */
> -		{ mk_mii_read(MII_REG_CR), mii_parse_cr },
> -		{ mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* startup */
> -		{ mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> -		{ mk_mii_read(MII_REG_SR), mii_parse_sr },
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* act_int */
> -		{ mk_mii_end, }
> -	},
> -	(const phy_cmd_t []) { /* shutdown */
> -		{ mk_mii_end, }
> -	},
> -};
> -/* ------------------------------------------------------------------------- */
> +	if (!phy_dev) {
> +		printk(KERN_ERR "%s: no PHY found\n", dev->name);
> +		return -ENODEV;
> +	}
>  
> -static phy_info_t const * const phy_info[] = {
> -	&phy_info_lxt970,
> -	&phy_info_lxt971,
> -	&phy_info_qs6612,
> -	&phy_info_am79c874,
> -	&phy_info_ks8721bl,
> -	&phy_info_dp83848,
> -	&phy_info_lan8700,
> -	NULL
> -};
> +	/* attach the mac to the phy */
> +	phy_dev = phy_connect(dev, dev_name(&phy_dev->dev),
> +			     &fec_enet_adjust_link, 0,
> +			     PHY_INTERFACE_MODE_MII);
> +	if (IS_ERR(phy_dev)) {
> +		printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
> +		return PTR_ERR(phy_dev);
> +	}
>  
> -/* ------------------------------------------------------------------------- */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id);
> +	/* mask with MAC supported features */
> +	phy_dev->supported &= PHY_BASIC_FEATURES;
> +	phy_dev->advertising = phy_dev->supported;
>  
> -/*
> - *	This is specific to the MII interrupt setup of the M5272EVB.
> - */
> -static void __inline__ fec_request_mii_intr(struct net_device *dev)
> -{
> -	if (request_irq(66, mii_link_interrupt, IRQF_DISABLED, "fec(MII)", dev) != 0)
> -		printk("FEC: Could not allocate fec(MII) IRQ(66)!\n");
> -}
> +	fep->phy_dev = phy_dev;
> +	fep->link = 0;
> +	fep->full_duplex = 0;
>  
> -static void __inline__ fec_disable_phy_intr(struct net_device *dev)
> -{
> -	free_irq(66, dev);
> +	return 0;
>  }
> -#endif
>  
> -#ifdef CONFIG_M5272
> -static void __inline__ fec_get_mac(struct net_device *dev)
> +static int fec_enet_mii_init(struct platform_device *pdev)
>  {
> +	struct net_device *dev = platform_get_drvdata(pdev);
>  	struct fec_enet_private *fep = netdev_priv(dev);
> -	unsigned char *iap, tmpaddr[ETH_ALEN];
> +	int err = -ENXIO, i;
>  
> -	if (FEC_FLASHMAC) {
> -		/*
> -		 * Get MAC address from FLASH.
> -		 * If it is all 1's or 0's, use the default.
> -		 */
> -		iap = (unsigned char *)FEC_FLASHMAC;
> -		if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> -		    (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> -			iap = fec_mac_default;
> -		if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> -		    (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> -			iap = fec_mac_default;
> -	} else {
> -		*((unsigned long *) &tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> -		*((unsigned short *) &tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> -		iap = &tmpaddr[0];
> -	}
> -
> -	memcpy(dev->dev_addr, iap, ETH_ALEN);
> -
> -	/* Adjust MAC if using default MAC address */
> -	if (iap == fec_mac_default)
> -		 dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
> -}
> -#endif
> +	fep->mii_timeout = 0;
>  
> -/* ------------------------------------------------------------------------- */
> -
> -static void mii_display_status(struct net_device *dev)
> -{
> -	struct fec_enet_private *fep = netdev_priv(dev);
> -	volatile uint *s = &(fep->phy_status);
> +	/*
> +	 * Set MII speed to 2.5 MHz
> +	 */
> +	fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> +					/ 2500000) / 2) & 0x3F) << 1;
> +	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  
> -	if (!fep->link && !fep->old_link) {
> -		/* Link is still down - don't print anything */
> -		return;
> +	fep->mii_bus = mdiobus_alloc();
> +	if (fep->mii_bus == NULL) {
> +		err = -ENOMEM;
> +		goto err_out;
>  	}
>  
> -	printk("%s: status: ", dev->name);
> -
> -	if (!fep->link) {
> -		printk("link down");
> -	} else {
> -		printk("link up");
> -
> -		switch(*s & PHY_STAT_SPMASK) {
> -		case PHY_STAT_100FDX: printk(", 100MBit Full Duplex"); break;
> -		case PHY_STAT_100HDX: printk(", 100MBit Half Duplex"); break;
> -		case PHY_STAT_10FDX: printk(", 10MBit Full Duplex"); break;
> -		case PHY_STAT_10HDX: printk(", 10MBit Half Duplex"); break;
> -		default:
> -			printk(", Unknown speed/duplex");
> -		}
> -
> -		if (*s & PHY_STAT_ANC)
> -			printk(", auto-negotiation complete");
> +	fep->mii_bus->name = "fec_enet_mii_bus";
> +	fep->mii_bus->read = fec_enet_mdio_read;
> +	fep->mii_bus->write = fec_enet_mdio_write;
> +	fep->mii_bus->reset = fec_enet_mdio_reset;
> +	snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
> +	fep->mii_bus->priv = fep;
> +	fep->mii_bus->parent = &pdev->dev;
> +
> +	fep->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
> +	if (!fep->mii_bus->irq) {
> +		err = -ENOMEM;
> +		goto err_out_free_mdiobus;
>  	}
>  
> -	if (*s & PHY_STAT_FAULT)
> -		printk(", remote fault");
> -
> -	printk(".\n");
> -}
> -
> -static void mii_display_config(struct work_struct *work)
> -{
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	uint status = fep->phy_status;
> +	for (i = 0; i < PHY_MAX_ADDR; i++)
> +		fep->mii_bus->irq[i] = PHY_POLL;
>  
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	printk("%s: config: auto-negotiation ", dev->name);
> -
> -	if (status & PHY_CONF_ANE)
> -		printk("on");
> -	else
> -		printk("off");
> +	platform_set_drvdata(dev, fep->mii_bus);
>  
> -	if (status & PHY_CONF_100FDX)
> -		printk(", 100FDX");
> -	if (status & PHY_CONF_100HDX)
> -		printk(", 100HDX");
> -	if (status & PHY_CONF_10FDX)
> -		printk(", 10FDX");
> -	if (status & PHY_CONF_10HDX)
> -		printk(", 10HDX");
> -	if (!(status & PHY_CONF_SPMASK))
> -		printk(", No speed/duplex selected?");
> +	if (mdiobus_register(fep->mii_bus))
> +		goto err_out_free_mdio_irq;
>  
> -	if (status & PHY_CONF_LOOP)
> -		printk(", loopback enabled");
> +	if (fec_enet_mii_probe(dev) != 0)
> +		goto err_out_unregister_bus;
>  
> -	printk(".\n");
> +	return 0;
>  
> -	fep->sequence_done = 1;
> +err_out_unregister_bus:
> +	mdiobus_unregister(fep->mii_bus);
> +err_out_free_mdio_irq:
> +	kfree(fep->mii_bus->irq);
> +err_out_free_mdiobus:
> +	mdiobus_free(fep->mii_bus);
> +err_out:
> +	return err;
>  }
>  
> -static void mii_relink(struct work_struct *work)
> +static void fec_enet_mii_remove(struct fec_enet_private *fep)
>  {
> -	struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> -	struct net_device *dev = fep->netdev;
> -	int duplex;
> -
> -	/*
> -	** When we get here, phy_task is already removed from
> -	** the workqueue.  It is thus safe to allow to reuse it.
> -	*/
> -	fep->mii_phy_task_queued = 0;
> -	fep->link = (fep->phy_status & PHY_STAT_LINK) ? 1 : 0;
> -	mii_display_status(dev);
> -	fep->old_link = fep->link;
> -
> -	if (fep->link) {
> -		duplex = 0;
> -		if (fep->phy_status
> -		    & (PHY_STAT_100FDX | PHY_STAT_10FDX))
> -			duplex = 1;
> -		fec_restart(dev, duplex);
> -	} else
> -		fec_stop(dev);
> +	if (fep->phy_dev)
> +		phy_disconnect(fep->phy_dev);
> +	mdiobus_unregister(fep->mii_bus);
> +	kfree(fep->mii_bus->irq);
> +	mdiobus_free(fep->mii_bus);
>  }
>  
> -/* mii_queue_relink is called in interrupt context from mii_link_interrupt */
> -static void mii_queue_relink(uint mii_reg, struct net_device *dev)
> +static int fec_enet_get_settings(struct net_device *dev,
> +				  struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	/*
> -	 * We cannot queue phy_task twice in the workqueue.  It
> -	 * would cause an endless loop in the workqueue.
> -	 * Fortunately, if the last mii_relink entry has not yet been
> -	 * executed now, it will do the job for the current interrupt,
> -	 * which is just what we want.
> -	 */
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_relink);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_gset(phydev, cmd);
>  }
>  
> -/* mii_queue_config is called in interrupt context from fec_enet_mii */
> -static void mii_queue_config(uint mii_reg, struct net_device *dev)
> +static int fec_enet_set_settings(struct net_device *dev,
> +				 struct ethtool_cmd *cmd)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	if (fep->mii_phy_task_queued)
> -		return;
> +	if (!phydev)
> +		return -ENODEV;
>  
> -	fep->mii_phy_task_queued = 1;
> -	INIT_WORK(&fep->phy_task, mii_display_config);
> -	schedule_work(&fep->phy_task);
> +	return phy_ethtool_sset(phydev, cmd);
>  }
>  
> -phy_cmd_t const phy_cmd_relink[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_relink },
> -	{ mk_mii_end, }
> -	};
> -phy_cmd_t const phy_cmd_config[] = {
> -	{ mk_mii_read(MII_REG_CR), mii_queue_config },
> -	{ mk_mii_end, }
> -	};
> -
> -/* Read remainder of PHY ID. */
> -static void
> -mii_discover_phy3(uint mii_reg, struct net_device *dev)
> +static void fec_enet_get_drvinfo(struct net_device *dev,
> +				 struct ethtool_drvinfo *info)
>  {
> -	struct fec_enet_private *fep;
> -	int i;
> -
> -	fep = netdev_priv(dev);
> -	fep->phy_id |= (mii_reg & 0xffff);
> -	printk("fec: PHY @ 0x%x, ID 0x%08x", fep->phy_addr, fep->phy_id);
> -
> -	for(i = 0; phy_info[i]; i++) {
> -		if(phy_info[i]->id == (fep->phy_id >> 4))
> -			break;
> -	}
> -
> -	if (phy_info[i])
> -		printk(" -- %s\n", phy_info[i]->name);
> -	else
> -		printk(" -- unknown PHY!\n");
> +	struct fec_enet_private *fep = netdev_priv(dev);
>  
> -	fep->phy = phy_info[i];
> -	fep->phy_id_done = 1;
> +	strcpy(info->driver, fep->pdev->dev.driver->name);
> +	strcpy(info->version, "Revision: 1.0");
> +	strcpy(info->bus_info, dev_name(&dev->dev));
>  }
>  
> -/* Scan all of the MII PHY addresses looking for someone to respond
> - * with a valid ID.  This usually happens quickly.
> - */
> -static void
> -mii_discover_phy(uint mii_reg, struct net_device *dev)
> -{
> -	struct fec_enet_private *fep;
> -	uint phytype;
> -
> -	fep = netdev_priv(dev);
> -
> -	if (fep->phy_addr < 32) {
> -		if ((phytype = (mii_reg & 0xffff)) != 0xffff && phytype != 0) {
> -
> -			/* Got first part of ID, now get remainder */
> -			fep->phy_id = phytype << 16;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR2),
> -							mii_discover_phy3);
> -		} else {
> -			fep->phy_addr++;
> -			mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR1),
> -							mii_discover_phy);
> -		}
> -	} else {
> -		printk("FEC: No PHY device found.\n");
> -		/* Disable external MII interface */
> -		writel(0, fep->hwp + FEC_MII_SPEED);
> -		fep->phy_speed = 0;
> -#ifdef HAVE_mii_link_interrupt
> -		fec_disable_phy_intr(dev);
> -#endif
> -	}
> -}
> +static struct ethtool_ops fec_enet_ethtool_ops = {
> +	.get_settings		= fec_enet_get_settings,
> +	.set_settings		= fec_enet_set_settings,
> +	.get_drvinfo		= fec_enet_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +};
>  
> -/* This interrupt occurs when the PHY detects a link change */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id)
> +static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>  {
> -	struct	net_device *dev = dev_id;
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	struct phy_device *phydev = fep->phy_dev;
>  
> -	mii_do_cmd(dev, fep->phy->ack_int);
> -	mii_do_cmd(dev, phy_cmd_relink);  /* restart and display status */
> +	if (!netif_running(dev))
> +		return -EINVAL;
>  
> -	return IRQ_HANDLED;
> +	if (!phydev)
> +		return -ENODEV;
> +
> +	return phy_mii_ioctl(phydev, if_mii(rq), cmd);
>  }
> -#endif
>  
>  static void fec_enet_free_buffers(struct net_device *dev)
>  {
> @@ -1509,35 +916,8 @@ fec_enet_open(struct net_device *dev)
>  	if (ret)
>  		return ret;
>  
> -	fep->sequence_done = 0;
> -	fep->link = 0;
> -
> -	fec_restart(dev, 1);
> -
> -	if (fep->phy) {
> -		mii_do_cmd(dev, fep->phy->ack_int);
> -		mii_do_cmd(dev, fep->phy->config);
> -		mii_do_cmd(dev, phy_cmd_config);  /* display configuration */
> -
> -		/* Poll until the PHY tells us its configuration
> -		 * (not link state).
> -		 * Request is initiated by mii_do_cmd above, but answer
> -		 * comes by interrupt.
> -		 * This should take about 25 usec per register at 2.5 MHz,
> -		 * and we read approximately 5 registers.
> -		 */
> -		while(!fep->sequence_done)
> -			schedule();
> -
> -		mii_do_cmd(dev, fep->phy->startup);
> -	}
> -
> -	/* Set the initial link state to true. A lot of hardware
> -	 * based on this device does not implement a PHY interrupt,
> -	 * so we are never notified of link change.
> -	 */
> -	fep->link = 1;
> -
> +	/* schedule a link state check */
> +	phy_start(fep->phy_dev);
>  	netif_start_queue(dev);
>  	fep->opened = 1;
>  	return 0;
> @@ -1550,6 +930,7 @@ fec_enet_close(struct net_device *dev)
>  
>  	/* Don't know what to do yet. */
>  	fep->opened = 0;
> +	phy_stop(fep->phy_dev);
>  	netif_stop_queue(dev);
>  	fec_stop(dev);
>  
> @@ -1666,6 +1047,7 @@ static const struct net_device_ops fec_netdev_ops = {
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_tx_timeout		= fec_timeout,
>  	.ndo_set_mac_address	= fec_set_mac_address,
> +	.ndo_do_ioctl           = fec_enet_ioctl,
>  };
>  
>   /*
> @@ -1689,7 +1071,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	}
>  
>  	spin_lock_init(&fep->hw_lock);
> -	spin_lock_init(&fep->mii_lock);
>  
>  	fep->index = index;
>  	fep->hwp = (void __iomem *)dev->base_addr;
> @@ -1716,16 +1097,10 @@ static int fec_enet_init(struct net_device *dev, int index)
>  	fep->rx_bd_base = cbd_base;
>  	fep->tx_bd_base = cbd_base + RX_RING_SIZE;
>  
> -#ifdef HAVE_mii_link_interrupt
> -	fec_request_mii_intr(dev);
> -#endif
>  	/* The FEC Ethernet specific entries in the device structure */
>  	dev->watchdog_timeo = TX_TIMEOUT;
>  	dev->netdev_ops = &fec_netdev_ops;
> -
> -	for (i=0; i<NMII-1; i++)
> -		mii_cmds[i].mii_next = &mii_cmds[i+1];
> -	mii_free = mii_cmds;
> +	dev->ethtool_ops = &fec_enet_ethtool_ops;
>  
>  	/* Set MII speed to 2.5 MHz */
>  	fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> @@ -1760,13 +1135,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>  
>  	fec_restart(dev, 0);
>  
> -	/* Queue up command to detect the PHY and initialize the
> -	 * remainder of the interface.
> -	 */
> -	fep->phy_id_done = 0;
> -	fep->phy_addr = 0;
> -	mii_queue(dev, mk_mii_read(MII_REG_PHYIR1), mii_discover_phy);
> -
>  	return 0;
>  }
>  
> @@ -1835,8 +1203,7 @@ fec_restart(struct net_device *dev, int duplex)
>  	writel(0, fep->hwp + FEC_R_DES_ACTIVE);
>  
>  	/* Enable interrupts we wish to service */
> -	writel(FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII,
> -			fep->hwp + FEC_IMASK);
> +	writel(FEC_ENET_TXF | FEC_ENET_RXF, fep->hwp + FEC_IMASK);
>  }
>  
>  static void
> @@ -1859,7 +1226,6 @@ fec_stop(struct net_device *dev)
>  	/* Clear outstanding MII command interrupts. */
>  	writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>  
> -	writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
>  	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>  }
>  
> @@ -1891,6 +1257,7 @@ fec_probe(struct platform_device *pdev)
>  	memset(fep, 0, sizeof(*fep));
>  
>  	ndev->base_addr = (unsigned long)ioremap(r->start, resource_size(r));
> +	fep->pdev = pdev;
>  
>  	if (!ndev->base_addr) {
>  		ret = -ENOMEM;
> @@ -1926,13 +1293,24 @@ fec_probe(struct platform_device *pdev)
>  	if (ret)
>  		goto failed_init;
>  
> +	ret = fec_enet_mii_init(pdev);
> +	if (ret)
> +		goto failed_mii_init;
> +
>  	ret = register_netdev(ndev);
>  	if (ret)
>  		goto failed_register;
>  
> +	printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> +		"(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
> +		fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> +		fep->phy_dev->irq);
> +
>  	return 0;
>  
>  failed_register:
> +	fec_enet_mii_remove(fep);
> +failed_mii_init:
>  failed_init:
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
> @@ -1959,6 +1337,7 @@ fec_drv_remove(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, NULL);
>  
>  	fec_stop(ndev);
> +	fec_enet_mii_remove(fep);
>  	clk_disable(fep->clk);
>  	clk_put(fep->clk);
>  	iounmap((void __iomem *)ndev->base_addr);

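The MII_SPEED value computed in fec_enet_mii_init() is plain integer arithmetic, so it can be checked outside the kernel. Below is a minimal sketch. fec_mii_speed_field() is a hypothetical name, and the 50 MHz and 66 MHz inputs are example clock rates, not values from a specific board:

```c
#include <assert.h>

/*
 * Same arithmetic as the MII_SPEED setup in fec_enet_mii_init().
 * clk_rate is the FEC input clock in Hz (clk_get_rate() in the driver).
 * Every division truncates, so the result depends on evaluating the
 * expression in exactly this order.
 */
static unsigned long fec_mii_speed_field(unsigned long clk_rate)
{
	assert(clk_rate > 0);
	return ((((clk_rate / 2 + 4999999) / 2500000) / 2) & 0x3F) << 1;
}
```

With a 66 MHz clock the expression yields 14 (0x0E). With 50 MHz it yields 10 (0x0A).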

-- 
------------------------------------------------------------------------
Greg Ungerer  --  Principal Engineer        EMAIL:     gerg@snapgear.com
SnapGear Group, McAfee                      PHONE:       +61 7 3435 2888
8 Gardner Close                             FAX:         +61 7 3217 5323
Milton, QLD, 4064, Australia                WEB: http://www.SnapGear.com

^ permalink raw reply

* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: David Miller @ 2010-04-21  5:36 UTC (permalink / raw)
  To: galak; +Cc: timur.tabi, afleming, netdev
In-Reply-To: <A20F7457-20BC-493C-B800-3933D8FC4D5C@kernel.crashing.org>

From: Kumar Gala <galak@kernel.crashing.org>
Date: Tue, 20 Apr 2010 23:22:19 -0500

> 
> On Apr 20, 2010, at 8:06 PM, David Miller wrote:
> 
>> From: Timur Tabi <timur.tabi@gmail.com>
>> Date: Tue, 20 Apr 2010 10:01:48 -0500
>> 
>>> On Mon, Apr 19, 2010 at 11:43 PM, Kumar Gala <galak@kernel.crashing.org> wrote:
>>> 
>>>> spin_event_timeout doesn't make sense for this.  The patch is fine.
>>> 
>>> Can you please elaborate on that?  I don't understand why you think
>>> that.  spin_event_timeout() takes an expression and a timeout, and
>>> loops over the expression calling cpu_relax(), just like this loop
>>> does.
>> 
>> Indeed it does, Kumar this request seems reasonable.
> 
> Are we saying that cpu_relax() is useless and should be removed if we are spinning on a HW register?

Kumar, take a deep breath and a step back.

spin_event_timeout() does the cpu_relax() too, that's what Timur is
trying to tell you.

The code will be basically identical as far as I can tell.

^ permalink raw reply

* Re: [PATCH net-next-2.6] cleanup: remove two unnecessary exports (skbuff).
From: David Miller @ 2010-04-21  5:40 UTC (permalink / raw)
  To: ramirose; +Cc: netdev
In-Reply-To: <g2teb3ff54b1004202220va58efc7evdb91e2cf38b062d4@mail.gmail.com>

From: Rami Rosen <ramirose@gmail.com>
Date: Wed, 21 Apr 2010 08:20:31 +0300

> There is no need to export skb_under_panic() and skb_over_panic() in
> skbuff.c, since these methods are used only in skbuff.c; this patch
> removes these two exports. It also marks these functions as 'static'
> and removes their extern declarations from include/linux/skbuff.h.
> 
> Signed-off-by: Rami Rosen <ramirose@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  5:46 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Ben Greear, David Miller, Gaspar Chilingarov, netdev
In-Reply-To: <20100421003022.GA3107@ioremap.net>

On Wednesday, 21 April 2010 at 04:30 +0400, Evgeniy Polyakov wrote:
> On Wed, Apr 21, 2010 at 02:05:14AM +0200, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> > I believe the bsockets 'optimization' is a bug, we should remove it.
> > 
> > This is a stable candidate (2.6.30+)
> > 
> > [PATCH net-next-2.6] tcp: remove bsockets count
> > 
> > Counting the number of bound sockets to avoid a loop is buggy, since we can't
> > know how many IP addresses are in use. When the threshold is reached, we try
> > 5 random slots and can fail while there are plenty of available ports.
> 
> To return to the old exponential bind() times you would need to revert the
> whole original patch, including the magic number 5, not only bsockets.
> 
> But the actual problem is not this number, but the deeper logic.
> Previously we scanned the whole table, now we have 5 attempts to
> find out at least one bucket (without conflict) we will insert the
> new socket into. Apparently, for a large number of addresses it is possible
> that all 5 times we will randomly select buckets which conflict.
> As a dumb solution we can increase the 'attempt' number to infinity, or
> fall back to a whole-table search after several random attempts, which is a
> bit more clever I think.
> 

Hmm, maybe I am blind, but in the case where the threshold is reached, we don't
have 5 attempts "to find out at least one bucket (without conflict)".

We just take the first entry from the random starting point, _without_
checking whether we have a conflict.

if (net_eq(ib_net(tb), net) && tb->port == rover) {
	if (tb->fastreuse > 0 &&
	    sk->sk_reuse &&
	    sk->sk_state != TCP_LISTEN &&
	    (tb->num_owners < smallest_size || smallest_size == -1)) {
		smallest_size = tb->num_owners;
		smallest_rover = rover;
		if (atomic_read(&hashinfo->bsockets) > (high-low)+1) {
			spin_unlock(&head->lock);
			snum = smallest_rover;   // We select this without checking for conflicts.
			goto have_snum;
		}
	}


Then we jump to the "have_snum" label.

Then we realize (selected_IP, randomport) is already in use.
End of first try.

We redo the thing 5 times, so we only look at 5 slots out of
32000-64000.

Maybe the fix would need to check whether there is a conflict before doing
the "goto have_snum":

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e0a3e35..0498daf 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -120,9 +120,11 @@ again:
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
 						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
+							if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
+								spin_unlock(&head->lock);
+								snum = smallest_rover;
+								goto have_snum;
+							}
 						}
 					}
 					goto next;

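Here is a drastically simplified user-space model of the two ideas in this thread. The first is to check for a conflict before accepting a randomly chosen port, as in the diff. The second is to fall back to a whole-table scan after a few random attempts, as Evgeniy suggests. The model assumes a flat array in which each port either conflicts or does not. It ignores bind buckets, per-address state, fastreuse and locking. pick_port(), NPORTS and the in_use[] table are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NPORTS       32768	/* hypothetical size of the local port range */
#define RANDOM_TRIES 5		/* mirrors the "5 attempts" discussed above */

/* Model only: true if binding to this port would conflict. */
static bool port_conflicts(const bool *in_use, int port)
{
	return in_use[port];
}

/*
 * Probe a few random ports, checking for a conflict before accepting one,
 * then fall back to a linear scan of the whole range.
 * Returns the chosen port, or -1 if every port conflicts.
 */
static int pick_port(const bool *in_use, int nports, int tries)
{
	int i, port;

	assert(nports > 0);

	for (i = 0; i < tries; i++) {
		port = rand() % nports;
		if (!port_conflicts(in_use, port))
			return port;
	}

	/* Fallback: every random probe hit a conflict, so search everything. */
	for (port = 0; port < nports; port++)
		if (!port_conflicts(in_use, port))
			return port;

	return -1;
}
```

When only one port in 32768 is free, five random probes will almost always miss it, but the fallback scan still finds it. pick_port() returns -1 only when every port conflicts.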


^ permalink raw reply related

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-21  6:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100418.024601.85853662.davem@davemloft.net>

On Sunday, 18 April 2010 at 02:46 -0700, David Miller wrote:

> Looks good, applied, thanks Eric.

Hmm, looking at the GSO stuff, I believe we should not call
skb_orphan_try() on GSO skbs?

Thanks!

[PATCH net-next-2.6] net: Dont call skb_orphan_try() for GSO

At this point, skb->destructor is not the original one (that one is
stored in DEV_GSO_CB(skb)->destructor).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index b31d5d6..200d1e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1937,7 +1937,6 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
-		skb_orphan_try(nskb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)



^ permalink raw reply related

* [PATCH] ethernet: print protocol in host byte order
From: Johannes Berg @ 2010-04-21  7:06 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet

Eric's recent patch added __force, but this
place would seem to require actually doing
a byte order conversion so the printk is
consistent across architectures.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
 net/ethernet/eth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 3584696..0c0d272 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -136,7 +136,7 @@ int eth_rebuild_header(struct sk_buff *skb)
 	default:
 		printk(KERN_DEBUG
 		       "%s: unable to resolve type %X addresses.\n",
-		       dev->name, (__force int)eth->h_proto);
+		       dev->name, ntohs(eth->h_proto));
 
 		memcpy(eth->h_source, dev->dev_addr, ETH_ALEN);
 		break;
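The difference between the old and new printk() argument can be reproduced in user space. A minimal sketch, assuming a hypothetical struct fake_ethhdr that keeps only the h_proto field of struct ethhdr (stored in network byte order, as on the wire): reading the raw field gives a host-dependent number, while ntohs() gives the same EtherType on every architecture.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>

/* Hypothetical stand-in for struct ethhdr, reduced to h_proto, which
 * holds the EtherType in network (big-endian) byte order. */
struct fake_ethhdr {
	uint16_t h_proto;
};

/* Old argument: use the raw field. For IPv4 (0x0800 on the wire) this
 * is 0x0800 on big-endian hosts but 0x0008 on little-endian ones. */
static unsigned int proto_raw(const struct fake_ethhdr *eth)
{
	return eth->h_proto;
}

/* New argument: convert to host byte order first, which yields
 * 0x0800 for IPv4 on every architecture. */
static unsigned int proto_host(const struct fake_ethhdr *eth)
{
	return ntohs(eth->h_proto);
}
```

With `struct fake_ethhdr eth = { htons(0x0800) };`, printing `proto_host(&eth)` with `%X` gives `800` everywhere, whereas `proto_raw(&eth)` gives `8` on x86.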




* ipv6: Fix tcp_v6_send_response checksum
From: Herbert Xu @ 2010-04-21  7:07 UTC (permalink / raw)
  To: David S. Miller, netdev

Hi:

ipv6: Fix tcp_v6_send_response checksum

My recent patch to remove the open-coded checksum sequence in
tcp_v6_send_response broke it as we did not set the transport
header pointer on the new packet.

Instead we had set the transport header on the original packet,
which is unnecessary and unexpected.

So this patch removes that and instead sets the transport header
on the new packet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c92ebe8..075f540 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1015,7 +1015,7 @@ static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
 	skb_reserve(buff, MAX_HEADER + sizeof(struct ipv6hdr) + tot_len);

 	t1 = (struct tcphdr *) skb_push(buff, tot_len);
-	skb_reset_transport_header(skb);
+	skb_reset_transport_header(buff);

 	/* Swap the send and the receive. */
 	memset(t1, 0, sizeof(*t1));

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
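A toy model of why the one-word change matters. This is a sketch under stated assumptions: struct mock_skb and the mock_* helpers are hypothetical and imitate only the offset bookkeeping of skb_reserve(), skb_push() and skb_reset_transport_header(); the real struct sk_buff is far richer. Because the transport header offset is stored per buffer, resetting it on the received packet ('skb') leaves the freshly built reply ('buff') without one.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified buffer: only the fields needed to
 * show where the transport header offset is recorded. */
struct mock_skb {
	unsigned char head[256];
	unsigned char *data;     /* start of valid data */
	long transport_header;   /* offset from head, -1 = never set */
};

/* Like alloc + skb_reserve(): leave 'headroom' bytes before data. */
static void mock_init(struct mock_skb *s, size_t headroom)
{
	s->data = s->head + headroom;
	s->transport_header = -1;
}

/* Like skb_push(): grow the data area towards the head. */
static unsigned char *mock_push(struct mock_skb *s, size_t len)
{
	s->data -= len;
	return s->data;
}

/* Like skb_reset_transport_header(): record the current data offset
 * in *this* buffer only. */
static void mock_reset_transport_header(struct mock_skb *s)
{
	s->transport_header = s->data - s->head;
}
```

Resetting the received buffer leaves the reply's offset at -1; only resetting the reply makes its transport header point at the pushed TCP header.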


* Re: [PATCH] net: ipv6 bind to device issue
From: Jiri Olsa @ 2010-04-21  7:21 UTC (permalink / raw)
  To: Brian Haley
  Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	netdev
In-Reply-To: <4BCDEED3.7040901@hp.com>

On Tue, Apr 20, 2010 at 02:13:39PM -0400, Brian Haley wrote:
> Jiri Olsa wrote:
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index c2438e8..7bf7717 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -815,7 +815,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
> >  {
> >  	int flags = 0;
> >  
> > -	if (rt6_need_strict(&fl->fl6_dst))
> > +	if (rt6_need_strict(&fl->fl6_dst) || fl->oif)
> >  		flags |= RT6_LOOKUP_F_IFACE;
> >  
> >  	if (!ipv6_addr_any(&fl->fl6_src))
> 
> Actually, looking at this again, we might want to swap the order
> here since fl->oif should be filled-in for most link-local and
> multicast requests calling this:
> 
> 	if (fl->oif || rt6_need_strict(&fl->fl6_dst))
> 
> Just a thought, but it potentially saves a call to determine
> the scope of the address.
> 
> -Brian

I think it's a good idea; attaching the changed patch.

thanks,
jirka
---

The issue arises when two NICs are both assigned the same
global IPv6 address.

If a sender binds to a particular NIC (SO_BINDTODEVICE),
the outgoing traffic is sent via the first NIC found.
The bound device is thus not taken into account during
routing.


From the ip6_route_output() function:

If the destination address is multicast, link-local or loopback,
the RT6_LOOKUP_F_IFACE bit is set, but not for a global address.

So with a global address, the device bound via SO_BINDTODEVICE is
ignored, because the fib6_rule_lookup() path won't check the
flowi::oif field and takes the first route that fits.

The following patch should handle the issue.

wbr,
jirka


Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Scott Otto <scott.otto@alcatel-lucent.com>
---
 net/ipv6/route.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2438e8..05ebd78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -815,7 +815,7 @@ struct dst_entry * ip6_route_output(struct net *net, struct sock *sk,
 {
 	int flags = 0;
 
-	if (rt6_need_strict(&fl->fl6_dst))
+	if (fl->oif || rt6_need_strict(&fl->fl6_dst))
 		flags |= RT6_LOOKUP_F_IFACE;
 
 	if (!ipv6_addr_any(&fl->fl6_src))
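The effect of the patched condition, including Brian's point about ordering, can be modelled in a few lines. A sketch under stated assumptions: MOCK_LOOKUP_F_IFACE and mock_need_strict() are hypothetical stand-ins for RT6_LOOKUP_F_IFACE and rt6_need_strict(), and the scope test is reduced to a boolean with a call counter. A bound output interface alone now requests an interface-strict lookup, and because `||` short-circuits, the scope test is skipped whenever oif is set.

```c
#include <stdbool.h>

#define MOCK_LOOKUP_F_IFACE 0x1  /* hypothetical stand-in for RT6_LOOKUP_F_IFACE */

static int scope_checks;  /* how often the mock scope test has run */

/* Hypothetical stand-in for rt6_need_strict(): true for multicast,
 * link-local or loopback destinations. Counts its invocations. */
static bool mock_need_strict(bool dst_is_strict_scope)
{
	scope_checks++;
	return dst_is_strict_scope;
}

/* Mirrors the patched test in ip6_route_output(). */
static int lookup_flags(int oif, bool dst_is_strict_scope)
{
	int flags = 0;

	if (oif || mock_need_strict(dst_is_strict_scope))
		flags |= MOCK_LOOKUP_F_IFACE;
	return flags;
}
```

For a socket bound to ifindex 2 sending to a global address, `lookup_flags(2, false)` returns the interface flag without running the scope test at all.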



* Re: ipv6: Fix tcp_v6_send_response checksum
From: David Miller @ 2010-04-21  7:49 UTC (permalink / raw)
  To: herbert; +Cc: netdev, cratiu
In-Reply-To: <20100421070737.GA30517@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 21 Apr 2010 15:07:37 +0800

> ipv6: Fix tcp_v6_send_response checksum

I put this into net-2.6 and modified the commit message since, as we
found, this incorrect transport header reset was added there to fix
IPSEC.

I'm convinced that Cosmin didn't test the patch he actually sent out
:-)

Thanks!

--------------------
ipv6: Fix tcp_v6_send_response transport header setting.

My recent patch to remove the open-coded checksum sequence in
tcp_v6_send_response broke it as we did not set the transport
header pointer on the new packet.

Actually, there is code there trying to set the transport
header properly, but it sets it for the wrong skb ('skb'
instead of 'buff').

This bug was introduced by commit
a8fdf2b331b38d61fb5f11f3aec4a4f9fb2dedcb ("ipv6: Fix
tcp_v6_send_response(): it didn't set skb transport header")

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/tcp_ipv6.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c92ebe8..075f540 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1015,7 +1015,7 @@ static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
 	skb_reserve(buff, MAX_HEADER + sizeof(struct ipv6hdr) + tot_len);
 
 	t1 = (struct tcphdr *) skb_push(buff, tot_len);
-	skb_reset_transport_header(skb);
+	skb_reset_transport_header(buff);
 
 	/* Swap the send and the receive. */
 	memset(t1, 0, sizeof(*t1));
-- 
1.7.0.4



* [PATCH] ipv4: handle GARPs specially when updating neighbors
From: Sasha Levin @ 2010-04-21  8:02 UTC (permalink / raw)
  To: netdev@vger.kernel.org

From: Sasha Levin <sasha@comsleep.com>

We are currently testing IP fail-over on storage devices, and have observed an issue with the IP transfer from one device to another.

Assuming we have 2 storage devices A and B, and a server C which uses the storage, the scenario is:

1. Device A sends an ARP request, which server C sees; server C updates its ARP table with the MAC address of device A.
2. Device A fails; device B takes over the IP and sends out a GARP.
3. Even though server C sees the GARP, it ignores it and keeps trying to communicate with device A until the entry is removed from its cache and a new ARP request is generated.

The code responsible for this is in arp_process() in net/ipv4/arp.c:

override = time_after(jiffies, n->updated + n->parms->locktime);

/* Broadcast replies and request packets
   do not assert neighbour reachability.
 */
if (arp->ar_op != htons(ARPOP_REPLY) ||
    skb->pkt_type != PACKET_HOST)
        state = NUD_STALE;
neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);
neigh_release(n);

According to the code, this happens because the kernel ignores any ARP update that arrives within a short period (the neighbour's locktime) after the previous update. The reason given in the comment is: “If several different ARP replies follows back-to-back, use the FIRST one. It is possible, if several proxy agents are active. Taking the first reply prevents arp trashing and chooses the fastest router.”

This, however, does not take GARPs into account: they are not sent by ARP proxies anyway, yet they are ignored too, causing a loss of communication for over a minute until the ARP cache entry is refreshed.

Signed-off-by: Sasha Levin <sasha@comsleep.com>
---
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 1a9dd66..caa2093 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -876,8 +876,11 @@ static int arp_process(struct sk_buff *skb)
 		   use the FIRST one. It is possible, if several proxy
 		   agents are active. Taking the first reply prevents
 		   arp trashing and chooses the fastest router.
+
+		   GARPs are always updating the cache since they can
+		   originate from different devices with the same IP.
 		 */
-		override = time_after(jiffies, n->updated + n->parms->locktime);
+		override = (sip == tip) || time_after(jiffies, n->updated + n->parms->locktime);

 		/* Broadcast replies and request packets
 		   do not assert neighbour reachability.
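The patched override decision can be sketched outside the kernel. This is a model under stated assumptions: mock_time_after() imitates the wraparound-safe comparison of the kernel's time_after() for jiffies, addresses are plain 32-bit integers, and arp_should_override() is a hypothetical name for the expression in the hunk above. A gratuitous ARP (sender IP equal to target IP) overrides the entry even inside the locktime window; other replies still have to wait it out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b", in the spirit of time_after(). */
static bool mock_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* sip/tip: sender and target IPv4 addresses from the ARP packet;
 * updated: when the neighbour entry last changed; locktime: the
 * per-device lock time (all times in jiffies-like ticks). */
static bool arp_should_override(uint32_t sip, uint32_t tip,
				unsigned long now, unsigned long updated,
				unsigned long locktime)
{
	/* GARP: always accept, a new device may now own this IP. */
	if (sip == tip)
		return true;
	/* Otherwise keep the first answer until locktime has passed. */
	return mock_time_after(now, updated + locktime);
}
```

In the fail-over scenario, device B's GARP for 10.0.0.5 arrives shortly after device A's traffic refreshed the entry; `arp_should_override(0x0a000005, 0x0a000005, ...)` still returns true, so server C switches to B's MAC immediately.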


* Re: [PATCH] NET: Fix an RCU warning in dev_pick_tx()
From: David Miller @ 2010-04-21  8:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dhowells, netdev
In-Reply-To: <1271779414.7895.35.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Apr 2010 18:03:34 +0200

> On Tuesday, 20 April 2010 at 11:25 +0100, David Howells wrote:
>> Fix the following RCU warning in dev_pick_tx():
 ...
> Absolutely right, thanks David
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

I'll apply this, thanks everyone.

> This might conflict with following commit in net-next-2.6, where I chose
> the rcu_dereference_check() alternative

Yeah, I'll be mindful of this when I do a merge right after
this, thanks for the heads up.


* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Evgeniy Polyakov @ 2010-04-21  8:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Greear, David Miller, Gaspar Chilingarov, netdev
In-Reply-To: <1271828799.7895.1287.camel@edumazet-laptop>

On Wed, Apr 21, 2010 at 07:46:39AM +0200, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> 		if (atomic_read(&hashinfo->bsockets) > (high-low)+1) {
> 			spin_unlock(&head->lock);
> 			snum = smallest_rover;   // We select this, without checking for conflicts.
> 			goto have_snum;
> 		}
> 	}
> 
> 
> Then we jump to the "have_snum" label
> 
> Then we realize (selected_IP, randomport) is already in use.
> End of first try.
> 
> We redo the thing 5 times, so we only look at 5 slots out of
> 32000-64000.

We only break out of the loop in the above case when the number of sockets
already exceeds our range limit. If we just tried 5 random ports out of 1000
in Gaspar's case, we would not be able to find ports for all 1000
sockets.

> Maybe the fix would need to check if there is a conflict before doing
> the "goto have_snum"

I believe this is a useful patch, but it addresses a different issue.
This path should not fire up when we bind to a single address.

> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index e0a3e35..0498daf 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -120,9 +120,11 @@ again:
>  						smallest_size = tb->num_owners;
>  						smallest_rover = rover;
>  						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
> -							spin_unlock(&head->lock);
> -							snum = smallest_rover;
> -							goto have_snum;
> +							if (!inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
> +								spin_unlock(&head->lock);
> +								snum = smallest_rover;
> +								goto have_snum;
> +							}
>  						}
>  					}
>  					goto next;
> 

-- 
	Evgeniy Polyakov


* Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.
From: Michael S. Tsirkin @ 2010-04-21  8:35 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	jdike@linux.intel.com, davem@davemloft.net
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D79026FAB1E5F@shsmsx502.ccr.corp.intel.com>

On Tue, Apr 20, 2010 at 10:21:55AM +0800, Xin, Xiaohui wrote:
> Michael,
> 
> >>>>>> What we have not done yet:
> >>>>>> 	packet split support
> >>>>>> 
> >>>>>What does this mean, exactly?
> >>>> We can support 1500MTU, but for jumbo frame, since vhost driver before don't support mergeable buffer, we cannot try it for multiple sg.
> >>>> 
> >>>I do not see why, vhost currently supports 64K buffers with indirect
> >>>descriptors.
> >>> 
> >> The receive_skb() in guest virtio-net driver will merge the multiple sg to skb frags, how can indirect descriptors do that?
> 
> >See add_recvbuf_big.
> 
> I don't mean that; add_recvbuf_big() is for buffer submission. I mean that when a packet is received, in receive_buf(), the mergeable-buffer path knows which received pages can be hooked into skb frags; it's receive_mergeable() that does this.
>
> When a NIC driver supports packet split mode, each ring descriptor contains an skb and a page. When a packet is received, if the status is not EOP, the page of the next descriptor is hooked to the previous skb. We don't know how many frags belong to one skb. So when the guest submits buffers, it should submit multiple pages, and on receive, the guest should know which pages belong to one skb and hook them together. I think receive_mergeable() can do this, but I don't see how the big-packets path handles this. Am I missing something here?
> 
> Thanks
> Xiaohui 


Yes, I think this packet split mode probably maps well to mergeable buffer
support. Note that
1. Not all devices support large packets in this way, others might map
   to indirect buffers better
   So we have to figure out how migration is going to work
2. It's up to guest driver whether to enable features such as
   mergeable buffers and indirect buffers
   So we have to figure out how to notify guest which mode
   is optimal for a given device
3. We don't want to depend on jumbo frames for decent performance
   So we probably should support GSO/GRO
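The packet-split receive scheme discussed above (keep attaching the next descriptor's buffer until a descriptor carries EOP) amounts to a simple grouping pass. This is an illustration under assumed names only: struct rx_desc, its eop flag and group_packets() are hypothetical and do not reflect any particular NIC's descriptor layout or the virtio-net code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive descriptor: one buffer plus an end-of-packet bit. */
struct rx_desc {
	size_t len;  /* bytes written into this descriptor's buffer */
	bool eop;    /* set on the last descriptor of a packet */
};

/* Walk completed descriptors and record, per packet, how many buffers
 * (head + frags) it spans and its total length. Returns the number of
 * complete packets; a trailing run without EOP is left for later. */
static size_t group_packets(const struct rx_desc *d, size_t n,
			    size_t *nbufs, size_t *total, size_t max_pkts)
{
	size_t pkts = 0, bufs = 0, bytes = 0, i;

	for (i = 0; i < n && pkts < max_pkts; i++) {
		bufs++;              /* first buffer is the head, rest are frags */
		bytes += d[i].len;
		if (d[i].eop) {      /* packet complete: emit and start over */
			nbufs[pkts] = bufs;
			total[pkts] = bytes;
			pkts++;
			bufs = 0;
			bytes = 0;
		}
	}
	return pkts;
}
```

The guest-side difficulty raised in the thread is exactly the `nbufs` value: it is only known once EOP is seen, so buffers must be posted without knowing how many a packet will consume.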

-- 
MST


* Re: 2.6.34-rc5: Reported regressions from 2.6.33
From: Jerome Glisse @ 2010-04-21  8:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nick Bowler, Kernel Testers List, Linux SCSI List,
	Network Development, Linux Wireless List,
	Linux Kernel Mailing List, Linux ACPI, Andrew Morton, DRI,
	Linus Torvalds, Linux PM List, Maciej Rutecki
In-Reply-To: <201004210715.38621.rjw@sisk.pl>

On Wed, Apr 21, 2010 at 07:15:38AM +0200, Rafael J. Wysocki wrote:
> On Tuesday 20 April 2010, Nick Bowler wrote:
> > On 05:15 Tue 20 Apr     , Rafael J. Wysocki wrote:
> > > If you know of any other unresolved regressions from 2.6.33, please let us
> > > know either and we'll add them to the list.  Also, please let us know
> > > if any of the entries below are invalid.
> > 
> > Please list these two similar regressions from 2.6.33 in the r600 DRM:
> > 
> >  * r600 CS checker rejects GL_DEPTH_TEST w/o depth buffer:
> >            https://bugs.freedesktop.org/show_bug.cgi?id=27571
> > 
> >  * r600 CS checker rejects narrow FBO renderbuffers:
> >            https://bugs.freedesktop.org/show_bug.cgi?id=27609
> 
> Do you want me to add them as one entry or as two separate bugs?
> 
> Rafael
> 

The first one is a userspace bug, i.e. we were lucky the hw didn't
lock up with depth test enabled and no depth buffer. I need to look
into the second one.

Cheers,
Jerome


* Re: ipv6: Fix tcp_v6_send_response checksum
From: David Miller @ 2010-04-21  8:58 UTC (permalink / raw)
  To: herbert; +Cc: netdev, cratiu
In-Reply-To: <20100421.004922.193694715.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 21 Apr 2010 00:49:22 -0700 (PDT)

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 21 Apr 2010 15:07:37 +0800
> 
>> ipv6: Fix tcp_v6_send_response checksum
> 
> I put this into net-2.6 and modified the commit message since, as we
> found, this incorrect transport header reset was added there to fix
> IPSEC.

Ok, even with this pulled into net-next-2.6 the ipv6 tcp response
checksums are still bad.  The following fix is necessary as
well:

--------------------
tcp: Fix ipv6 checksumming on response packets for real.

Commit 6651ffc8e8bdd5fb4b7d1867c6cfebb4f309512c
("ipv6: Fix tcp_v6_send_response transport header setting.")
fixed one half of why ipv6 tcp response checksums were
invalid, but it's not the whole story.

If we're going to use CHECKSUM_PARTIAL for these things (which we are
since commit 2e8e18ef52e7dd1af0a3bd1f7d990a1d0b249586 "tcp: Set
CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"), we can't be setting
buff->csum as we always have been here in tcp_v6_send_response.  We
need to leave it at zero.

Kill that line and checksums are good again.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/tcp_ipv6.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 78480f4..5d2e430 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1050,8 +1050,6 @@ static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
 	}
 #endif

-	buff->csum = csum_partial(t1, tot_len, 0);
-
 	memset(&fl, 0, sizeof(fl));
 	ipv6_addr_copy(&fl.fl6_dst, &ipv6_hdr(skb)->saddr);
 	ipv6_addr_copy(&fl.fl6_src, &ipv6_hdr(skb)->daddr);
-- 
1.7.0.4
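The Internet checksum is a running 16-bit one's-complement sum, and a routine that accepts a starting value (as csum_partial() does with its last argument) adds whatever that value already covers. A minimal user-space sketch, with a hypothetical ones_sum() standing in for csum_partial() (plain RFC 1071 arithmetic, not the kernel's optimized code): seeding the sum of a header with the sum of that same header counts its bytes twice, which is one way a pre-computed csum left in a buffer can corrupt a checksum that is computed again later.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for csum_partial(): add buf to seed as
 * big-endian 16-bit words with one's-complement arithmetic, folding
 * carries back into the low 16 bits (RFC 1071). */
static uint16_t ones_sum(const uint8_t *buf, size_t len, uint16_t seed)
{
	uint32_t sum = seed;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)                 /* odd length: pad with a zero byte */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)            /* end-around carry */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For a 20-byte header whose words sum to 0x6299, `ones_sum(hdr, 20, 0)` returns 0x6299, but seeding with that same value returns 0xc532; starting from a zero seed is what keeps each byte counted once.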


* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  9:02 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Ben Greear, David Miller, Gaspar Chilingarov, netdev
In-Reply-To: <20100421082559.GA32475@ioremap.net>

On Wednesday, 21 April 2010 at 12:25 +0400, Evgeniy Polyakov wrote:

> I believe this is a useful patch, but it addresses a different issue.
> This path should not fire up when we bind to a single address.

Well, the real problem is that the following sequence can happen:

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(34000), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(34002), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(7, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(7, {sa_family=AF_INET, sin_port=htons(34001), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 8
setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(8, {sa_family=AF_INET, sin_port=htons(34002), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 9
setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(9, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(9, {sa_family=AF_INET, sin_port=htons(34000), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 10
setsockopt(10, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(10, {sa_family=AF_INET, sin_port=htons(34002), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 11
setsockopt(11, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(11, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(11, {sa_family=AF_INET, sin_port=htons(34001), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0


Note that the same ports are handed out several times to different sockets.

So several sockets are 'bound' to the same IP:port values.

At connect() time, we refuse and say the address is not available.



The following program demonstrates the problem.

The first time, launch it with an extra argument to set up the IP aliases and ip_local_port_range (this needs root).




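#include <stdio.h>
#include <stdlib.h>     /* system() */
#include <unistd.h>     /* fork(), pause() */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>  /* htons(), htonl() */
#include <string.h>
#include <errno.h>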

int listenfd;
int on = 1;

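/* Fork a child that accepts incoming connections forever and
 * deliberately keeps them open; the parent returns at once. */
static void listener(void)
{
	if (fork())
		return;

	while (1) {
		struct sockaddr_in addr;
		socklen_t len = sizeof(addr);

		if (accept(listenfd, (struct sockaddr *)&addr, &len) == -1)
			perror("accept");
	}
}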

int main(int argc, char *argv[])
{
	int i, port, total = 0;
	char cmd[128];
	struct sockaddr_in addr;
	socklen_t len;

	if (argc > 1) {
		for (i = 2; i < 8; i++) {
			sprintf(cmd, "ip addr add 127.0.0.%d/8 dev lo 2>/dev/null", i);
			system(cmd);
		}
		system("echo '34000 34002' >/proc/sys/net/ipv4/ip_local_port_range");
	}
	listenfd = socket(AF_INET, SOCK_STREAM, 0);
	setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(4444);
	if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		perror("bind");
		return 1;
	}
	listen(listenfd, 10);
	listener();

	for (i = 2; i < 8; i++) {
		for (port = 34000; port < 34010; port++) {
			int fd = socket(AF_INET, SOCK_STREAM, 0);
			if (fd == -1) {
				fprintf(stderr, "Could not open socket, errno=%d\n", errno);
				goto end;
			}
			setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
			addr.sin_addr.s_addr = htonl(0x7f000000 + i);
//			addr.sin_port = htons(port);
			addr.sin_port = 0;
			if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
				fprintf(stderr, "Could not bind()\n");
				goto end;
			}
			len = sizeof(addr);
			getsockname(fd, (struct sockaddr *)&addr, &len);
#if 0
			addr.sin_addr.s_addr = htonl(0x7f000001);
			addr.sin_port = htons(4444);
			if ((total < 10) && (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)) {
				len = sizeof(addr);
				getsockname(fd, (struct sockaddr *)&addr, &len);
				fprintf(stderr, "Could not connect()\n");
				goto end;
			}
#endif
			total++;
		}
	}
end:
	printf("i=127.0.0.%d port=%d (total=%d)\n", i, port, total);
	pause();
}
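Why bind() accepts these duplicates can be seen from the rule the bind conflict check applies between two sockets on the same port. A simplified sketch under stated assumptions: struct bound_sock and bind_conflicts() are hypothetical, and the model covers only the IPv4 address and SO_REUSEADDR/listen conditions of inet_csk_bind_conflict() as found in kernels of this era, ignoring SO_BINDTODEVICE and IPv6-only sockets. When both sockets set SO_REUSEADDR and neither is listening, sharing IP:port is allowed at bind() time; the clash only surfaces when connect() needs a unique 4-tuple.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a bound TCP socket (IPv4 only). */
struct bound_sock {
	uint32_t addr;   /* bound address, 0 = INADDR_ANY */
	uint16_t port;
	bool reuse;      /* SO_REUSEADDR set */
	bool listening;  /* in LISTEN state */
};

/* Simplified model: does binding 'sk' conflict with existing 'other'? */
static bool bind_conflicts(const struct bound_sock *sk,
			   const struct bound_sock *other)
{
	if (sk->port != other->port)
		return false;
	/* Both opted into reuse and nobody listens: sharing is allowed. */
	if (sk->reuse && other->reuse && !other->listening)
		return false;
	/* Otherwise it conflicts if the addresses overlap. */
	return sk->addr == 0 || other->addr == 0 || sk->addr == other->addr;
}
```

Two SO_REUSEADDR sockets on 127.0.0.2:34000, as in the strace output above, do not conflict under this rule; making the first one a listener, or dropping SO_REUSEADDR on the second, does.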



