Netdev List

* Re: [BUG] bonding : LOCKDEP warning
From: Eric Dumazet @ 2011-10-22  9:19 UTC
  To: David Miller; +Cc: ebiederm, netdev
In-Reply-To: <20111022.050914.551507702374659667.davem@davemloft.net>

On Saturday, 22 October 2011 at 05:09 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 22 Oct 2011 11:05:49 +0200
> 
> > Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Patchwork says "parse error"
> 
> I'll fix it up this time, but please do not use free form
> tags like this in the future.  Thanks.

Strange; it seems quite usual these days, and you're the first one to
complain. Maybe complain to Patchwork?

    Reported-and-tested-by: Shlomo Pongratz <shlomop@mellanox.com>
    Reported-and-tested-by: Simon Kirby <sim@hostway.ca>
    Reported-and-tested-by: Amir Vadai <amirv@dev.mellanox.co.il>
    Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
    Reported-and-tested-by: Rocko Requin <rockorequin@hotmail.com>
    Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
    Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
    Reported-and-tested-by: Hor Jiun Shyong <jiunshyong@gmail.com>
    Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
    Reported-and-tested-by: Jan Teichmann <jan.teichmann@gmail.com>
    Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com>
    Reported-and-tested-by: Jim Bray <jimsantelmo@gmail.com>
    Reported-and-tested-by: Muhammad Khurram Khan
    Reported-and-tested-by: Matej Laitl <matej@laitl.cz>
    Reported-and-tested-by: Thomas Seilund <tps@netmaster.dk>
    Reported-and-tested-by: René Fritz <rene@colorcube.de>
    Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net>
    Reported-and-tested-by: William Light <wrl@illest.net>
    Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
    Reported-and-tested-by: Dave Jones <davej@redhat.com>
    Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
    Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com>
    Reported-and-tested-by: Pavel Roskin <proski@gnu.org>
    Reported-and-tested-by: Christian Casteyde <casteyde.christian@free.fr>
    Reported-and-tested-by: Sebastian Siewior <sebastian@breakpoint.cc>
...

* [PATCH 31/49] ISDN: irq: Remove IRQF_DISABLED
From: Yong Zhang @ 2011-10-22  9:56 UTC
  To: linux-kernel; +Cc: tglx, Armin Schindler, Karsten Keil, netdev
In-Reply-To: <1319277421-9203-1-git-send-email-yong.zhang0@gmail.com>

Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled],
we run all interrupt handlers with interrupts disabled,
and we even check and yell when an interrupt handler
returns with interrupts enabled (see commit [b738a50a:
genirq: Warn when handler enables interrupts]).

This flag is therefore a no-op and can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
---
 drivers/isdn/hardware/eicon/divasmain.c |    2 +-
 drivers/isdn/sc/init.c                  |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/isdn/hardware/eicon/divasmain.c b/drivers/isdn/hardware/eicon/divasmain.c
index f332b60..8a0a831 100644
--- a/drivers/isdn/hardware/eicon/divasmain.c
+++ b/drivers/isdn/hardware/eicon/divasmain.c
@@ -481,7 +481,7 @@ void __inline__ outpp(void __iomem *addr, word p)
 int diva_os_register_irq(void *context, byte irq, const char *name)
 {
 	int result = request_irq(irq, diva_os_irq_wrapper,
-				 IRQF_DISABLED | IRQF_SHARED, name, context);
+				 IRQF_SHARED, name, context);
 	return (result);
 }
 
diff --git a/drivers/isdn/sc/init.c b/drivers/isdn/sc/init.c
index ca710ab..a3127fb 100644
--- a/drivers/isdn/sc/init.c
+++ b/drivers/isdn/sc/init.c
@@ -336,7 +336,7 @@ static int __init sc_init(void)
 		 */
 		sc_adapter[cinst]->interrupt = irq[b];
 		if (request_irq(sc_adapter[cinst]->interrupt, interrupt_handler,
-				IRQF_DISABLED, interface->id,
+				0, interface->id,
 				(void *)(unsigned long) cinst))
 		{
 			kfree(sc_adapter[cinst]->channel);
-- 
1.7.1
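
Mechanically, each hunk in this patch is the same flag-mask edit: IRQF_DISABLED is cleared from the flags passed to request_irq(), so a call that passed only IRQF_DISABLED now passes 0, and IRQF_DISABLED | IRQF_SHARED becomes IRQF_SHARED. A minimal userspace sketch of that transformation follows; the DEMO_* names and bit values are illustrative assumptions, not the definitions from <linux/interrupt.h>.

```c
/* Illustrative flag bits (assumed values, for demonstration only;
 * not taken from <linux/interrupt.h>). */
#define DEMO_IRQF_TRIGGER_LOW 0x00000008u
#define DEMO_IRQF_DISABLED    0x00000020u
#define DEMO_IRQF_SHARED      0x00000080u

/* Return the flags a caller should pass once IRQF_DISABLED is a no-op:
 * the same mask with only the DISABLED bit cleared, so every other
 * request (sharing, trigger type) is preserved unchanged. */
static unsigned int strip_irqf_disabled(unsigned int flags)
{
	return flags & ~DEMO_IRQF_DISABLED;
}
```

For example, strip_irqf_disabled(DEMO_IRQF_DISABLED | DEMO_IRQF_SHARED) yields DEMO_IRQF_SHARED, which corresponds to the divasmain.c hunk, while a bare DEMO_IRQF_DISABLED becomes 0, as in the sc/init.c hunk.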

* [PATCH 35/49] net: irq: Remove IRQF_DISABLED
From: Yong Zhang @ 2011-10-22  9:56 UTC
  To: linux-kernel
  Cc: tglx, Jaroslav Kysela, Breno Leitao, Olof Johansson,
	Nicolas Pitre, Steve Glendinning, Geoff Levand, Thomas Sailer,
	Joerg Reuter, Klaus Kudielka, Jean-Paul Roubelat, Samuel Ortiz,
	Christian Lamparter, John W. Linville, David S. Miller,
	uclinux-dist-devel, netdev, cbe-oss-dev, linux-hams,
	linux-wireless
In-Reply-To: <1319277421-9203-1-git-send-email-yong.zhang0@gmail.com>

Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled],
we run all interrupt handlers with interrupts disabled,
and we even check and yell when an interrupt handler
returns with interrupts enabled (see commit [b738a50a:
genirq: Warn when handler enables interrupts]).

This flag is therefore a no-op and can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/adi/bfin_mac.c          |    4 ++--
 drivers/net/ethernet/amd/sun3lance.c         |    2 +-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c |    4 ++--
 drivers/net/ethernet/dec/tulip/de4x5.c       |    2 +-
 drivers/net/ethernet/freescale/fec.c         |    2 +-
 drivers/net/ethernet/hp/hp100.c              |    2 +-
 drivers/net/ethernet/ibm/ehea/ehea_main.c    |    6 +++---
 drivers/net/ethernet/korina.c                |    8 ++++----
 drivers/net/ethernet/lantiq_etop.c           |    4 ++--
 drivers/net/ethernet/marvell/pxa168_eth.c    |    2 +-
 drivers/net/ethernet/micrel/ks8851_mll.c     |    2 +-
 drivers/net/ethernet/natsemi/jazzsonic.c     |    2 +-
 drivers/net/ethernet/natsemi/xtsonic.c       |    2 +-
 drivers/net/ethernet/pasemi/pasemi_mac.c     |    4 ++--
 drivers/net/ethernet/smsc/smc91x.h           |    2 +-
 drivers/net/ethernet/smsc/smsc9420.c         |    2 +-
 drivers/net/ethernet/ti/davinci_emac.c       |    2 +-
 drivers/net/ethernet/toshiba/ps3_gelic_net.c |    2 +-
 drivers/net/hamradio/baycom_ser_fdx.c        |    2 +-
 drivers/net/hamradio/baycom_ser_hdx.c        |    2 +-
 drivers/net/hamradio/scc.c                   |    2 +-
 drivers/net/hamradio/yam.c                   |    2 +-
 drivers/net/irda/bfin_sir.c                  |    4 ++--
 drivers/net/irda/donauboe.c                  |    4 ++--
 drivers/net/irda/sh_irda.c                   |    2 +-
 drivers/net/irda/sh_sir.c                    |    2 +-
 drivers/net/wan/hostess_sv11.c               |    2 +-
 drivers/net/wan/sealevel.c                   |    2 +-
 drivers/net/wireless/p54/p54spi.c            |    2 +-
 include/net/irda/irda_device.h               |    2 +-
 30 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/adi/bfin_mac.c b/drivers/net/ethernet/adi/bfin_mac.c
index b6d69c9..0d3804f 100644
--- a/drivers/net/ethernet/adi/bfin_mac.c
+++ b/drivers/net/ethernet/adi/bfin_mac.c
@@ -531,7 +531,7 @@ static int bfin_mac_ethtool_setwol(struct net_device *dev,
 	if (lp->wol && !lp->irq_wake_requested) {
 		/* register wake irq handler */
 		rc = request_irq(IRQ_MAC_WAKEDET, bfin_mac_wake_interrupt,
-				 IRQF_DISABLED, "EMAC_WAKE", dev);
+				 0, "EMAC_WAKE", dev);
 		if (rc)
 			return rc;
 		lp->irq_wake_requested = true;
@@ -1544,7 +1544,7 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	/* now, enable interrupts */
 	/* register irq handler */
 	rc = request_irq(IRQ_MAC_RX, bfin_mac_interrupt,
-			IRQF_DISABLED, "EMAC_RX", ndev);
+			0, "EMAC_RX", ndev);
 	if (rc) {
 		dev_err(&pdev->dev, "Cannot request Blackfin MAC RX IRQ!\n");
 		rc = -EBUSY;
diff --git a/drivers/net/ethernet/amd/sun3lance.c b/drivers/net/ethernet/amd/sun3lance.c
index 080b71f..9aee9f7 100644
--- a/drivers/net/ethernet/amd/sun3lance.c
+++ b/drivers/net/ethernet/amd/sun3lance.c
@@ -358,7 +358,7 @@ static int __init lance_probe( struct net_device *dev)
 
 	REGA(CSR0) = CSR0_STOP;
 
-	if (request_irq(LANCE_IRQ, lance_interrupt, IRQF_DISABLED, "SUN3 Lance", dev) < 0) {
+	if (request_irq(LANCE_IRQ, lance_interrupt, 0, "SUN3 Lance", dev) < 0) {
 #ifdef CONFIG_SUN3
 		iounmap((void __iomem *)ioaddr);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index a11a8ad..25d6875 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -840,13 +840,13 @@ static int bcm_enet_open(struct net_device *dev)
 	if (ret)
 		goto out_phy_disconnect;
 
-	ret = request_irq(priv->irq_rx, bcm_enet_isr_dma, IRQF_DISABLED,
+	ret = request_irq(priv->irq_rx, bcm_enet_isr_dma, 0,
 			  dev->name, dev);
 	if (ret)
 		goto out_freeirq;
 
 	ret = request_irq(priv->irq_tx, bcm_enet_isr_dma,
-			  IRQF_DISABLED, dev->name, dev);
+			  0, dev->name, dev);
 	if (ret)
 		goto out_freeirq_rx;
 
diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c
index 871bcaa..7dd7989 100644
--- a/drivers/net/ethernet/dec/tulip/de4x5.c
+++ b/drivers/net/ethernet/dec/tulip/de4x5.c
@@ -1321,7 +1321,7 @@ de4x5_open(struct net_device *dev)
     if (request_irq(dev->irq, de4x5_interrupt, IRQF_SHARED,
 		                                     lp->adapter_name, dev)) {
 	printk("de4x5_open(): Requested IRQ%d is busy - attemping FAST/SHARE...", dev->irq);
-	if (request_irq(dev->irq, de4x5_interrupt, IRQF_DISABLED | IRQF_SHARED,
+	if (request_irq(dev->irq, de4x5_interrupt, IRQF_SHARED,
 			                             lp->adapter_name, dev)) {
 	    printk("\n              Cannot get IRQ- reconfigure your hardware.\n");
 	    disable_ast(dev);
diff --git a/drivers/net/ethernet/freescale/fec.c b/drivers/net/ethernet/freescale/fec.c
index 1124ce0..642c567 100644
--- a/drivers/net/ethernet/freescale/fec.c
+++ b/drivers/net/ethernet/freescale/fec.c
@@ -1573,7 +1573,7 @@ fec_probe(struct platform_device *pdev)
 		irq = platform_get_irq(pdev, i);
 		if (i && irq < 0)
 			break;
-		ret = request_irq(irq, fec_enet_interrupt, IRQF_DISABLED, pdev->name, ndev);
+		ret = request_irq(irq, fec_enet_interrupt, 0, pdev->name, ndev);
 		if (ret) {
 			while (--i >= 0) {
 				irq = platform_get_irq(pdev, i);
diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
index 6a5ee07..4c4f3f4 100644
--- a/drivers/net/ethernet/hp/hp100.c
+++ b/drivers/net/ethernet/hp/hp100.c
@@ -1097,7 +1097,7 @@ static int hp100_open(struct net_device *dev)
 	/* New: if bus is PCI or EISA, interrupts might be shared interrupts */
 	if (request_irq(dev->irq, hp100_interrupt,
 			lp->bus == HP100_BUS_PCI || lp->bus ==
-			HP100_BUS_EISA ? IRQF_SHARED : IRQF_DISABLED,
+			HP100_BUS_EISA ? IRQF_SHARED : 0,
 			"hp100", dev)) {
 		printk("hp100: %s: unable to get IRQ %d\n", dev->name, dev->irq);
 		return -EAGAIN;
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index dfefe80..192a1bb 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -1348,7 +1348,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
 
 	ret = ibmebus_request_irq(port->qp_eq->attr.ist1,
 				  ehea_qp_aff_irq_handler,
-				  IRQF_DISABLED, port->int_aff_name, port);
+				  0, port->int_aff_name, port);
 	if (ret) {
 		netdev_err(dev, "failed registering irq for qp_aff_irq_handler:ist=%X\n",
 			   port->qp_eq->attr.ist1);
@@ -1366,7 +1366,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
 			 "%s-queue%d", dev->name, i);
 		ret = ibmebus_request_irq(pr->eq->attr.ist1,
 					  ehea_recv_irq_handler,
-					  IRQF_DISABLED, pr->int_send_name,
+					  0, pr->int_send_name,
 					  pr);
 		if (ret) {
 			netdev_err(dev, "failed registering irq for ehea_queue port_res_nr:%d, ist=%X\n",
@@ -3521,7 +3521,7 @@ static int __devinit ehea_probe_adapter(struct platform_device *dev,
 		     (unsigned long)adapter);
 
 	ret = ibmebus_request_irq(adapter->neq->attr.ist1,
-				  ehea_interrupt_neq, IRQF_DISABLED,
+				  ehea_interrupt_neq, 0,
 				  "ehea_neq", adapter);
 	if (ret) {
 		dev_err(&dev->dev, "requesting NEQ IRQ failed\n");
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index d8430f4..2a96cd5 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1001,14 +1001,14 @@ static int korina_open(struct net_device *dev)
 	 * that handles the Done Finished
 	 * Ovr and Und Events */
 	ret = request_irq(lp->rx_irq, korina_rx_dma_interrupt,
-			IRQF_DISABLED, "Korina ethernet Rx", dev);
+			0, "Korina ethernet Rx", dev);
 	if (ret < 0) {
 		printk(KERN_ERR "%s: unable to get Rx DMA IRQ %d\n",
 		    dev->name, lp->rx_irq);
 		goto err_release;
 	}
 	ret = request_irq(lp->tx_irq, korina_tx_dma_interrupt,
-			IRQF_DISABLED, "Korina ethernet Tx", dev);
+			0, "Korina ethernet Tx", dev);
 	if (ret < 0) {
 		printk(KERN_ERR "%s: unable to get Tx DMA IRQ %d\n",
 		    dev->name, lp->tx_irq);
@@ -1017,7 +1017,7 @@ static int korina_open(struct net_device *dev)
 
 	/* Install handler for overrun error. */
 	ret = request_irq(lp->ovr_irq, korina_ovr_interrupt,
-			IRQF_DISABLED, "Ethernet Overflow", dev);
+			0, "Ethernet Overflow", dev);
 	if (ret < 0) {
 		printk(KERN_ERR "%s: unable to get OVR IRQ %d\n",
 		    dev->name, lp->ovr_irq);
@@ -1026,7 +1026,7 @@ static int korina_open(struct net_device *dev)
 
 	/* Install handler for underflow error. */
 	ret = request_irq(lp->und_irq, korina_und_interrupt,
-			IRQF_DISABLED, "Ethernet Underflow", dev);
+			0, "Ethernet Underflow", dev);
 	if (ret < 0) {
 		printk(KERN_ERR "%s: unable to get UND IRQ %d\n",
 		    dev->name, lp->und_irq);
diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index 6bb2b95..d9a268c 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -280,7 +280,7 @@ ltq_etop_hw_init(struct net_device *dev)
 
 		if (IS_TX(i)) {
 			ltq_dma_alloc_tx(&ch->dma);
-			request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
+			request_irq(irq, ltq_etop_dma_irq, 0,
 				"etop_tx", priv);
 		} else if (IS_RX(i)) {
 			ltq_dma_alloc_rx(&ch->dma);
@@ -289,7 +289,7 @@ ltq_etop_hw_init(struct net_device *dev)
 				if (ltq_etop_alloc_skb(ch))
 					return -ENOMEM;
 			ch->dma.desc = 0;
-			request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
+			request_irq(irq, ltq_etop_dma_irq, 0,
 				"etop_rx", priv);
 		}
 		ch->dma.irq = irq;
diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index d17d062..4e2b30d 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1132,7 +1132,7 @@ static int pxa168_eth_open(struct net_device *dev)
 	int err;
 
 	err = request_irq(dev->irq, pxa168_eth_int_handler,
-			  IRQF_DISABLED, dev->name, dev);
+			  0, dev->name, dev);
 	if (err) {
 		dev_printk(KERN_ERR, &dev->dev, "can't assign irq\n");
 		return -EAGAIN;
diff --git a/drivers/net/ethernet/micrel/ks8851_mll.c b/drivers/net/ethernet/micrel/ks8851_mll.c
index d19c849..a6b427b 100644
--- a/drivers/net/ethernet/micrel/ks8851_mll.c
+++ b/drivers/net/ethernet/micrel/ks8851_mll.c
@@ -899,7 +899,7 @@ static int ks_net_open(struct net_device *netdev)
 	struct ks_net *ks = netdev_priv(netdev);
 	int err;
 
-#define	KS_INT_FLAGS	(IRQF_DISABLED|IRQF_TRIGGER_LOW)
+#define	KS_INT_FLAGS	(IRQF_TRIGGER_LOW)
 	/* lock the card, even if we may not actually do anything
 	 * else at the moment.
 	 */
diff --git a/drivers/net/ethernet/natsemi/jazzsonic.c b/drivers/net/ethernet/natsemi/jazzsonic.c
index fc7c6a9..ecc3467 100644
--- a/drivers/net/ethernet/natsemi/jazzsonic.c
+++ b/drivers/net/ethernet/natsemi/jazzsonic.c
@@ -84,7 +84,7 @@ static int jazzsonic_open(struct net_device* dev)
 {
 	int retval;
 
-	retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
+	retval = request_irq(dev->irq, sonic_interrupt, 0,
 				"sonic", dev);
 	if (retval) {
 		printk(KERN_ERR "%s: unable to get IRQ %d.\n",
diff --git a/drivers/net/ethernet/natsemi/xtsonic.c b/drivers/net/ethernet/natsemi/xtsonic.c
index ccf61b9..04de013 100644
--- a/drivers/net/ethernet/natsemi/xtsonic.c
+++ b/drivers/net/ethernet/natsemi/xtsonic.c
@@ -95,7 +95,7 @@ static int xtsonic_open(struct net_device *dev)
 {
 	int retval;
 
-	retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
+	retval = request_irq(dev->irq, sonic_interrupt, 0,
 				"sonic", dev);
 	if (retval) {
 		printk(KERN_ERR "%s: unable to get IRQ %d.\n",
diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index c6f0056..bb7c5fc 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1218,7 +1218,7 @@ static int pasemi_mac_open(struct net_device *dev)
 	snprintf(mac->tx_irq_name, sizeof(mac->tx_irq_name), "%s tx",
 		 dev->name);
 
-	ret = request_irq(mac->tx->chan.irq, pasemi_mac_tx_intr, IRQF_DISABLED,
+	ret = request_irq(mac->tx->chan.irq, pasemi_mac_tx_intr, 0,
 			  mac->tx_irq_name, mac->tx);
 	if (ret) {
 		dev_err(&mac->pdev->dev, "request_irq of irq %d failed: %d\n",
@@ -1229,7 +1229,7 @@ static int pasemi_mac_open(struct net_device *dev)
 	snprintf(mac->rx_irq_name, sizeof(mac->rx_irq_name), "%s rx",
 		 dev->name);
 
-	ret = request_irq(mac->rx->chan.irq, pasemi_mac_rx_intr, IRQF_DISABLED,
+	ret = request_irq(mac->rx->chan.irq, pasemi_mac_rx_intr, 0,
 			  mac->rx_irq_name, mac->rx);
 	if (ret) {
 		dev_err(&mac->pdev->dev, "request_irq of irq %d failed: %d\n",
diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 5f53fbb..e6319f5 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -271,7 +271,7 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l)
 #define SMC_insw(a, r, p, l)	mcf_insw(a + r, p, l)
 #define SMC_outsw(a, r, p, l)	mcf_outsw(a + r, p, l)
 
-#define SMC_IRQ_FLAGS		(IRQF_DISABLED)
+#define SMC_IRQ_FLAGS		(0)
 
 #else
 
diff --git a/drivers/net/ethernet/smsc/smsc9420.c b/drivers/net/ethernet/smsc/smsc9420.c
index edb24b0..a13c8ce 100644
--- a/drivers/net/ethernet/smsc/smsc9420.c
+++ b/drivers/net/ethernet/smsc/smsc9420.c
@@ -1360,7 +1360,7 @@ static int smsc9420_open(struct net_device *dev)
 	smsc9420_reg_write(pd, INT_STAT, 0xFFFFFFFF);
 	smsc9420_pci_flush_write(pd);
 
-	if (request_irq(dev->irq, smsc9420_isr, IRQF_SHARED | IRQF_DISABLED,
+	if (request_irq(dev->irq, smsc9420_isr, IRQF_SHARED,
 			DRV_NAME, pd)) {
 		smsc_warn(IFUP, "Unable to use IRQ = %d", dev->irq);
 		result = -ENODEV;
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 815c797..ba8cb9d 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1553,7 +1553,7 @@ static int emac_dev_open(struct net_device *ndev)
 
 	while ((res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k))) {
 		for (i = res->start; i <= res->end; i++) {
-			if (request_irq(i, emac_irq, IRQF_DISABLED,
+			if (request_irq(i, emac_irq, 0,
 					ndev->name, ndev))
 				goto rollback;
 		}
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 7bf1e20..7036cb7 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1735,7 +1735,7 @@ static int __devinit ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
 		goto fail_alloc_irq;
 	}
 	result = request_irq(card->irq, gelic_card_interrupt,
-			     IRQF_DISABLED, netdev->name, card);
+			     0, netdev->name, card);
 
 	if (result) {
 		dev_info(ctodev(card), "%s:request_irq failed (%d)\n",
diff --git a/drivers/net/hamradio/baycom_ser_fdx.c b/drivers/net/hamradio/baycom_ser_fdx.c
index a974727..636b65c 100644
--- a/drivers/net/hamradio/baycom_ser_fdx.c
+++ b/drivers/net/hamradio/baycom_ser_fdx.c
@@ -445,7 +445,7 @@ static int ser12_open(struct net_device *dev)
 	outb(0, FCR(dev->base_addr));  /* disable FIFOs */
 	outb(0x0d, MCR(dev->base_addr));
 	outb(0, IER(dev->base_addr));
-	if (request_irq(dev->irq, ser12_interrupt, IRQF_DISABLED | IRQF_SHARED,
+	if (request_irq(dev->irq, ser12_interrupt, IRQF_SHARED,
 			"baycom_ser_fdx", dev)) {
 		release_region(dev->base_addr, SER12_EXTENT);
 		return -EBUSY;
diff --git a/drivers/net/hamradio/baycom_ser_hdx.c b/drivers/net/hamradio/baycom_ser_hdx.c
index e349d86..f9a8976 100644
--- a/drivers/net/hamradio/baycom_ser_hdx.c
+++ b/drivers/net/hamradio/baycom_ser_hdx.c
@@ -490,7 +490,7 @@ static int ser12_open(struct net_device *dev)
 	outb(0, FCR(dev->base_addr));  /* disable FIFOs */
 	outb(0x0d, MCR(dev->base_addr));
 	outb(0, IER(dev->base_addr));
-	if (request_irq(dev->irq, ser12_interrupt, IRQF_DISABLED | IRQF_SHARED,
+	if (request_irq(dev->irq, ser12_interrupt, IRQF_SHARED,
 			"baycom_ser12", dev)) {
 		release_region(dev->base_addr, SER12_EXTENT);       
 		return -EBUSY;
diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
index 3365581..f432f32 100644
--- a/drivers/net/hamradio/scc.c
+++ b/drivers/net/hamradio/scc.c
@@ -1735,7 +1735,7 @@ static int scc_net_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 			if (!Ivec[hwcfg.irq].used && hwcfg.irq)
 			{
 				if (request_irq(hwcfg.irq, scc_isr,
-						IRQF_DISABLED, "AX.25 SCC",
+						0, "AX.25 SCC",
 						(void *)(long) hwcfg.irq))
 					printk(KERN_WARNING "z8530drv: warning, cannot get IRQ %d\n", hwcfg.irq);
 				else
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index 96a98d2..9d60f06 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -890,7 +890,7 @@ static int yam_open(struct net_device *dev)
 		goto out_release_base;
 	}
 	outb(0, IER(dev->base_addr));
-	if (request_irq(dev->irq, yam_interrupt, IRQF_DISABLED | IRQF_SHARED, dev->name, dev)) {
+	if (request_irq(dev->irq, yam_interrupt, IRQF_SHARED, dev->name, dev)) {
 		printk(KERN_ERR "%s: irq %d busy\n", dev->name, dev->irq);
 		ret = -EBUSY;
 		goto out_release_base;
diff --git a/drivers/net/irda/bfin_sir.c b/drivers/net/irda/bfin_sir.c
index 9d4ce1a..529317b 100644
--- a/drivers/net/irda/bfin_sir.c
+++ b/drivers/net/irda/bfin_sir.c
@@ -410,12 +410,12 @@ static int bfin_sir_startup(struct bfin_sir_port *port, struct net_device *dev)
 
 #else
 
-	if (request_irq(port->irq, bfin_sir_rx_int, IRQF_DISABLED, "BFIN_SIR_RX", dev)) {
+	if (request_irq(port->irq, bfin_sir_rx_int, 0, "BFIN_SIR_RX", dev)) {
 		dev_warn(&dev->dev, "Unable to attach SIR RX interrupt\n");
 		return -EBUSY;
 	}
 
-	if (request_irq(port->irq+1, bfin_sir_tx_int, IRQF_DISABLED, "BFIN_SIR_TX", dev)) {
+	if (request_irq(port->irq+1, bfin_sir_tx_int, 0, "BFIN_SIR_TX", dev)) {
 		dev_warn(&dev->dev, "Unable to attach SIR TX interrupt\n");
 		free_irq(port->irq, dev);
 		return -EBUSY;
diff --git a/drivers/net/irda/donauboe.c b/drivers/net/irda/donauboe.c
index b45b2cc..04e4528 100644
--- a/drivers/net/irda/donauboe.c
+++ b/drivers/net/irda/donauboe.c
@@ -1353,7 +1353,7 @@ toshoboe_net_open (struct net_device *dev)
     return 0;
 
   rc = request_irq (self->io.irq, toshoboe_interrupt,
-                    IRQF_SHARED | IRQF_DISABLED, dev->name, self);
+                    IRQF_SHARED, dev->name, self);
   if (rc)
   	return rc;
 
@@ -1560,7 +1560,7 @@ toshoboe_open (struct pci_dev *pci_dev, const struct pci_device_id *pdid)
   self->io.fir_base = self->base;
   self->io.fir_ext = OBOE_IO_EXTENT;
   self->io.irq = pci_dev->irq;
-  self->io.irqflags = IRQF_SHARED | IRQF_DISABLED;
+  self->io.irqflags = IRQF_SHARED;
 
   self->speed = self->io.speed = 9600;
   self->async = 0;
diff --git a/drivers/net/irda/sh_irda.c b/drivers/net/irda/sh_irda.c
index d275e27..bc77767 100644
--- a/drivers/net/irda/sh_irda.c
+++ b/drivers/net/irda/sh_irda.c
@@ -809,7 +809,7 @@ static int __devinit sh_irda_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, ndev);
 
-	if (request_irq(irq, sh_irda_irq, IRQF_DISABLED, "sh_irda", self)) {
+	if (request_irq(irq, sh_irda_irq, 0, "sh_irda", self)) {
 		dev_warn(&pdev->dev, "Unable to attach sh_irda interrupt\n");
 		goto err_mem_4;
 	}
diff --git a/drivers/net/irda/sh_sir.c b/drivers/net/irda/sh_sir.c
index ed7d7d6..d5575f7 100644
--- a/drivers/net/irda/sh_sir.c
+++ b/drivers/net/irda/sh_sir.c
@@ -761,7 +761,7 @@ static int __devinit sh_sir_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, ndev);
 
-	if (request_irq(irq, sh_sir_irq, IRQF_DISABLED, "sh_sir", self)) {
+	if (request_irq(irq, sh_sir_irq, 0, "sh_sir", self)) {
 		dev_warn(&pdev->dev, "Unable to attach sh_sir interrupt\n");
 		goto err_mem_4;
 	}
diff --git a/drivers/net/wan/hostess_sv11.c b/drivers/net/wan/hostess_sv11.c
index 3d80e42..3d74166 100644
--- a/drivers/net/wan/hostess_sv11.c
+++ b/drivers/net/wan/hostess_sv11.c
@@ -220,7 +220,7 @@ static struct z8530_dev *sv11_init(int iobase, int irq)
 	/* We want a fast IRQ for this device. Actually we'd like an even faster
 	   IRQ ;) - This is one driver RtLinux is made for */
 
-	if (request_irq(irq, z8530_interrupt, IRQF_DISABLED,
+	if (request_irq(irq, z8530_interrupt, 0,
 			"Hostess SV11", sv) < 0) {
 		pr_warn("IRQ %d already in use\n", irq);
 		goto err_irq;
diff --git a/drivers/net/wan/sealevel.c b/drivers/net/wan/sealevel.c
index 0b4fd05..6027e47 100644
--- a/drivers/net/wan/sealevel.c
+++ b/drivers/net/wan/sealevel.c
@@ -266,7 +266,7 @@ static __init struct slvl_board *slvl_init(int iobase, int irq,
 	/* We want a fast IRQ for this device. Actually we'd like an even faster
 	   IRQ ;) - This is one driver RtLinux is made for */
 
-	if (request_irq(irq, z8530_interrupt, IRQF_DISABLED,
+	if (request_irq(irq, z8530_interrupt, 0,
 			"SeaLevel", dev) < 0) {
 		pr_warn("IRQ %d already in use\n", irq);
 		goto err_request_irq;
diff --git a/drivers/net/wireless/p54/p54spi.c b/drivers/net/wireless/p54/p54spi.c
index f18df82..c901e07 100644
--- a/drivers/net/wireless/p54/p54spi.c
+++ b/drivers/net/wireless/p54/p54spi.c
@@ -641,7 +641,7 @@ static int __devinit p54spi_probe(struct spi_device *spi)
 	gpio_direction_input(p54spi_gpio_irq);
 
 	ret = request_irq(gpio_to_irq(p54spi_gpio_irq),
-			  p54spi_interrupt, IRQF_DISABLED, "p54spi",
+			  p54spi_interrupt, 0, "p54spi",
 			  priv->spi);
 	if (ret < 0) {
 		dev_err(&priv->spi->dev, "request_irq() failed");
diff --git a/include/net/irda/irda_device.h b/include/net/irda/irda_device.h
index 94c852d..1141747 100644
--- a/include/net/irda/irda_device.h
+++ b/include/net/irda/irda_device.h
@@ -162,7 +162,7 @@ typedef struct {
         int irq, irq2;        /* Interrupts used */
         int dma, dma2;        /* DMA channel(s) used */
         int fifo_size;        /* FIFO size */
-        int irqflags;         /* interrupt flags (ie, IRQF_SHARED|IRQF_DISABLED) */
+        int irqflags;         /* interrupt flags (ie, IRQF_SHARED) */
 	int direction;        /* Link direction, used by some FIR drivers */
 	int enabled;          /* Powered on? */
 	int suspended;        /* Suspended by APM */
-- 
1.7.1
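
The commit message relies on two genirq behaviours: handlers are always invoked with local interrupts disabled, and the core complains if a handler returns with them enabled. The sketch below models that contract in plain userspace C. `fake_irqs_disabled`, `dispatch_irq()` and the demo_* helpers are hypothetical stand-ins, not kernel APIs, and the real warning emitted by the irq core is only approximated here by a counter.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CPU's local interrupt-disable state. */
static bool fake_irqs_disabled;
/* Counts handlers that returned with interrupts re-enabled. */
static int handler_warnings;

typedef void (*demo_handler_t)(void);

static void demo_local_irq_disable(void) { fake_irqs_disabled = true; }
static void demo_local_irq_enable(void)  { fake_irqs_disabled = false; }

/* Model of the dispatch contract: every handler runs with interrupts
 * disabled, whatever flags it was registered with, so a per-handler
 * "run me with IRQs off" flag has nothing left to select. */
static void dispatch_irq(demo_handler_t handler)
{
	demo_local_irq_disable();
	handler();
	if (!fake_irqs_disabled) {
		/* Complain and restore the expected state, in the spirit of
		 * "genirq: Warn when handler enables interrupts". */
		fprintf(stderr, "handler returned with interrupts enabled\n");
		handler_warnings++;
		demo_local_irq_disable();
	}
	demo_local_irq_enable(); /* leaving interrupt context */
}

static void well_behaved_handler(void) { }
static void misbehaving_handler(void) { demo_local_irq_enable(); }

/* Records whether interrupts were off while the handler ran. */
static bool saw_disabled;
static void probing_handler(void) { saw_disabled = fake_irqs_disabled; }
```

Because dispatch_irq() disables interrupts before every handler regardless of registration flags, a flag requesting that behaviour changes nothing, which is the argument the commit message makes for dropping IRQF_DISABLED.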


^ permalink raw reply related

* [PATCH net-next] tcp: avoid frag allocation for small frames
From: Eric Dumazet @ 2011-10-22 12:23 UTC
  To: David Miller; +Cc: netdev

tcp_sendmsg() uses the select_size() helper to choose the skb head size
when a new skb must be allocated.

If GSO is enabled for the socket, the current strategy is to force all
payload data to be outside of the headroom, in PAGE fragments.

This strategy wastes memory for small packets.

Experiments show that the best results are obtained when using 2048 bytes
for the skb head (this includes the skb overhead and various headers).

This patch provides better len/truesize ratios for packets sent to the
loopback device, and reduces memory needs for in-flight loopback packets,
particularly on arches with big pages.

With the old strategy, if a sender sends many 1-byte packets to an
unresponsive application, the receiver's rmem_alloc grows faster, so the
receiver stops queuing these packets sooner or collapses its receive
queue to free excess memory.

netperf -t TCP_RR results improve by ~4%, and many other workloads
improve as well (tbench, mysql, ...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 704adad..cd45b44 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -897,9 +897,12 @@ static inline int select_size(const struct sock *sk, int sg)
 	int tmp = tp->mss_cache;

 	if (sg) {
-		if (sk_can_gso(sk))
-			tmp = 0;
-		else {
+		if (sk_can_gso(sk)) {
+			/* Small frames wont use a full page:
+			 * Payload will immediately follow tcp header.
+			 */
+			tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+		} else {
 			int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);

 			if (tmp >= pgbreak &&
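
To make the head-size arithmetic concrete, here is a userspace sketch of the GSO branch of select_size() before and after the patch. The constants are assumptions chosen for illustration: MAX_TCP_HEADER and the aligned size of struct skb_shared_info depend on architecture and kernel configuration. The non-GSO scatter-gather branch (the pgbreak logic cut off above) is deliberately not modelled, and demo_select_size() and demo_fits_in_head() are hypothetical helpers.

```c
/* Assumed, illustrative values; real ones depend on arch and config. */
#define DEMO_MAX_TCP_HEADER 256
#define DEMO_SHINFO_ALIGNED 320 /* stands in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

/* Mirrors SKB_WITH_OVERHEAD(X): bytes left once shared info is carved out. */
#define DEMO_SKB_WITH_OVERHEAD(x) ((x) - DEMO_SHINFO_ALIGNED)

/* Linear head size chosen for a new skb.
 * sg: scatter-gather available; gso: sk_can_gso(); mss: tp->mss_cache;
 * patched: 0 = behaviour before this patch, 1 = after. */
static int demo_select_size(int sg, int gso, int mss, int patched)
{
	int tmp = mss;

	if (sg && gso)
		tmp = patched ? DEMO_SKB_WITH_OVERHEAD(2048 - DEMO_MAX_TCP_HEADER) : 0;
	/* sg && !gso: the pgbreak handling is omitted from this sketch. */
	return tmp;
}

/* Does a payload of len bytes fit entirely in the linear head? */
static int demo_fits_in_head(int head, int len)
{
	return len <= head;
}
```

With these assumed numbers the patched head leaves 2048 - 256 - 320 = 1472 bytes of payload room, so a 1-byte write lands in the linear area right behind the headers instead of in a page fragment, matching the comment added by the patch.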

* [PATCH] jme: fix irq storm after suspend/resume
From: Clemens Buchacher @ 2011-10-22 12:56 UTC
  To: Guo-Fu Tseng; +Cc: netdev, linux-kernel, Adrian Chadd, Mohammed Shafi

If the device is down during suspend/resume, interrupts are enabled
without a registered interrupt handler, causing a storm of
unhandled interrupts until the IRQ is disabled because "nobody
cared".

Instead, check that the device is up before touching it in the
suspend/resume code.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=39112

Helped-by: Adrian Chadd <adrian@freebsd.org>
Helped-by: Mohammed Shafi <shafi.wireless@gmail.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---

Unfortunately, bugzilla.kernel.org is still down. There is at least
one other person who reported the issue, and I don't have their
email address. So for now, I am the only one who tested this fix.

The patch applies to current tip (2efd7c0) of Linus' tree. I also
tested it based on v3.0 and it worked the same.

Many thanks to Adrian and Mohammed for helping me debug this issue.
See this thread for the history:
 http://mid.gmane.org/20110827113253.GA1444@ecki

 drivers/net/jme.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/jme.c b/drivers/net/jme.c
index 3ac262f..7a8a3b6 100644
--- a/drivers/net/jme.c
+++ b/drivers/net/jme.c
@@ -3131,6 +3131,9 @@ jme_suspend(struct device *dev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct jme_adapter *jme = netdev_priv(netdev);

+	if (!netif_running(netdev))
+		return 0;
+
 	atomic_dec(&jme->link_changing);

 	netif_device_detach(netdev);
@@ -3171,6 +3174,9 @@ jme_resume(struct device *dev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct jme_adapter *jme = netdev_priv(netdev);

+	if (!netif_running(netdev))
+		return 0;
+
 	jme_clear_pm(jme);
 	jme_phy_on(jme);
 	if (test_bit(JME_FLAG_SSET, &jme->flags))
-- 
1.7.7
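
The fix follows a common pattern for network-driver PM callbacks: if the interface is not up, suspend and resume leave the hardware alone. A self-contained model of why the missing check led to an interrupt storm is sketched below. The struct, its fields and the demo_* functions are hypothetical and only mimic the roles of netif_running() and the jme resume path; the assumption that the IRQ handler is only registered while the interface is open comes from the commit message ("interrupts are enabled without a registered interrupt handler").

```c
#include <stdbool.h>

/* Hypothetical model of the adapter state relevant to the bug. */
struct demo_adapter {
	bool running;         /* stands in for netif_running(netdev) */
	bool handler_present; /* IRQ handler registered while the device is open */
	bool irq_unmasked;    /* device allowed to raise interrupts */
};

/* Resume callback. With check_running == false it models the old code,
 * which re-enabled interrupts unconditionally. */
static int demo_resume(struct demo_adapter *ad, bool check_running)
{
	if (check_running && !ad->running)
		return 0; /* device is down: nothing to restore */
	ad->irq_unmasked = true;
	return 0;
}

/* Interrupts that fire while unmasked with no handler are unhandled;
 * the kernel eventually disables such an IRQ ("nobody cared"). */
static bool demo_would_storm(const struct demo_adapter *ad)
{
	return ad->irq_unmasked && !ad->handler_present;
}
```

Resuming a down adapter without the check leaves it unmasked with no handler, which is the storm described above; with the check, resume returns early and the device stays quiet.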

* Re: [PATCH] jme: fix irq storm after suspend/resume
From: Mohammed Shafi @ 2011-10-22 13:09 UTC
  To: Clemens Buchacher; +Cc: Guo-Fu Tseng, netdev, linux-kernel, Adrian Chadd
In-Reply-To: <20111022125620.GA2256@ecki.lan>

On Sat, Oct 22, 2011 at 6:26 PM, Clemens Buchacher <drizzd@aon.at> wrote:
> If the device is down during suspend/resume, interrupts are enabled
> without a registered interrupt handler, causing a storm of
> unhandled interrupts until the IRQ is disabled because "nobody
> cared".

Thank you so much for fixing this. Incredible persistence on your part!

>
> Instead, check that the device is up before touching it in the
> suspend/resume code.
>
> Fixes https://bugzilla.kernel.org/show_bug.cgi?id=39112
>
> Helped-by: Adrian Chadd <adrian@freebsd.org>
> Helped-by: Mohammed Shafi <shafi.wireless@gmail.com>
> Signed-off-by: Clemens Buchacher <drizzd@aon.at>
> ---
>
> Unfortunately, bugzilla.kernel.org is still down. There is at least
> one other person who reported the issue, and I don't have their
> email address. So for now, I am the only one who tested this fix.
>
> The patch applies to current tip (2efd7c0) of Linus' tree. I also
> tested it based on v3.0 and it worked the same.
>
> Many thanks to Adrian and Mohammed for helping me debug this issue.
> See this thread for the history:
>  http://mid.gmane.org/20110827113253.GA1444@ecki
>
>  drivers/net/jme.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/jme.c b/drivers/net/jme.c
> index 3ac262f..7a8a3b6 100644
> --- a/drivers/net/jme.c
> +++ b/drivers/net/jme.c
> @@ -3131,6 +3131,9 @@ jme_suspend(struct device *dev)
>        struct net_device *netdev = pci_get_drvdata(pdev);
>        struct jme_adapter *jme = netdev_priv(netdev);
>
> +       if (!netif_running(netdev))
> +               return 0;
> +
>        atomic_dec(&jme->link_changing);
>
>        netif_device_detach(netdev);
> @@ -3171,6 +3174,9 @@ jme_resume(struct device *dev)
>        struct net_device *netdev = pci_get_drvdata(pdev);
>        struct jme_adapter *jme = netdev_priv(netdev);
>
> +       if (!netif_running(netdev))
> +               return 0;
> +
>        jme_clear_pm(jme);
>        jme_phy_on(jme);
>        if (test_bit(JME_FLAG_SSET, &jme->flags))
> --
> 1.7.7
>
>



-- 
shafi

* Re: [PATCH] tg3: fix tigon3_dma_hwbug_workaround()
From: Ari Savolainen @ 2011-10-22 13:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: RongQing Li, David Miller, richardcochran, netdev, linux-kernel
In-Reply-To: <CAEbykaVif3Mi50aDzY5p09iHC3DgtbYFQBGPAWiLa8hhyk5w8A@mail.gmail.com>

I tested the patch. It works. The panics are gone.

Thanks,
Ari

2011/10/22 Ari Savolainen <ari.m.savolainen@gmail.com>:
> I tried a similar patch earlier and got another panic with that. I was
> quite tired at that time and may have made a mistake. I'll test Eric's
> patch either later today or tomorrow.
>
> Ari
>
> 2011/10/22 Eric Dumazet <eric.dumazet@gmail.com>:
>> Ari got kernel panics using tg3 NIC, and bisected to 2669069aacc9 "tg3:
>> enable transmit time stamping."
>>
>> This is because tigon3_dma_hwbug_workaround() might alloc a new skb and
>> free the original. We panic when skb_tx_timestamp() is called on freed
>> skb.
>>
>> Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
>>  drivers/net/tg3.c |    8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
>> index 4a1374d..6149dc5 100644
>> --- a/drivers/net/tg3.c
>> +++ b/drivers/net/tg3.c
>> @@ -6029,12 +6029,12 @@ static void tg3_tx_skb_unmap(struct tg3_napi *tnapi, u32 entry, int last)
>>
>>  /* Workaround 4GB and 40-bit hardware DMA bugs. */
>>  static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
>> -                                      struct sk_buff *skb,
>> +                                      struct sk_buff **pskb,
>>                                       u32 *entry, u32 *budget,
>>                                       u32 base_flags, u32 mss, u32 vlan)
>>  {
>>        struct tg3 *tp = tnapi->tp;
>> -       struct sk_buff *new_skb;
>> +       struct sk_buff *new_skb, *skb = *pskb;
>>        dma_addr_t new_addr = 0;
>>        int ret = 0;
>>
>> @@ -6076,7 +6076,7 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
>>        }
>>
>>        dev_kfree_skb(skb);
>> -
>> +       *pskb = new_skb;
>>        return ret;
>>  }
>>
>> @@ -6305,7 +6305,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>                 */
>>                entry = tnapi->tx_prod;
>>                budget = tg3_tx_avail(tnapi);
>> -               if (tigon3_dma_hwbug_workaround(tnapi, skb, &entry, &budget,
>> +               if (tigon3_dma_hwbug_workaround(tnapi, &skb, &entry, &budget,
>>                                                base_flags, mss, vlan))
>>                        goto out_unlock;
>>        }
>>
>>
>>
>
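
The bug is a general pattern. A helper that may replace a buffer must hand the replacement back to its caller, or the caller keeps using freed memory. The userspace sketch below illustrates the `**pskb` fix. `struct pkt`, `pkt_bounce()` and `pkt_timestamp()` are hypothetical stand-ins for the skb, `tigon3_dma_hwbug_workaround()` and `skb_tx_timestamp()`. Unlike the driver, this sketch keeps the original buffer when allocation fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct pkt {
	size_t len;
	int timestamped;
	unsigned char data[64];
};

/*
 * Copy the packet into a fresh buffer, free the original, and publish the
 * copy through *ppkt so the caller never touches the freed pointer.
 * Returns -1 on allocation failure; *ppkt is then unchanged and still valid.
 */
static int pkt_bounce(struct pkt **ppkt)
{
	struct pkt *old = *ppkt;
	struct pkt *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy, old, sizeof(*copy));
	free(old);
	*ppkt = copy;   /* the analogue of "*pskb = new_skb;" */
	return 0;
}

/* Stands in for skb_tx_timestamp(): must only ever see a live buffer. */
static void pkt_timestamp(struct pkt *p)
{
	p->timestamped = 1;
}
```

Because the caller passes `&p`, its local pointer is updated in place. The later timestamp call therefore operates on the new buffer rather than the freed one.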

* Re: [LARTC] LARTC mailing list
From: Andrew Beverley @ 2011-10-22 14:10 UTC (permalink / raw)
  To: Linux Advanced Routing & Traffic Control project
  Cc: linux-new-lists, netfilter, netdev
In-Reply-To: <4EA2BF3D.1010107@linuxsystems.it>

On Sat, 2011-10-22 at 15:03 +0200, Niccolò Belli wrote:
> Is someone still interested to bring LARTC to a new life?
> I'm sorry for the subscription but the list didn't work anymore and it 
> was unfeasible to CC 30+ addresses. Let me know if you want to 
> unsubscribe and you don't know how to.

Thanks for doing this, Niccolo. However, I'd rather see it hosted at
vger.kernel.org for all the reasons previously outlined.

I was actually going to suggest a new list called net-users instead.
There is netdev, netfilter-devel and netfilter, but not a general
networking users list.

User questions do get asked on netdev, and some non-netfilter questions
get asked on the netfilter list, but it would seem sensible to have a
general networking users list that would include LARTC questions.

Comments anyone?

Andy

* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-22 15:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <1319208237.32161.14.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

>> +
>> +/************************
>> + * Rx path frame handler
>> + ************************/
>> +
>> +/* note: already called with rcu_read_lock */
>> +static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
>> +{
>> +	struct sk_buff *skb = *pskb;
>> +	struct team_port *port;
>> +	struct team *team;
>> +	rx_handler_result_t res = RX_HANDLER_ANOTHER;
>> +
>> +	skb = skb_share_check(skb, GFP_ATOMIC);
>> +	if (!skb)
>> +		return RX_HANDLER_CONSUMED;
>> +
>> +	*pskb = skb;
>> +
>> +	port = team_port_get_rcu(skb->dev);
>> +	team = port->team;
>> +
>> +	if (team->mode_ops.receive)
>
>Hmm, you need ACCESS_ONCE() here or rcu_dereference()
>
>See commit 4d97480b1806e883eb (bonding: use local function pointer of
>bond->recv_probe in bond_handle_frame) for reference

I do not think so. Because mode_ops.receive changes only from
__team_change_mode() and this can be called only in case no ports are in
team. And team_port_del() calls synchronize_rcu().

Jirka

>
>> +		res = team->mode_ops.receive(team, port, skb);
>> +
>> +	if (res == RX_HANDLER_ANOTHER) {
>> +		struct team_pcpu_stats *pcpu_stats;
>> +
>> +		pcpu_stats = this_cpu_ptr(team->pcpu_stats);
>> +		u64_stats_update_begin(&pcpu_stats->syncp);
>> +		pcpu_stats->rx_packets++;
>> +		pcpu_stats->rx_bytes += skb->len;
>> +		if (skb->pkt_type == PACKET_MULTICAST)
>> +			pcpu_stats->rx_multicast++;
>> +		u64_stats_update_end(&pcpu_stats->syncp);
>> +
>> +		skb->dev = team->dev;
>> +	} else {
>> +		this_cpu_inc(team->pcpu_stats->rx_dropped);
>> +	}
>> +
>> +	return res;
>> +}
>> +
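
In the handler above, the `u64_stats_update_begin()`/`u64_stats_update_end()` pair brackets the counter updates so that a reader can detect a concurrent update and retry. On 32-bit SMP kernels this is a sequence counter. On 64-bit kernels the pair compiles away, because aligned 64-bit loads and stores do not tear. The following is a userspace sketch of the sequence-counter idea using C11 atomics. `struct rx_stats` and the `rx_stats_*` functions are hypothetical, and a single writer is assumed (per-CPU stats give the kernel that property). The memory ordering follows the commonly used C11 seqlock recipe, not the kernel's own implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stats block with a single writer. */
struct rx_stats {
	atomic_uint seq;              /* odd while an update is in progress */
	_Atomic uint64_t rx_packets;
	_Atomic uint64_t rx_bytes;
};

static void rx_stats_init(struct rx_stats *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->rx_packets, 0);
	atomic_init(&s->rx_bytes, 0);
}

/* Writer: analogous to the update_begin()/update_end() bracket. */
static void rx_stats_add(struct rx_stats *s, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->rx_packets,
		atomic_load_explicit(&s->rx_packets, memory_order_relaxed) + 1,
		memory_order_relaxed);
	atomic_store_explicit(&s->rx_bytes,
		atomic_load_explicit(&s->rx_bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: retry until the sequence is even and unchanged across the reads. */
static void rx_stats_read(struct rx_stats *s, uint64_t *packets, uint64_t *bytes)
{
	unsigned int start, end;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&s->rx_packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->rx_bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);
}
```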

* Kernel driver r8169 not working for Realtek 8111E onboard network card
From: Adrian George Sav @ 2011-10-22 15:22 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello.

I am having trouble with the r8169 kernel driver for Realtek 8111E network card.
The NIC works intermittently and horribly with this driver. Unusable.

Below is my config. I am happy to provide any and every other necessary information to help in solving this.

Thank you.

Motherboard: ASUS P8P67(REV 3.1)

Kernel: up to 2.6.39

lspci -vv:
------------------------------------------------------------------------------------------------------------

07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [Realtek RTL8111E]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 66
        Region 0: I/O ports at d000 [size=256]
        Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000feeff00c  Data: 417a
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 00, will not decode more.
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Kernel driver in use: r8169
        Kernel modules: r8169
------------------------------------------------------------------------------------------------------------

* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-22 15:55 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <3128.1319221621@death>

Fri, Oct 21, 2011 at 08:27:01PM CEST, fubar@us.ibm.com wrote:
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>>This patch introduces a new network device called team. It is supposed to be
>>a very fast, simple, userspace-driven alternative to the existing bonding
>>driver.
>>
>>A userspace library called libteam, with a couple of demo apps, is available
>>here:
>>https://github.com/jpirko/libteam
>>Note it's still in its diapers atm.
>>
>>team<->libteam use generic netlink for communication. That and rtnl
>>are supposed to be the only ways to configure a team device; no sysfs etc.
>>
>>Basic Python bindings for libteam were recently introduced (some work
>>still needs to be done on them though). A daemon providing arpmon/miimon
>>active-backup functionality will be introduced shortly.
>>Everything necessary is already implemented in the kernel team driver.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>v1->v2:
>>	- modes are made as modules. Makes team more modular and
>>	  extendable.
>>	- several commenters' nitpicks found on v1 were fixed
>>	- several other bugs were fixed.
>>	- note I ignored Eric's comment about roundrobin port selector
>>	  as Eric's way may be easily implemented as another mode (mode
>>	  "random") in future.
>>---
>
>[...]
>
>>+static int team_port_add(struct team *team, struct net_device *port_dev)
>>+{
>>+	struct net_device *dev = team->dev;
>>+	struct team_port *port;
>>+	char *portname = port_dev->name;
>>+	char tmp_addr[ETH_ALEN];
>>+	int err;
>>+
>>+	if (port_dev->flags & IFF_LOOPBACK ||
>>+	    port_dev->type != ARPHRD_ETHER) {
>>+		netdev_err(dev, "Device %s is of an unsupported type\n",
>>+			   portname);
>>+		return -EINVAL;
>>+	}
>>+
>>+	if (team_port_exists(port_dev)) {
>>+		netdev_err(dev, "Device %s is already a port "
>>+				"of a team device\n", portname);
>>+		return -EBUSY;
>>+	}
>>+
>>+	if (port_dev->flags & IFF_UP) {
>>+		netdev_err(dev, "Device %s is up. Set it down before adding it as a team port\n",
>>+			   portname);
>>+		return -EBUSY;
>>+	}
>>+
>>+	port = kzalloc(sizeof(struct team_port), GFP_KERNEL);
>>+	if (!port)
>>+		return -ENOMEM;
>>+
>>+	port->dev = port_dev;
>>+	port->team = team;
>>+
>>+	port->orig.mtu = port_dev->mtu;
>>+	err = dev_set_mtu(port_dev, dev->mtu);
>>+	if (err) {
>>+		netdev_dbg(dev, "Error %d calling dev_set_mtu\n", err);
>>+		goto err_set_mtu;
>>+	}
>>+
>>+	memcpy(port->orig.dev_addr, port_dev->dev_addr, ETH_ALEN);
>>+	random_ether_addr(tmp_addr);
>>+	err = __set_port_mac(port_dev, tmp_addr);
>>+	if (err) {
>>+		netdev_dbg(dev, "Device %s mac addr set failed\n",
>>+			   portname);
>>+		goto err_set_mac_rand;
>>+	}
>>+
>>+	err = dev_open(port_dev);
>>+	if (err) {
>>+		netdev_dbg(dev, "Device %s opening failed\n",
>>+			   portname);
>>+		goto err_dev_open;
>>+	}
>>+
>>+	err = team_port_set_orig_mac(port);
>>+	if (err) {
>>+		netdev_dbg(dev, "Device %s mac addr set failed - Device does not support addr change when it's opened\n",
>>+			   portname);
>>+		goto err_set_mac_opened;
>>+	}
>
>	This will exclude a number of devices that bonding currently
>provides at least partial support for.
>
>	Most of those are older 10 or 10/100 Ethernet drivers (anything
>that uses eth_mac_addr for its ndo_set_mac_address, I think; there look
>to be about 140 or so of those), but it also includes Infiniband (which
>is excluded explicitly elsewhere).

Team supports only ETH atm. I think it can easily be extended to support
Infiniband in the future. But that's not a priority now.

The "live mac change" check can easily be moved into the port_enter mode
callback to let the mode decide. Then FOM active can be easily implemented
as another mode.
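
Jay's point about drivers whose `ndo_set_mac_address` is `eth_mac_addr` comes down to one check. In kernels of this era, that helper returns `-EBUSY` if the interface is running, so a MAC change after `dev_open()` (the order `team_port_add()` uses) fails for those drivers. Here is a userspace model of just that check. The `mock_*` names are hypothetical, and the real helper also validates the address, which is omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MOCK_ETH_ALEN 6

/* Hypothetical device model; not struct net_device. */
struct mock_dev {
	bool running;
	unsigned char dev_addr[MOCK_ETH_ALEN];
};

/* Models only the "refuse while running" part of the old eth_mac_addr(). */
static int mock_eth_mac_addr(struct mock_dev *dev, const unsigned char *addr)
{
	if (dev->running)
		return -EBUSY;
	memcpy(dev->dev_addr, addr, MOCK_ETH_ALEN);
	return 0;
}

/* Models dev_open(): after this, the address can no longer be changed. */
static void mock_dev_open(struct mock_dev *dev)
{
	dev->running = true;
}
```

Moving the check into a per-mode callback, as suggested above, would let a mode that never changes the MAC on open ports accept such drivers.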

>
>	Another small set of Ethernet devices (those that currently need
>bonding's fail_over_mac option) do permit setting the MAC while open,
>but will misbehave if multiple ports are set to the same MAC.  The usual
>suspects here are ehea and qeth, which are partition-aware devices for
>IBM's pseries and zseries hardware, but there may be others I'm not
>familiar with.

Actually, this (FOM2) is very close to the behaviour the "activebackup" team
mode has now.


>
>	If these will be permanent limitations of the team driver, then
>this should (eventually) be in the documentation.
>
>	Also, from looking at the code, it's not obvious if nesting of
>teams is supported or not.  I'm not seeing anything in the code that
>would prohibit adding a team device as a port to another team.  If
>nesting of teams is undesirable, it should probably be explicitly tested
>for and disallowed.

Nesting is allowed.



>
>[...]
>
>>+static int __init team_module_init(void)
>>+{
>>+	int err;
>>+
>>+	register_netdevice_notifier(&team_notifier_block);
>>+
>>+	err = rtnl_link_register(&team_link_ops);
>>+	if (err)
>>+		goto err_rtln_reg;
>>+
>>+	err = team_nl_init();
>>+	if (err)
>>+		goto err_nl_init;
>>+
>>+	return 0;
>>+
>>+err_nl_init:
>>+	rtnl_link_unregister(&team_link_ops);
>>+
>>+err_rtln_reg:
>>+	unregister_netdevice_notifier(&team_notifier_block);
>
>	Minor nit: I suspect you meant "err_rtnl_reg" here, and in the
>goto above.

Corrected.

Thanks Jay.


Jirka

>
>	-J
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>
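
Both `team_port_add()` and `team_module_init()` use the kernel's goto-based error unwinding. Each label is named after the step that failed, and it undoes every earlier step in reverse order by falling through the labels below it. The following is a self-contained sketch of that pattern. The `struct state` fields and the `fail_at` parameter are hypothetical and exist only to force a failure at a chosen step.

```c
/* Records which setup steps are currently applied (hypothetical). */
struct state {
	int mtu_set;
	int mac_set;
	int opened;
};

/* Apply steps in order; fail_at selects the step that "fails" (0 = none). */
static int setup(struct state *st, int fail_at)
{
	int err;

	err = (fail_at == 1) ? -1 : 0;  /* e.g. dev_set_mtu() */
	if (err)
		goto err_set_mtu;
	st->mtu_set = 1;

	err = (fail_at == 2) ? -1 : 0;  /* e.g. setting a random MAC */
	if (err)
		goto err_set_mac;
	st->mac_set = 1;

	err = (fail_at == 3) ? -1 : 0;  /* e.g. dev_open() */
	if (err)
		goto err_dev_open;
	st->opened = 1;

	return 0;

err_dev_open:
	st->mac_set = 0;                /* undo step 2: restore the MAC */
err_set_mac:
	st->mtu_set = 0;                /* undo step 1: restore the MTU */
err_set_mtu:
	return err;
}
```

A misspelled label name, like the "err_rtln_reg" nit above, still compiles as long as the `goto` and the label match. This is why such typos tend to survive until review.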

* Re: [PATCH] jme: fix irq storm after suspend/resume
From: Mantas M. @ 2011-10-22 16:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev
In-Reply-To: <20111022125620.GA2256@ecki.lan>

On 22/10/11 15:56, Clemens Buchacher wrote:
> If the device is down during suspend/resume, interrupts are enabled
> without a registered interrupt handler, causing a storm of
> unhandled interrupts until the IRQ is disabled because "nobody
> cared".
>
> Instead, check that the device is up before touching it in the
> suspend/resume code.
>
> Fixes https://bugzilla.kernel.org/show_bug.cgi?id=39112

Just tested the patch, suspend/resume works well.

-- 
Mantas M.

* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Eric Dumazet @ 2011-10-22 16:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <20111022151346.GA2028@minipsycho.orion>

Le samedi 22 octobre 2011 à 17:13 +0200, Jiri Pirko a écrit :
> >> +
> >> +/************************
> >> + * Rx path frame handler
> >> + ************************/
> >> +
> >> +/* note: already called with rcu_read_lock */
> >> +static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
> >> +{
> >> +	struct sk_buff *skb = *pskb;
> >> +	struct team_port *port;
> >> +	struct team *team;
> >> +	rx_handler_result_t res = RX_HANDLER_ANOTHER;
> >> +
> >> +	skb = skb_share_check(skb, GFP_ATOMIC);
> >> +	if (!skb)
> >> +		return RX_HANDLER_CONSUMED;
> >> +
> >> +	*pskb = skb;
> >> +
> >> +	port = team_port_get_rcu(skb->dev);
> >> +	team = port->team;
> >> +
> >> +	if (team->mode_ops.receive)
> >
> >Hmm, you need ACCESS_ONCE() here or rcu_dereference()
> >
> >See commit 4d97480b1806e883eb (bonding: use local function pointer of
> >bond->recv_probe in bond_handle_frame) for reference
> 
> I do not think so. Because mode_ops.receive changes only from
> __team_change_mode() and this can be called only in case no ports are in
> team. And team_port_del() calls synchronize_rcu().
> 



We are used to code that follows this template:

if (ops->handler)
	ops->handler(arguments);

But this is valid only because ops points to constant memory.


In your case, this is clearly not true; don't try to pretend it's safe.
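
The template reads `ops->handler` twice, once for the NULL test and once for the call. If the pointer can change concurrently, the second read may see a different value, including NULL. The bonding commit cited above avoids this by loading the pointer once into a local and then testing and calling that copy. The userspace sketch below shows the same idea, with a C11 atomic load standing in for the kernel's `ACCESS_ONCE()`/`rcu_dereference()`. `struct mode_ops`, `rx_fn`, `handle_frame()` and `consume_all()` are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical rx hook type; returns nonzero if it consumed the frame. */
typedef int (*rx_fn)(void *frame);

/* Hypothetical ops table whose receive hook may be swapped at runtime. */
struct mode_ops {
	_Atomic(rx_fn) receive;
};

/*
 * Racy shape (two separate loads of the hook):
 *	if (ops->receive) ops->receive(frame);
 * Safe shape: one load into a local, then test and call that same value.
 */
static int handle_frame(struct mode_ops *ops, void *frame)
{
	rx_fn receive = atomic_load_explicit(&ops->receive, memory_order_acquire);

	if (receive)
		return receive(frame);
	return 0;  /* no mode hook installed: let the frame pass */
}

/* Example hook used to exercise handle_frame(). */
static int consume_all(void *frame)
{
	(void)frame;
	return 1;
}
```

A single load prevents a torn check-then-call. The object the hook uses must still stay valid for as long as a reader might hold the old pointer, and in the kernel that lifetime guarantee is what RCU provides.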

* Re: Kernel driver r8169 not working for Realtek 8111E onboard network card
From: Francois Romieu @ 2011-10-22 21:44 UTC (permalink / raw)
  To: Adrian George Sav; +Cc: netdev@vger.kernel.org
In-Reply-To: <1319296941.68390.YahooMailNeo@web161905.mail.bf1.yahoo.com>

Adrian George Sav <adi_sav@yahoo.com> :
[...]
> I am having trouble with the r8169 kernel driver for Realtek 8111E network
> card. The NIC works intermittently and horribly with this driver. Unusable.
> 
> Below is my config. I am happy to provide any and every other necessary
> information to help in solving this.

Up to 2.6.39 ?

Your r8169 driver is too old. Please try current -rc and send your dmesg
including the XID line from the r8169 driver.

-- 
Ueimor

* Kernel Panic every 2 weeks on ISP server (NULL pointer dereference)
From: Luciano Ruete @ 2011-10-23  1:18 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: Text/Plain, Size: 1072 bytes --]

Hi,

I'm the sysadmin at a 3500-customer ISP, which runs an iptables+tc solution
for load balancing and QoS.

Every 2 or 3 weeks the server panics with a "NULL pointer dereference" and 
with IP at "dev_queue_xmit"
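
In the attached oops, the faulting instruction pointer is printed as `dev_queue_xmit+0xaa/0x5b0`. The three parts are the function name, the instruction's offset from the start of that function, and the function's total size, with both numbers in hex. Call-trace lines use the same notation. A small sketch that splits such a token is shown below. `struct sym_loc` and `parse_sym()` are hypothetical helpers, not a kernel or binutils API.

```c
#include <stdio.h>
#include <string.h>

/* Parsed form of a "symbol+offset/size" token from an oops or call trace. */
struct sym_loc {
	char name[64];
	unsigned long offset;  /* bytes from the start of the function */
	unsigned long size;    /* total length of the function in bytes */
};

/* Returns 0 on success, -1 if the token is not name+0xOFF/0xSIZE. */
static int parse_sym(const char *tok, struct sym_loc *out)
{
	const char *plus = strrchr(tok, '+');
	size_t n;

	if (!plus || plus == tok)
		return -1;
	n = (size_t)(plus - tok);
	if (n >= sizeof(out->name))
		return -1;
	memcpy(out->name, tok, n);
	out->name[n] = '\0';
	/* %lx accepts an optional 0x prefix, as strtoul() with base 16 does */
	if (sscanf(plus + 1, "%lx/%lx", &out->offset, &out->size) != 2)
		return -1;
	return 0;
}
```

Knowing the offset (here 0xaa into a 0x5b0-byte function) is what lets a developer map the crash to a source line with the matching vmlinux and debug info.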

Curiously, if I disable MSI in the network card driver, the panics
seem to disappear. Does this ring a bell?

The server is an IBM machine, previously with Broadcom NetXtreme II BCM5709
NICs and now with Intel 82576 NICs. I changed the NICs thinking that the bug
might be in the Broadcom driver, but it seems to affect MSI in general.

The tc+iptables rules are auto-generated with sequreisp[1], an ISP solution
that I wrote and open-sourced under AGPLv3.

Tell me if you need any further information, and please CC me because I'm
not subscribed.


root@server:~# uname -a
Linux server 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40 UTC 2011 x86_64 GNU/Linux


[1]https://github.com/sequre/sequreisp

-- 
Luciano Ruete
Sequre - Sys Admin
Mitre 617, piso 7, of. 1 
+54 261 4254894
Mendoza - Argentina
http://www.sequre.com.ar/
http://www.sequreisp.com/

[-- Attachment #2: kern.log.txt --]
[-- Type: text/plain, Size: 12769 bytes --]

BUG: unable to handle kernel NULL pointer dereference at (null)
[694244.692704] IP: [<ffffffff814b48ea>] dev_queue_xmit+0xaa/0x5b0
[694244.763424] PGD 16f369067 PUD 16f368067 PMD 0 
[694244.817577] Oops: 0000 [#1] SMP 
[694244.857160] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[694244.951740] CPU 3 
[694244.974623] Modules linked in: xt_mac ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt nf_conntrack_netlink nfnetlink xt_owner ipt_REJECT ipt_REDIRECT ipt_MASQUERADE xt_helper xt_length xt_TCPMSS xt_mark xt_connmark xt_state xt_tcpudp xt_multiport iptable_mangle iptable_nat iptable_filter ip_tables x_tables sch_sfq act_mirred cls_u32 sch_prio cls_fw sch_htb ifb dummy 8021q garp stp nf_nat_irc nf_conntrack_irc nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack cdc_ether i7core_edac usbnet serio_raw edac_core tpm_tis tpm ioatdma tpm_bios lp shpchp parport mii raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov megaraid_sas raid6_pq async_tx raid1 raid0 multipath igb dca usbhid hid linear
[694245.905128] 
[694245.923881] Pid: 30, comm: events/3 Not tainted 2.6.35-30-server #60~lucid1-Ubuntu 69Y5698     /System x3650 M3 -[7945AC1]-
[694246.057920] RIP: 0010:[<ffffffff814b48ea>]  [<ffffffff814b48ea>] dev_queue_xmit+0xaa/0x5b0
[694246.157723] RSP: 0018:ffff880001e63960  EFLAGS: 00010202
[694246.222176] RAX: 0000000000002000 RBX: ffff880145b6f400 RCX: 000000009fe9dec3
[694246.308451] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff88017bd47130
[694246.394725] RBP: ffff880001e639a0 R08: ffff880145b6f400 R09: ffff88017bd47130
[694246.480998] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000
[694246.567265] R13: ffff880118128000 R14: ffff88015c39d300 R15: ffff880001e63b00
[694246.653534] FS:  0000000000000000(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
[694246.751226] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[694246.820861] CR2: 0000000000000000 CR3: 0000000250400000 CR4: 00000000000006e0
[694246.907128] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[694246.993394] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[694247.079668] Process events/3 (pid: 30, threadinfo ffff880276efe000, task ffff880276ed44d0)
[694247.179433] Stack:
[694247.204410]  0000000000000000 ffff880118128000 ffff880001e639e0 ffff880145b6f400
[694247.291793] <0> 0000000000000000 ffff88015ce2f780 ffff880145b6f400 ffff880001e63b00
[694247.384466] <0> ffff880001e639e0 ffffffff814e9347 ffff880001e63a20 ffff880145b6f400
[694247.479257] Call Trace:
[694247.509424]  <IRQ> 
[694247.535481]  [<ffffffff814e9347>] ip_finish_output+0x237/0x310
[694247.606165]  [<ffffffff814e9738>] ip_output+0xb8/0xc0
[694247.667503]  [<ffffffff814e75d3>] ? __ip_local_out+0xa3/0xb0
[694247.736114]  [<ffffffff814e84c9>] ip_local_out+0x29/0x30
[694247.800566]  [<ffffffff814e8ce1>] ip_queue_xmit+0x191/0x3f0
[694247.868129]  [<ffffffff814fe484>] tcp_transmit_skb+0x3f4/0x700
[694247.938811]  [<ffffffff815002fd>] tcp_send_ack+0xdd/0x130
[694248.004302]  [<ffffffff814fc823>] tcp_rcv_synsent_state_process+0x5a3/0x5b0
[694248.088493]  [<ffffffff815046bf>] ? tcp_v4_inbound_md5_hash+0x7f/0x210
[694248.167486]  [<ffffffff814fcf8d>] tcp_rcv_state_process+0x7d/0x4e0
[694248.242332]  [<ffffffff815048f3>] tcp_v4_do_rcv+0xa3/0x1c0
[694248.308864]  [<ffffffff81505ab9>] tcp_v4_rcv+0x5a9/0x830
[694248.373314]  [<ffffffff814e36a0>] ? ip_local_deliver_finish+0x0/0x290
[694248.451265]  [<ffffffff814db384>] ? nf_hook_slow+0x74/0x100
[694248.518830]  [<ffffffff814e36a0>] ? ip_local_deliver_finish+0x0/0x290
[694248.596781]  [<ffffffff814e377d>] ip_local_deliver_finish+0xdd/0x290
[694248.673696]  [<ffffffff814e39b0>] ip_local_deliver+0x80/0x90
[694248.742300]  [<ffffffff814e2f29>] ip_rcv_finish+0x119/0x410
[694248.809870]  [<ffffffff814e35cd>] ip_rcv+0x23d/0x310
[694248.870167]  [<ffffffff814af233>] __netif_receive_skb+0x383/0x5c0
[694248.943960]  [<ffffffff814af57b>] process_backlog+0x10b/0x210
[694249.013603]  [<ffffffff814b04af>] net_rx_action+0x10f/0x2a0
[694249.081175]  [<ffffffff8106862d>] __do_softirq+0xbd/0x200
[694249.146672]  [<ffffffff810ca950>] ? handle_IRQ_event+0x50/0x160
[694249.218394]  [<ffffffff81068695>] ? __do_softirq+0x125/0x200
[694249.287006]  [<ffffffff8100afdc>] call_softirq+0x1c/0x30
[694249.351458]  [<ffffffff8100cab5>] do_softirq+0x65/0xa0
[694249.413833]  [<ffffffff810684e5>] irq_exit+0x85/0x90
[694249.474131]  [<ffffffff815aac85>] do_IRQ+0x75/0xf0
[694249.532352]  [<ffffffff815a3853>] ret_from_intr+0x0/0x11
[694249.596797]  <EOI> 
[694249.622854]  [<ffffffff815a3319>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[694249.704961]  [<ffffffffa0107826>] ppp_asynctty_receive+0x86/0x100 [ppp_async]
[694249.791233]  [<ffffffff81360816>] flush_to_ldisc+0x1a6/0x1e0
[694249.859834]  [<ffffffff81360670>] ? flush_to_ldisc+0x0/0x1e0
[694249.928442]  [<ffffffff8107b2a5>] run_workqueue+0xc5/0x1a0
[694249.994969]  [<ffffffff8107b423>] worker_thread+0xa3/0x110
[694250.061499]  [<ffffffff810800d0>] ? autoremove_wake_function+0x0/0x40
[694250.139451]  [<ffffffff8107b380>] ? worker_thread+0x0/0x110
[694250.207014]  [<ffffffff8107fb56>] kthread+0x96/0xa0
[694250.266274]  [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
[694250.338002]  [<ffffffff8107fac0>] ? kthread+0x0/0xa0
[694250.398296]  [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
[694250.472081] Code: f6 49 c1 e6 07 66 89 93 ac 00 00 00 4d 03 b5 40 03 00 00 0f b7 83 a6 00 00 00 4d 8b 66 08 80 e4 cf 80 cc 20 66 89 83 a6 00 00 00 <49> 83 3c 24 00 0f 84 3b 02 00 00 49 8d 84 24 9c 00 00 00 48 89 
[694250.700622] RIP  [<ffffffff814b48ea>] dev_queue_xmit+0xaa/0x5b0
[694250.772367]  RSP <ffff880001e63960>
[694250.814999] CR2: 0000000000000000
[694250.855923] ---[ end trace 0c85e47af955446e ]---
[694250.912113] Kernel panic - not syncing: Fatal exception in interrupt
[694250.989074] Pid: 30, comm: events/3 Tainted: G      D     2.6.35-30-server #60~lucid1-Ubuntu
[694251.090974] Call Trace:
[694251.121208]  <IRQ>  [<ffffffff815a0597>] panic+0x90/0x113
[694251.154109] ------------[ cut here ]------------
[694251.154118] WARNING: at /build/buildd/linux-lts-backport-maverick-2.6.35/net/sched/sch_generic.c:258 dev_watchdog+0x25f/0x270()
[694251.154121] Hardware name: System x3650 M3 -[7945AC1]-
[694251.154123] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
[694251.154124] Modules linked in: xt_mac ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt nf_conntrack_netlink nfnetlink xt_owner ipt_REJECT ipt_REDIRECT ipt_MASQUERADE xt_helper xt_length xt_TCPMSS xt_mark xt_connmark xt_state xt_tcpudp xt_multiport iptable_mangle iptable_nat iptable_filter ip_tables x_tables sch_sfq act_mirred cls_u32 sch_prio cls_fw sch_htb ifb dummy 8021q garp stp nf_nat_irc nf_conntrack_irc nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack cdc_ether i7core_edac usbnet serio_raw edac_core tpm_tis tpm ioatdma tpm_bios lp shpchp parport mii raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov megaraid_sas raid6_pq async_tx raid1 raid0 multipath igb dca usbhid hid linear
[694251.154174] Pid: 0, comm: swapper Tainted: G      D     2.6.35-30-server #60~lucid1-Ubuntu
[694251.154176] Call Trace:
[694251.154178]  <IRQ>  [<ffffffff8106159f>] warn_slowpath_common+0x7f/0xc0
[694251.154187]  [<ffffffff81061696>] warn_slowpath_fmt+0x46/0x50
[694251.154190]  [<ffffffff814cd81f>] dev_watchdog+0x25f/0x270
[694251.154200]  [<ffffffffa01adc49>] ? destroy_conntrack+0xa9/0xe0 [nf_conntrack]
[694251.154204]  [<ffffffff814db1a7>] ? nf_conntrack_destroy+0x17/0x30
[694251.154211]  [<ffffffffa01ad264>] ? death_by_timeout+0xd4/0x140 [nf_conntrack]
[694251.154214]  [<ffffffff814cd5c0>] ? dev_watchdog+0x0/0x270
[694251.154217]  [<ffffffff814cd5c0>] ? dev_watchdog+0x0/0x270
[694251.154221]  [<ffffffff81070172>] call_timer_fn+0x42/0x120
[694251.154226]  [<ffffffff8105553b>] ? scheduler_tick+0x1db/0x300
[694251.154229]  [<ffffffff814cd5c0>] ? dev_watchdog+0x0/0x270
[694251.154232]  [<ffffffff81071734>] run_timer_softirq+0x154/0x270
[694251.154236]  [<ffffffff8108a683>] ? ktime_get+0x63/0xe0
[694251.154239]  [<ffffffff8106862d>] __do_softirq+0xbd/0x200
[694251.154243]  [<ffffffff8108fa1a>] ? tick_program_event+0x2a/0x30
[694251.154247]  [<ffffffff8100afdc>] call_softirq+0x1c/0x30
[694251.154250]  [<ffffffff8100cab5>] do_softirq+0x65/0xa0
[694251.154253]  [<ffffffff810684e5>] irq_exit+0x85/0x90
[694251.154258]  [<ffffffff815aad70>] smp_apic_timer_interrupt+0x70/0x9b
[694251.154261]  [<ffffffff8100aa93>] apic_timer_interrupt+0x13/0x20
[694251.154263]  <EOI>  [<ffffffff8130ad54>] ? intel_idle+0xe4/0x180
[694251.154271]  [<ffffffff8130ad37>] ? intel_idle+0xc7/0x180
[694251.154277]  [<ffffffff81488062>] cpuidle_idle_call+0x92/0x140
[694251.154281]  [<ffffffff81008d93>] cpu_idle+0xb3/0x110
[694251.154285]  [<ffffffff8159b226>] start_secondary+0x100/0x102
[694251.154288] ---[ end trace 0c85e47af955446f ]---
[694251.154389] igb 0000:17:00.0: eth0: Reset adapter
[694251.273081] igb 0000:18:00.0: eth2: Reset adapter
[694254.499174]  [<ffffffff815a485a>] oops_end+0xea/0xf0
[694254.559522]  [<ffffffff8103e45c>] no_context+0xfc/0x190
[694254.622984]  [<ffffffffa0070155>] ? nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
[694254.713470]  [<ffffffff8103e615>] __bad_area_nosemaphore+0x125/0x1e0
[694254.790433]  [<ffffffff8103e6e3>] bad_area_nosemaphore+0x13/0x20
[694254.863244]  [<ffffffff815a711f>] do_page_fault+0x28f/0x350
[694254.930864]  [<ffffffff815a3b35>] page_fault+0x25/0x30
[694254.993287]  [<ffffffff814b48ea>] ? dev_queue_xmit+0xaa/0x5b0
[694255.062987]  [<ffffffff814e9347>] ip_finish_output+0x237/0x310
[694255.133728]  [<ffffffff814e9738>] ip_output+0xb8/0xc0
[694255.195123]  [<ffffffff814e75d3>] ? __ip_local_out+0xa3/0xb0
[694255.263784]  [<ffffffff814e84c9>] ip_local_out+0x29/0x30
[694255.328283]  [<ffffffff814e8ce1>] ip_queue_xmit+0x191/0x3f0
[694255.395910]  [<ffffffff814fe484>] tcp_transmit_skb+0x3f4/0x700
[694255.466647]  [<ffffffff815002fd>] tcp_send_ack+0xdd/0x130
[694255.532185]  [<ffffffff814fc823>] tcp_rcv_synsent_state_process+0x5a3/0x5b0
[694255.616423]  [<ffffffff815046bf>] ? tcp_v4_inbound_md5_hash+0x7f/0x210
[694255.695489]  [<ffffffff814fcf8d>] tcp_rcv_state_process+0x7d/0x4e0
[694255.770377]  [<ffffffff815048f3>] tcp_v4_do_rcv+0xa3/0x1c0
[694255.838650]  [<ffffffff81505ab9>] tcp_v4_rcv+0x5a9/0x830
[694255.903158]  [<ffffffff814e36a0>] ? ip_local_deliver_finish+0x0/0x290
[694255.981164]  [<ffffffff814db384>] ? nf_hook_slow+0x74/0x100
[694256.048778]  [<ffffffff814e36a0>] ? ip_local_deliver_finish+0x0/0x290
[694256.126783]  [<ffffffff814e377d>] ip_local_deliver_finish+0xdd/0x290
[694256.203750]  [<ffffffff814e39b0>] ip_local_deliver+0x80/0x90
[694256.272413]  [<ffffffff814e2f29>] ip_rcv_finish+0x119/0x410
[694256.340028]  [<ffffffff814e35cd>] ip_rcv+0x23d/0x310
[694256.400385]  [<ffffffff814af233>] __netif_receive_skb+0x383/0x5c0
[694256.474233]  [<ffffffff814af57b>] process_backlog+0x10b/0x210
[694256.543933]  [<ffffffff814b04af>] net_rx_action+0x10f/0x2a0
[694256.611549]  [<ffffffff8106862d>] __do_softirq+0xbd/0x200
[694256.677096]  [<ffffffff810ca950>] ? handle_IRQ_event+0x50/0x160
[694256.748871]  [<ffffffff81068695>] ? __do_softirq+0x125/0x200
[694256.817527]  [<ffffffff8100afdc>] call_softirq+0x1c/0x30
[694256.882030]  [<ffffffff8100cab5>] do_softirq+0x65/0xa0
[694256.944458]  [<ffffffff810684e5>] irq_exit+0x85/0x90
[694257.004807]  [<ffffffff815aac85>] do_IRQ+0x75/0xf0
[694257.063079]  [<ffffffff815a3853>] ret_from_intr+0x0/0x11
[694257.127578]  <EOI>  [<ffffffff815a3319>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[694257.217120]  [<ffffffffa0107826>] ppp_asynctty_receive+0x86/0x100 [ppp_async]
[694257.303447]  [<ffffffff81360816>] flush_to_ldisc+0x1a6/0x1e0
[694257.372104]  [<ffffffff81360670>] ? flush_to_ldisc+0x0/0x1e0
[694257.440768]  [<ffffffff8107b2a5>] run_workqueue+0xc5/0x1a0
[694257.507355]  [<ffffffff8107b423>] worker_thread+0xa3/0x110
[694257.573940]  [<ffffffff810800d0>] ? autoremove_wake_function+0x0/0x40
[694257.651961]  [<ffffffff8107b380>] ? worker_thread+0x0/0x110
[694257.719582]  [<ffffffff8107fb56>] kthread+0x96/0xa0
[694257.778897]  [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
[694257.850666]  [<ffffffff8107fac0>] ? kthread+0x0/0xa0
[694257.911018]  [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
[694257.984951] Rebooting in 1 seconds..[    0.000000] Initializing cgroup subsys cpuset

^ permalink raw reply

* Re: Kernel Panic every 2 weeks on ISP server (NULL pointer dereference)
From: Eric Dumazet @ 2011-10-23  5:16 UTC (permalink / raw)
  To: Luciano Ruete; +Cc: netdev
In-Reply-To: <201110222218.12524.lruete@sequre.com.ar>

On Saturday, 22 October 2011 at 22:18 -0300, Luciano Ruete wrote:
> Hi,
> 
> I'm the sysadmin at an ISP with 3,500 customers, which runs an iptables+tc
> solution for load balancing and QoS.
> 
> Every 2 or 3 weeks the server panics with a "NULL pointer dereference", with
> the IP at "dev_queue_xmit".
> 
> Curiously, if I disable MSI in the network card driver, the panics seem to
> disappear. Does this ring a bell?
> 
> The server is an IBM, previously with Broadcom NetXtreme II BCM5709 NICs and
> now with Intel 82576 NICs. I changed the NICs thinking that the bug might be
> in the Broadcom driver, but it seems to affect MSI in general.
> 
> The tc+iptables rules are auto-generated with sequreisp[1], an ISP solution
> that I wrote and released as open source under the AGPLv3.
> 
> Let me know if you need any further information, and please CC me because
> I'm not subscribed.
> 
> 
> root@server:~# uname -a
> Linux server 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40 UTC 2011 x86_64 GNU/Linux
> 
> 
> [1]https://github.com/sequre/sequreisp
> 

Hi Luciano

[694250.472081] Code: f6 
49 c1 e6 07          shl    $0x7,%r14
66 89 93 ac 00 00 00 mov    %dx,0xac(%rbx)

4d 03 b5 40 03 00 00 add    0x340(%r13),%r14   
 txq = dev_pick_tx(dev, skb);

0f b7 83 a6 00 00 00 movzwl 0xa6(%rbx),%eax
4d 8b 66 08          mov    0x8(%r14),%r12    
	   q = rcu_dereference_bh(txq->qdisc);
80 e4 cf             and    $0xcf,%ah
80 cc 20             or     $0x20,%ah

66 89 83 a6 00 00 00 mov    %ax,0xa6(%rbx)   
   skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);

<49> 83 3c 24 00     cmpq   $0x0,(%r12)       
  if (q->enqueue)   CRASH because q is NULL.

0f 84 3b 02 00 00 je      ...		
				rc = __dev_xmit_skb(skb, q, dev, txq);  
49 8d 84 24 9c 00 00 00   lea    0x9c(%r12),%rax
48 89 


This looks like a dev_pick_tx() bug: it uses an out-of-bounds queue_index
and returns a txq pointer past the end of the device's allocated queue
array.
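
To make the failure mode concrete, here is a minimal user-space sketch of a
bounds-checked queue pick in the spirit of the commit quoted in this reply.
The types toy_dev/toy_queue and the functions pick_tx_index()/pick_tx() are
hypothetical, and the modulo is only a stand-in for the kernel's real
hash-to-queue mapping (skb_tx_hash()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct toy_queue {
	void *qdisc;                 /* the field whose load faulted in the oops */
};

struct toy_dev {
	struct toy_queue *tx;        /* array of real_num_tx_queues entries */
	int real_num_tx_queues;
};

/*
 * Return a queue index that is always valid for @dev.  @cached is the
 * index remembered in the socket (-1 if none) and may have been chosen
 * for a different device, e.g. before a route change.  @hash stands in
 * for the flow hash; the modulo simplifies what skb_tx_hash() does.
 */
static int pick_tx_index(const struct toy_dev *dev, int cached,
			 unsigned int hash)
{
	if (cached < 0 || cached >= dev->real_num_tx_queues) {
		/* Recompute instead of trusting a stale index. */
		cached = 0;
		if (dev->real_num_tx_queues > 1)
			cached = (int)(hash % (unsigned int)dev->real_num_tx_queues);
	}
	return cached;
}

/* Without the range check above, &dev->tx[idx] could point past the array. */
static struct toy_queue *pick_tx(struct toy_dev *dev, int cached,
				 unsigned int hash)
{
	int idx = pick_tx_index(dev, cached, hash);

	assert(idx >= 0 && idx < dev->real_num_tx_queues);
	return &dev->tx[idx];
}
```

For a 2-queue device, a stale cached index of 7 is discarded and recomputed
(hash 5 maps to queue 1), whereas a valid cached index is returned unchanged.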



With recent kernels, this cannot happen anymore because
we added fixes in this area.

You could try the Ubuntu 11.10 kernel (based on Linux 3.0) on your server,
or apply the following patch:

commit df32cc193ad88f7b1326b90af799c927b27f7654
Author: Tom Herbert <therbert@google.com>
Date:   Mon Nov 1 12:55:52 2010 -0700

    net: check queue_index from sock is valid for device
    
    In dev_pick_tx recompute the queue index if the value stored in the
    socket is greater than or equal to the number of real queues for the
    device.  The saved index in the sock structure is not guaranteed to
    be appropriate for the egress device (this could happen on a route
    change or in presence of tunnelling).  The result of the queue index
    being bad would be to return a bogus queue (crash could presumably
    follow).
    
    Signed-off-by: Tom Herbert <therbert@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 35dfb83..0dd54a6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2131,7 +2131,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 	} else {
 		struct sock *sk = skb->sk;
 		queue_index = sk_tx_queue_get(sk);
-		if (queue_index < 0) {
+		if (queue_index < 0 || queue_index >= dev->real_num_tx_queues) {
 
 			queue_index = 0;
 			if (dev->real_num_tx_queues > 1)

^ permalink raw reply related

* [PATCH V3 02/10] cxgb4: Common platform specific changes for DB Drop Recovery
From: Vipul Pandya @ 2011-10-23  6:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: roland-BHEL68pLQRGGvPXPguhicg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	divy-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Vipul Pandya

    - Add platform-specific callback functions for interrupts.  This is
    needed to do a single read-clear of the CAUSE register and then call
    out to platform specific functions for DB threshold interrupts and DB
    drop interrupts.

    - Add t4_mem_win_read_len() - mem-window reads for arbitrary lengths.
    This is used to read the CIDX/PIDX values from EC contexts during DB
    drop recovery.

    - Add t4_fwaddrspace_write() - sends addrspace write cmds to the fw.
    Needed to flush the sge eq context cache.

Signed-off-by: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
V2: Corrected the subject for patch.
V3: Moved function declarations from t4fw_api.h to cxgb4.h
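
The first change above turns sge_intr_handler() into a single table walk in
which each matching entry may also run a callback. A simplified user-space
model of that dispatch follows; struct toy_adapter, struct toy_intr_info,
toy_handle_intr_status() and the TOY_* macros are hypothetical, and the bit
positions are copied from the t4_regs.h hunk of patch 03/10 for illustration
only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct adapter; counts callback invocations. */
struct toy_adapter {
	int db_full_calls;
	int db_drop_calls;
};

typedef void (*toy_handler_t)(struct toy_adapter *adap);

/* Mirrors the shape of struct intr_info after this patch. */
struct toy_intr_info {
	unsigned int mask;      /* bits to check in the cause register */
	int fatal;              /* nonzero if the condition is fatal */
	toy_handler_t handler;  /* optional platform-specific callback */
};

static void toy_db_full(struct toy_adapter *adap) { adap->db_full_calls++; }
static void toy_db_dropped(struct toy_adapter *adap) { adap->db_drop_calls++; }

/* Illustrative bit positions (from the t4_regs.h hunk in patch 03/10). */
#define TOY_DBFIFO_LP_INT  (1u << 7)
#define TOY_DBFIFO_HP_INT  (1u << 8)
#define TOY_ERR_DROPPED_DB (1u << 18)

static const struct toy_intr_info toy_sge_info[] = {
	{ TOY_DBFIFO_LP_INT,  0, toy_db_full },
	{ TOY_DBFIFO_HP_INT,  0, toy_db_full },
	{ TOY_ERR_DROPPED_DB, 0, toy_db_dropped },
	{ 0, 0, NULL }          /* a zero mask terminates the table */
};

/*
 * Walk the table once for a single snapshot of the cause register and
 * run the callback of every matching entry.  Returns the number of fatal
 * conditions seen.  The real driver also writes the handled bits back
 * to the register to clear them; that step is omitted here.
 */
static int toy_handle_intr_status(struct toy_adapter *adap,
				  unsigned int status,
				  const struct toy_intr_info *acts)
{
	int fatal = 0;

	assert(adap != NULL && acts != NULL);
	for (; acts->mask; ++acts) {
		if (!(status & acts->mask))
			continue;
		if (acts->fatal)
			fatal++;
		if (acts->handler)
			acts->handler(adap);
	}
	return fatal;
}
```

In this model, a status word with both FIFO bits set runs toy_db_full()
twice, because two table entries match.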

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |    3 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |   69 +++++++++++++++++++++++----
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e18b5ad..d5f9bd7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -723,4 +723,7 @@ int t4_ofld_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf,
 int t4_handle_fw_rpl(struct adapter *adap, const __be64 *rpl);
 void t4_db_full(struct adapter *adapter);
 void t4_db_dropped(struct adapter *adapter);
+int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len);
+int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
+			 u32 addr, u32 val);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 13609bf..32e1dd5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -868,11 +868,14 @@ int t4_restart_aneg(struct adapter *adap, unsigned int mbox, unsigned int port)
 	return t4_wr_mbox(adap, mbox, &c, sizeof(c), NULL);
 }
 
+typedef void (*int_handler_t)(struct adapter *adap);
+
 struct intr_info {
 	unsigned int mask;       /* bits to check in interrupt status */
 	const char *msg;         /* message to print or NULL */
 	short stat_idx;          /* stat counter to increment or -1 */
 	unsigned short fatal;    /* whether the condition reported is fatal */
+	int_handler_t int_handler; /* platform-specific int handler */
 };
 
 /**
@@ -905,6 +908,8 @@ static int t4_handle_intr_status(struct adapter *adapter, unsigned int reg,
 		} else if (acts->msg && printk_ratelimit())
 			dev_warn(adapter->pdev_dev, "%s (0x%x)\n", acts->msg,
 				 status & acts->mask);
+		if (acts->int_handler)
+			acts->int_handler(adapter);
 		mask |= acts->mask;
 	}
 	status &= mask;
@@ -1013,9 +1018,9 @@ static void sge_intr_handler(struct adapter *adapter)
 		{ ERR_INVALID_CIDX_INC,
 		  "SGE GTS CIDX increment too large", -1, 0 },
 		{ ERR_CPL_OPCODE_0, "SGE received 0-length CPL", -1, 0 },
-		{ F_DBFIFO_LP_INT, NULL, -1, 0 },
-		{ F_DBFIFO_HP_INT, NULL, -1, 0 },
-		{ ERR_DROPPED_DB, "SGE doorbell dropped", -1, 0 },
+		{ F_DBFIFO_LP_INT, NULL, -1, 0, t4_db_full },
+		{ F_DBFIFO_HP_INT, NULL, -1, 0, t4_db_full },
+		{ F_ERR_DROPPED_DB, NULL, -1, 0, t4_db_dropped },
 		{ ERR_DATA_CPL_ON_HIGH_QID1 | ERR_DATA_CPL_ON_HIGH_QID0,
 		  "SGE IQID > 1023 received CPL for FL", -1, 0 },
 		{ ERR_BAD_DB_PIDX3, "SGE DBP 3 pidx increment too large", -1,
@@ -1036,20 +1041,14 @@ static void sge_intr_handler(struct adapter *adapter)
 	};
 
 	v = (u64)t4_read_reg(adapter, SGE_INT_CAUSE1) |
-	    ((u64)t4_read_reg(adapter, SGE_INT_CAUSE2) << 32);
+		((u64)t4_read_reg(adapter, SGE_INT_CAUSE2) << 32);
 	if (v) {
 		dev_alert(adapter->pdev_dev, "SGE parity error (%#llx)\n",
-			 (unsigned long long)v);
+				(unsigned long long)v);
 		t4_write_reg(adapter, SGE_INT_CAUSE1, v);
 		t4_write_reg(adapter, SGE_INT_CAUSE2, v >> 32);
 	}
 
-	err = t4_read_reg(adapter, A_SGE_INT_CAUSE3);
-	if (err & (F_DBFIFO_HP_INT|F_DBFIFO_LP_INT))
-		t4_db_full(adapter);
-	if (err & F_ERR_DROPPED_DB)
-		t4_db_dropped(adapter);
-
 	if (t4_handle_intr_status(adapter, SGE_INT_CAUSE3, sge_intr_info) ||
 	    v != 0)
 		t4_fatal_err(adapter);
@@ -1995,6 +1994,54 @@ int t4_wol_pat_enable(struct adapter *adap, unsigned int port, unsigned int map,
 	(var).retval_len16 = htonl(FW_LEN16(var)); \
 } while (0)
 
+int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
+			  u32 addr, u32 val)
+{
+	struct fw_ldst_cmd c;
+
+	memset(&c, 0, sizeof(c));
+	c.op_to_addrspace = htonl(V_FW_CMD_OP(FW_LDST_CMD) | F_FW_CMD_REQUEST |
+			    F_FW_CMD_WRITE |
+			    V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_FIRMWARE));
+	c.cycles_to_len16 = htonl(FW_LEN16(c));
+	c.u.addrval.addr = htonl(addr);
+	c.u.addrval.val = htonl(val);
+
+	return t4_wr_mbox(adap, mbox, &c, sizeof(c), NULL);
+}
+
+/*
+ *     t4_mem_win_read_len - read memory through PCIE memory window
+ *     @adap: the adapter
+ *     @addr: address of first byte requested aligned on 32b.
+ *     @data: len bytes to hold the data read
+ *     @len: amount of data to read from window.  Must be <=
+ *            MEMWIN0_APERTURE after adjusting for 16B alignment
+ *            requirements of the memory window.
+ *
+ *     Read len bytes of data from MC starting at @addr.
+ */
+int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len)
+{
+	int i;
+	int off;
+
+	/*
+	 * Align on a 16B boundary.
+	 */
+	off = addr & 15;
+	if ((addr & 3) || (len + off) > MEMWIN0_APERTURE)
+		return -EINVAL;
+
+	t4_write_reg(adap, A_PCIE_MEM_ACCESS_OFFSET, addr & ~15);
+	t4_read_reg(adap, A_PCIE_MEM_ACCESS_OFFSET);
+
+	for (i = 0; i < len; i += 4)
+		*data++ = t4_read_reg(adap, (MEMWIN0_BASE + off + i));
+
+	return 0;
+}
+
 /**
  *	t4_mdio_rd - read a PHY register through MDIO
  *	@adap: the adapter
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH V2 03/10] cxgb4: DB Drop Recovery for RDMA and LLD queues.
From: Vipul Pandya @ 2011-10-23  6:41 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: roland-BHEL68pLQRGGvPXPguhicg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	divy-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Vipul Pandya

    - recover LLD EQs for DB drop interrupts.  This includes adding a new
    db_lock, a spin lock disabling BH too, used by the recovery thread and
    the ring_tx_db() paths to allow db drop recovery.

    - clean up initial db avoidance code.

    - add read_eq_indices() - allows the LLD to use the pcie mw to efficiently
    read hw eq contexts.

    - add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw pidx
    value.

    - add flush_eq_cache() and cxgb4_flush_eq_cache().  This allows iw_cxgb4
    to flush the sge eq context cache before beginning db drop recovery.

    - add module parameter, dbfifo_int_thresh, to allow tuning the db
    interrupt threshold value.

    - add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.

    - add module parameter, dbfifo_drain_delay, to allow tuning the amount
    of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.

Signed-off-by: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
V2: Moved enum declaration from t4fw_api.h to cxgb4.h
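
Both cxgb4_sync_txq_pidx() and sync_txq_pidx() in this patch compute how far
the hardware producer index lags the software one on a circular ring before
ringing the doorbell with that delta. A minimal sketch of just that
arithmetic; pidx_delta() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of descriptors the hardware has not yet been told about, given
 * the software producer index @sw_pidx, the hardware's view @hw_pidx,
 * and a ring of @size entries.
 */
static uint16_t pidx_delta(uint16_t sw_pidx, uint16_t hw_pidx, uint16_t size)
{
	assert(sw_pidx < size && hw_pidx < size);
	if (sw_pidx >= hw_pidx)
		return (uint16_t)(sw_pidx - hw_pidx);
	/* The software index has wrapped past the end of the ring. */
	return (uint16_t)(size - hw_pidx + sw_pidx);
}
```

On a 64-entry ring, pidx_delta(10, 4, 64) and pidx_delta(2, 60, 64) both
yield 6: the second case is the wraparound branch.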

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |   16 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  214 +++++++++++++++++++----
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |    4 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |   20 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |   53 ++++++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |   15 ++
 6 files changed, 280 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index d5f9bd7..a56ae95 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -51,6 +51,8 @@
 #define FW_VERSION_MINOR 1
 #define FW_VERSION_MICRO 0
 
+#define CH_WARN(adap, fmt, ...) dev_warn(adap->pdev_dev, fmt, ## __VA_ARGS__)
+
 enum {
 	MAX_NPORTS = 4,     /* max # of ports */
 	SERNUM_LEN = 24,    /* Serial # length */
@@ -64,6 +66,15 @@ enum {
 	MEM_MC
 };
 
+enum {
+	MEMWIN0_APERTURE = 65536,
+	MEMWIN0_BASE     = 0x30000,
+	MEMWIN1_APERTURE = 32768,
+	MEMWIN1_BASE     = 0x28000,
+	MEMWIN2_APERTURE = 2048,
+	MEMWIN2_BASE     = 0x1b800,
+};
+
 enum dev_master {
 	MASTER_CANT,
 	MASTER_MAY,
@@ -403,6 +414,9 @@ struct sge_txq {
 	struct tx_sw_desc *sdesc;   /* address of SW Tx descriptor ring */
 	struct sge_qstat *stat;     /* queue status entry */
 	dma_addr_t    phys_addr;    /* physical address of the ring */
+	spinlock_t db_lock;
+	int db_disabled;
+	unsigned short db_pidx;
 };
 
 struct sge_eth_txq {                /* state for an SGE Ethernet Tx queue */
@@ -475,6 +489,7 @@ struct adapter {
 	void __iomem *regs;
 	struct pci_dev *pdev;
 	struct device *pdev_dev;
+	unsigned int mbox;
 	unsigned int fn;
 	unsigned int flags;
 
@@ -607,6 +622,7 @@ irqreturn_t t4_sge_intr_msix(int irq, void *cookie);
 void t4_sge_init(struct adapter *adap);
 void t4_sge_start(struct adapter *adap);
 void t4_sge_stop(struct adapter *adap);
+extern int dbfifo_int_thresh;
 
 #define for_each_port(adapter, iter) \
 	for (iter = 0; iter < (adapter)->params.nports; ++iter)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 870c320..64ad1c8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -149,15 +149,6 @@ static unsigned int pfvfres_pmask(struct adapter *adapter,
 #endif
 
 enum {
-	MEMWIN0_APERTURE = 65536,
-	MEMWIN0_BASE     = 0x30000,
-	MEMWIN1_APERTURE = 32768,
-	MEMWIN1_BASE     = 0x28000,
-	MEMWIN2_APERTURE = 2048,
-	MEMWIN2_BASE     = 0x1b800,
-};
-
-enum {
 	MAX_TXQ_ENTRIES      = 16384,
 	MAX_CTRL_TXQ_ENTRIES = 1024,
 	MAX_RSPQ_ENTRIES     = 16384,
@@ -369,6 +360,15 @@ static int set_addr_filters(const struct net_device *dev, bool sleep)
 				uhash | mhash, sleep);
 }
 
+int dbfifo_int_thresh = 10; /* 10 == 640 entry threshold */
+module_param(dbfifo_int_thresh, int, 0644);
+MODULE_PARM_DESC(dbfifo_int_thresh, "doorbell fifo interrupt threshold");
+
+int dbfifo_drain_delay = 1000; /* usecs to sleep while draining the dbfifo */
+module_param(dbfifo_drain_delay, int, 0644);
+MODULE_PARM_DESC(dbfifo_drain_delay,
+		 "usecs to sleep while draining the dbfifo");
+
 /*
  * Set Rx properties of a port, such as promiscruity, address filters, and MTU.
  * If @mtu is -1 it is left unchanged.
@@ -387,6 +387,8 @@ static int set_rxmode(struct net_device *dev, int mtu, bool sleep_ok)
 	return ret;
 }
 
+static struct workqueue_struct *workq;
+
 /**
  *	link_start - enable a port
  *	@dev: the port to enable
@@ -2201,7 +2203,7 @@ static void cxgb4_queue_tid_release(struct tid_info *t, unsigned int chan,
 	adap->tid_release_head = (void **)((uintptr_t)p | chan);
 	if (!adap->tid_release_task_busy) {
 		adap->tid_release_task_busy = true;
-		schedule_work(&adap->tid_release_task);
+		queue_work(workq, &adap->tid_release_task);
 	}
 	spin_unlock_bh(&adap->tid_release_lock);
 }
@@ -2428,6 +2430,59 @@ void cxgb4_iscsi_init(struct net_device *dev, unsigned int tag_mask,
 }
 EXPORT_SYMBOL(cxgb4_iscsi_init);
 
+int cxgb4_flush_eq_cache(struct net_device *dev)
+{
+	struct adapter *adap = netdev2adap(dev);
+	int ret;
+
+	ret = t4_fwaddrspace_write(adap, adap->mbox,
+				   0xe1000000 + A_SGE_CTXT_CMD, 0x20000000);
+	return ret;
+}
+EXPORT_SYMBOL(cxgb4_flush_eq_cache);
+
+static int read_eq_indices(struct adapter *adap, u16 qid, u16 *pidx, u16 *cidx)
+{
+	u32 addr = t4_read_reg(adap, A_SGE_DBQ_CTXT_BADDR) + 24 * qid + 8;
+	__be64 indices;
+	int ret;
+
+	ret = t4_mem_win_read_len(adap, addr, (__be32 *)&indices, 8);
+	if (!ret) {
+		indices = be64_to_cpu(indices);
+		*cidx = (indices >> 25) & 0xffff;
+		*pidx = (indices >> 9) & 0xffff;
+	}
+	return ret;
+}
+
+int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx,
+			u16 size)
+{
+	struct adapter *adap = netdev2adap(dev);
+	u16 hw_pidx, hw_cidx;
+	int ret;
+
+	ret = read_eq_indices(adap, qid, &hw_pidx, &hw_cidx);
+	if (ret)
+		goto out;
+
+	if (pidx != hw_pidx) {
+		u16 delta;
+
+		if (pidx >= hw_pidx)
+			delta = pidx - hw_pidx;
+		else
+			delta = size - hw_pidx + pidx;
+		wmb();
+		t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+			     V_QID(qid) | V_PIDX(delta));
+	}
+out:
+	return ret;
+}
+EXPORT_SYMBOL(cxgb4_sync_txq_pidx);
+
 static struct pci_driver cxgb4_driver;
 
 static void check_neigh_update(struct neighbour *neigh)
@@ -2461,6 +2516,95 @@ static struct notifier_block cxgb4_netevent_nb = {
 	.notifier_call = netevent_cb
 };
 
+static void drain_db_fifo(struct adapter *adap, int usecs)
+{
+	u32 v;
+
+	do {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(usecs_to_jiffies(usecs));
+		v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
+		if (G_LP_COUNT(v) == 0 && G_HP_COUNT(v) == 0)
+			break;
+	} while (1);
+}
+
+static void disable_txq_db(struct sge_txq *q)
+{
+	spin_lock_irq(&q->db_lock);
+	q->db_disabled = 1;
+	spin_unlock_irq(&q->db_lock);
+}
+
+static void enable_txq_db(struct sge_txq *q)
+{
+	spin_lock_irq(&q->db_lock);
+	q->db_disabled = 0;
+	spin_unlock_irq(&q->db_lock);
+}
+
+static void disable_dbs(struct adapter *adap)
+{
+	int i;
+
+	for_each_ethrxq(&adap->sge, i)
+		disable_txq_db(&adap->sge.ethtxq[i].q);
+	for_each_ofldrxq(&adap->sge, i)
+		disable_txq_db(&adap->sge.ofldtxq[i].q);
+	for_each_port(adap, i)
+		disable_txq_db(&adap->sge.ctrlq[i].q);
+}
+
+static void enable_dbs(struct adapter *adap)
+{
+	int i;
+
+	for_each_ethrxq(&adap->sge, i)
+		enable_txq_db(&adap->sge.ethtxq[i].q);
+	for_each_ofldrxq(&adap->sge, i)
+		enable_txq_db(&adap->sge.ofldtxq[i].q);
+	for_each_port(adap, i)
+		enable_txq_db(&adap->sge.ctrlq[i].q);
+}
+
+static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
+{
+	u16 hw_pidx, hw_cidx;
+	int ret;
+
+	spin_lock_bh(&q->db_lock);
+	ret = read_eq_indices(adap, (u16)q->cntxt_id, &hw_pidx, &hw_cidx);
+	if (ret)
+		goto out;
+	if (q->db_pidx != hw_pidx) {
+		u16 delta;
+
+		if (q->db_pidx >= hw_pidx)
+			delta = q->db_pidx - hw_pidx;
+		else
+			delta = q->size - hw_pidx + q->db_pidx;
+		wmb();
+		t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+				V_QID(q->cntxt_id) | V_PIDX(delta));
+	}
+out:
+	q->db_disabled = 0;
+	spin_unlock_bh(&q->db_lock);
+	if (ret)
+		CH_WARN(adap, "DB drop recovery failed.\n");
+}
+static void recover_all_queues(struct adapter *adap)
+{
+	int i;
+
+	for_each_ethrxq(&adap->sge, i)
+		sync_txq_pidx(adap, &adap->sge.ethtxq[i].q);
+	for_each_ofldrxq(&adap->sge, i)
+		sync_txq_pidx(adap, &adap->sge.ofldtxq[i].q);
+	for_each_port(adap, i)
+		sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
+}
+
 static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
 {
 	mutex_lock(&uld_mutex);
@@ -2473,55 +2617,41 @@ static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
 static void process_db_full(struct work_struct *work)
 {
 	struct adapter *adap;
-	static int delay = 1000;
-	u32 v;
 
 	adap = container_of(work, struct adapter, db_full_task);
 
-
-	/* stop LLD queues */
-
 	notify_rdma_uld(adap, CXGB4_CONTROL_DB_FULL);
-	do {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(usecs_to_jiffies(delay));
-		v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
-		if (G_LP_COUNT(v) == 0 && G_HP_COUNT(v) == 0)
-			break;
-	} while (1);
+	drain_db_fifo(adap, dbfifo_drain_delay);
+	t4_set_reg_field(adap, A_SGE_INT_ENABLE3,
+			F_DBFIFO_HP_INT | F_DBFIFO_LP_INT,
+			F_DBFIFO_HP_INT | F_DBFIFO_LP_INT);
 	notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
-
-
-	/*
-	 * The more we get db full interrupts, the more we'll delay
-	 * in re-enabling db rings on queues, capped off at 200ms.
-	 */
-	delay = min(delay << 1, 200000);
-
-	/* resume LLD queues */
 }
 
 static void process_db_drop(struct work_struct *work)
 {
 	struct adapter *adap;
-	adap = container_of(work, struct adapter, db_drop_task);
 
+	adap = container_of(work, struct adapter, db_drop_task);
 
-	/*
-	 * sync the PIDX values in HW and SW for LLD queues.
-	 */
-
+	t4_set_reg_field(adap, A_SGE_DOORBELL_CONTROL, F_DROPPED_DB, 0);
+	disable_dbs(adap);
 	notify_rdma_uld(adap, CXGB4_CONTROL_DB_DROP);
+	drain_db_fifo(adap, 1);
+	recover_all_queues(adap);
+	enable_dbs(adap);
 }
 
 void t4_db_full(struct adapter *adap)
 {
-	schedule_work(&adap->db_full_task);
+	t4_set_reg_field(adap, A_SGE_INT_ENABLE3,
+			F_DBFIFO_HP_INT | F_DBFIFO_LP_INT, 0);
+	queue_work(workq, &adap->db_full_task);
 }
 
 void t4_db_dropped(struct adapter *adap)
 {
-	schedule_work(&adap->db_drop_task);
+	queue_work(workq, &adap->db_drop_task);
 }
 
 static void uld_attach(struct adapter *adap, unsigned int uld)
@@ -2557,6 +2687,7 @@ static void uld_attach(struct adapter *adap, unsigned int uld)
 	lli.gts_reg = adap->regs + MYPF_REG(SGE_PF_GTS);
 	lli.db_reg = adap->regs + MYPF_REG(SGE_PF_KDOORBELL);
 	lli.fw_vers = adap->params.fw_vers;
+	lli.dbfifo_int_thresh = dbfifo_int_thresh;
 
 	handle = ulds[uld].add(&lli);
 	if (IS_ERR(handle)) {
@@ -3673,6 +3804,7 @@ static int __devinit init_one(struct pci_dev *pdev,
 
 	adapter->pdev = pdev;
 	adapter->pdev_dev = &pdev->dev;
+	adapter->mbox = func;
 	adapter->fn = func;
 	adapter->msg_enable = dflt_msg_enable;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
@@ -3868,6 +4000,10 @@ static int __init cxgb4_init_module(void)
 {
 	int ret;
 
+	workq = create_singlethread_workqueue("cxgb4");
+	if (!workq)
+		return -ENOMEM;
+
 	/* Debugfs support is optional, just warn if this fails */
 	cxgb4_debugfs_root = debugfs_create_dir(KBUILD_MODNAME, NULL);
 	if (!cxgb4_debugfs_root)
@@ -3883,6 +4019,8 @@ static void __exit cxgb4_cleanup_module(void)
 {
 	pci_unregister_driver(&cxgb4_driver);
 	debugfs_remove(cxgb4_debugfs_root);  /* NULL ok */
+	flush_workqueue(workq);
+	destroy_workqueue(workq);
 }
 
 module_init(cxgb4_init_module);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 5cc2f27..d79980c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -218,6 +218,7 @@ struct cxgb4_lld_info {
 	unsigned short ucq_density;          /* # of user CQs/page */
 	void __iomem *gts_reg;               /* address of GTS register */
 	void __iomem *db_reg;                /* address of kernel doorbell */
+	int dbfifo_int_thresh;		     /* doorbell fifo int threshold */
 };
 
 struct cxgb4_uld_info {
@@ -226,6 +227,7 @@ struct cxgb4_uld_info {
 	int (*rx_handler)(void *handle, const __be64 *rsp,
 			  const struct pkt_gl *gl);
 	int (*state_change)(void *handle, enum cxgb4_state new_state);
+	int (*control)(void *handle, enum cxgb4_control control, ...);
 };
 
 int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
@@ -243,4 +245,6 @@ void cxgb4_iscsi_init(struct net_device *dev, unsigned int tag_mask,
 		      const unsigned int *pgsz_order);
 struct sk_buff *cxgb4_pktgl_to_skb(const struct pkt_gl *gl,
 				   unsigned int skb_len, unsigned int pull_len);
+int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx, u16 size);
+int cxgb4_flush_eq_cache(struct net_device *dev);
 #endif  /* !__CXGB4_OFLD_H */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 3631fbb..65ecf1e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -767,8 +767,13 @@ static void write_sgl(const struct sk_buff *skb, struct sge_txq *q,
 static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
 {
 	wmb();            /* write descriptors before telling HW */
-	t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
-		     QID(q->cntxt_id) | PIDX(n));
+	spin_lock(&q->db_lock);
+	if (!q->db_disabled) {
+		t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+			     V_QID(q->cntxt_id) | V_PIDX(n));
+	}
+	q->db_pidx = q->pidx;
+	spin_unlock(&q->db_lock);
 }
 
 /**
@@ -2080,6 +2085,7 @@ static void init_txq(struct adapter *adap, struct sge_txq *q, unsigned int id)
 	q->stops = q->restarts = 0;
 	q->stat = (void *)&q->desc[q->size];
 	q->cntxt_id = id;
+	spin_lock_init(&q->db_lock);
 	adap->sge.egr_map[id - adap->sge.egr_start] = q;
 }
 
@@ -2414,9 +2420,15 @@ void t4_sge_init(struct adapter *adap)
 			 RXPKTCPLMODE |
 			 (STAT_LEN == 128 ? EGRSTATUSPAGESIZE : 0));
 
+	/*
+	 * Set up to drop DOORBELL writes when the DOORBELL FIFO overflows
+	 * and generate an interrupt when this occurs so we can recover.
+	 */
 	t4_set_reg_field(adap, A_SGE_DBFIFO_STATUS,
-			V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5),
-			V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5));
+			V_HP_INT_THRESH(M_HP_INT_THRESH) |
+			V_LP_INT_THRESH(M_LP_INT_THRESH),
+			V_HP_INT_THRESH(dbfifo_int_thresh) |
+			V_LP_INT_THRESH(dbfifo_int_thresh));
 	t4_set_reg_field(adap, A_SGE_DOORBELL_CONTROL, F_ENABLE_DROP,
 			F_ENABLE_DROP);
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 0adc5bc..111fc32 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -190,6 +190,59 @@
 #define SGE_DEBUG_DATA_LOW 0x10d4
 #define SGE_INGRESS_QUEUES_PER_PAGE_PF 0x10f4
 
+#define S_LP_INT_THRESH    12
+#define V_LP_INT_THRESH(x) ((x) << S_LP_INT_THRESH)
+#define S_HP_INT_THRESH    28
+#define V_HP_INT_THRESH(x) ((x) << S_HP_INT_THRESH)
+#define A_SGE_DBFIFO_STATUS 0x10a4
+
+#define S_ENABLE_DROP    13
+#define V_ENABLE_DROP(x) ((x) << S_ENABLE_DROP)
+#define F_ENABLE_DROP    V_ENABLE_DROP(1U)
+#define A_SGE_DOORBELL_CONTROL 0x10a8
+
+#define A_SGE_CTXT_CMD 0x11fc
+#define A_SGE_DBQ_CTXT_BADDR 0x1084
+
+#define A_SGE_PF_KDOORBELL 0x0
+
+#define S_QID 15
+#define V_QID(x) ((x) << S_QID)
+
+#define S_PIDX 0
+#define V_PIDX(x) ((x) << S_PIDX)
+
+#define M_LP_COUNT 0x7ffU
+#define S_LP_COUNT 0
+#define G_LP_COUNT(x) (((x) >> S_LP_COUNT) & M_LP_COUNT)
+
+#define M_HP_COUNT 0x7ffU
+#define S_HP_COUNT 16
+#define G_HP_COUNT(x) (((x) >> S_HP_COUNT) & M_HP_COUNT)
+
+#define A_SGE_INT_ENABLE3 0x1040
+
+#define S_DBFIFO_HP_INT 8
+#define V_DBFIFO_HP_INT(x) ((x) << S_DBFIFO_HP_INT)
+#define F_DBFIFO_HP_INT V_DBFIFO_HP_INT(1U)
+
+#define S_DBFIFO_LP_INT 7
+#define V_DBFIFO_LP_INT(x) ((x) << S_DBFIFO_LP_INT)
+#define F_DBFIFO_LP_INT V_DBFIFO_LP_INT(1U)
+
+#define S_DROPPED_DB 0
+#define V_DROPPED_DB(x) ((x) << S_DROPPED_DB)
+#define F_DROPPED_DB V_DROPPED_DB(1U)
+
+#define S_ERR_DROPPED_DB 18
+#define V_ERR_DROPPED_DB(x) ((x) << S_ERR_DROPPED_DB)
+#define F_ERR_DROPPED_DB V_ERR_DROPPED_DB(1U)
+
+#define A_PCIE_MEM_ACCESS_OFFSET 0x306c
+
+#define M_HP_INT_THRESH 0xfU
+#define M_LP_INT_THRESH 0xfU
+
 #define PCIE_PF_CLI 0x44
 #define PCIE_INT_CAUSE 0x3004
 #define  UNXSPLCPLERR  0x20000000U
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index edcfd7e..ad53f79 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1620,4 +1620,19 @@ struct fw_hdr {
 #define FW_HDR_FW_VER_MINOR_GET(x) (((x) >> 16) & 0xff)
 #define FW_HDR_FW_VER_MICRO_GET(x) (((x) >> 8) & 0xff)
 #define FW_HDR_FW_VER_BUILD_GET(x) (((x) >> 0) & 0xff)
+
+#define S_FW_CMD_OP 24
+#define V_FW_CMD_OP(x) ((x) << S_FW_CMD_OP)
+
+#define S_FW_CMD_REQUEST 23
+#define V_FW_CMD_REQUEST(x) ((x) << S_FW_CMD_REQUEST)
+#define F_FW_CMD_REQUEST V_FW_CMD_REQUEST(1U)
+
+#define S_FW_CMD_WRITE 21
+#define V_FW_CMD_WRITE(x) ((x) << S_FW_CMD_WRITE)
+#define F_FW_CMD_WRITE V_FW_CMD_WRITE(1U)
+
+#define S_FW_LDST_CMD_ADDRSPACE 0
+#define V_FW_LDST_CMD_ADDRSPACE(x) ((x) << S_FW_LDST_CMD_ADDRSPACE)
+
 #endif /* _T4FW_INTERFACE_H_ */
-- 
1.7.1



* [PATCH] ipv4: fix ipsec forward performance regression
From: Yan, Zheng @ 2011-10-23  7:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org, davem@davemloft.net,
	eric.dumazet@gmail.com, Kim Phillips

There is a bug in commit 5e2b61f (ipv4: Remove flowi from struct rtable):
it makes xfrm4_fill_dst() modify the wrong data structure.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 net/ipv4/xfrm4_policy.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index fc5368a..a0b4c5d 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	struct rtable *rt = (struct rtable *)xdst->route;
 	const struct flowi4 *fl4 = &fl->u.ip4;
 
-	rt->rt_key_dst = fl4->daddr;
-	rt->rt_key_src = fl4->saddr;
-	rt->rt_key_tos = fl4->flowi4_tos;
-	rt->rt_route_iif = fl4->flowi4_iif;
-	rt->rt_iif = fl4->flowi4_iif;
-	rt->rt_oif = fl4->flowi4_oif;
-	rt->rt_mark = fl4->flowi4_mark;
+	xdst->u.rt.rt_key_dst = fl4->daddr;
+	xdst->u.rt.rt_key_src = fl4->saddr;
+	xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
+	xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
+	xdst->u.rt.rt_iif = fl4->flowi4_iif;
+	xdst->u.rt.rt_oif = fl4->flowi4_oif;
+	xdst->u.rt.rt_mark = fl4->flowi4_mark;
 
 	xdst->u.dst.dev = dev;
 	dev_hold(dev);
-- 
1.7.4.4


* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-23  8:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <1319302282.6180.60.camel@edumazet-laptop>

Sat, Oct 22, 2011 at 06:51:22PM CEST, eric.dumazet@gmail.com wrote:
>Le samedi 22 octobre 2011 à 17:13 +0200, Jiri Pirko a écrit :
>> >> +
>> >> +/************************
>> >> + * Rx path frame handler
>> >> + ************************/
>> >> +
>> >> +/* note: already called with rcu_read_lock */
>> >> +static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
>> >> +{
>> >> +	struct sk_buff *skb = *pskb;
>> >> +	struct team_port *port;
>> >> +	struct team *team;
>> >> +	rx_handler_result_t res = RX_HANDLER_ANOTHER;
>> >> +
>> >> +	skb = skb_share_check(skb, GFP_ATOMIC);
>> >> +	if (!skb)
>> >> +		return RX_HANDLER_CONSUMED;
>> >> +
>> >> +	*pskb = skb;
>> >> +
>> >> +	port = team_port_get_rcu(skb->dev);
>> >> +	team = port->team;
>> >> +
>> >> +	if (team->mode_ops.receive)
>> >
>> >Hmm, you need ACCESS_ONCE() here or rcu_dereference()
>> >
>> >See commit 4d97480b1806e883eb (bonding: use local function pointer of
>> >bond->recv_probe in bond_handle_frame) for reference
>> 
>> I do not think so. Because mode_ops.receive changes only from
>> __team_change_mode() and this can be called only in case no ports are in
>> team. And team_port_del() calls synchronize_rcu().
>> 
>
>
>
>We are used to code following this template :
>
>if (ops->handler)
>	ops->handler(arguments);
>
>But this is valid only because ops points to constant memory.
>
>
>In your case, we really see it's not true, don't try to pretend it's safe.

Please forgive me, it's possible I'm missing something, but I see no way
that team->mode_ops.receive can change during team_handle_frame() (which
runs under rcu_read_lock), for the reason I stated earlier.

team_port_del() calls netdev_rx_handler_unregister() and then calls
synchronize_rcu(). That ensures that by the time team_port_del() finishes,
team_handle_frame() is no longer called for this port.

Combined with the "if (!list_empty(&team->port_list))" check in
team_change_mode(), this ensures safety.

Of course, team_port_del() and team_change_mode() are both protected by
team->lock, so they are mutually exclusive.

Jirka.



* [patch net-next V3] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-23  8:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka

This patch introduces a new network device called team. It is meant to be
a very fast, simple, userspace-driven alternative to the existing bonding
driver.

A userspace library called libteam, with a couple of demo apps, is
available here:
https://github.com/jpirko/libteam
Note that it is still in its early stages.

team and libteam use generic netlink to communicate. That and rtnl are
meant to be the only ways to configure a team device; there is no sysfs
or similar interface.

Basic Python bindings for libteam were recently introduced (some work
still needs to be done on them, though). A daemon providing arpmon/miimon
active-backup functionality will be introduced shortly; everything it
needs is already implemented in the kernel team driver.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v2->v3:
	- team_change_mtu() uses the RCU version of list traversal to unwind
	- setting and clearing of mode_ops happens per pointer, not per byte
	- port hashlist changed to be embedded into team structure
	- error branch in team_port_enter() does cleanup now
	- fixed rtln->rtnl

v1->v2:
	- modes are now implemented as modules, which makes team more
	  modular and extendable.
	- several nitpicks from commenters on v1 were fixed
	- several other bugs were fixed
	- note: I did not follow Eric's comment about the roundrobin port
	  selector, as his approach can easily be implemented as another
	  mode (mode "random") in the future.
---
 Documentation/networking/team.txt         |    2 +
 MAINTAINERS                               |    7 +
 drivers/net/Kconfig                       |    2 +
 drivers/net/Makefile                      |    1 +
 drivers/net/team/Kconfig                  |   38 +
 drivers/net/team/Makefile                 |    7 +
 drivers/net/team/team.c                   | 1574 +++++++++++++++++++++++++++++
 drivers/net/team/team_mode_activebackup.c |  152 +++
 drivers/net/team/team_mode_roundrobin.c   |  107 ++
 include/linux/Kbuild                      |    1 +
 include/linux/if.h                        |    1 +
 include/linux/if_team.h                   |  254 +++++
 include/linux/rculist.h                   |   14 +
 13 files changed, 2160 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/networking/team.txt
 create mode 100644 drivers/net/team/Kconfig
 create mode 100644 drivers/net/team/Makefile
 create mode 100644 drivers/net/team/team.c
 create mode 100644 drivers/net/team/team_mode_activebackup.c
 create mode 100644 drivers/net/team/team_mode_roundrobin.c
 create mode 100644 include/linux/if_team.h

diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt
new file mode 100644
index 0000000..5a01368
--- /dev/null
+++ b/Documentation/networking/team.txt
@@ -0,0 +1,2 @@
+Team devices are driven from userspace via libteam library which is here:
+	https://github.com/jpirko/libteam
diff --git a/MAINTAINERS b/MAINTAINERS
index 5008b08..c33400d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6372,6 +6372,13 @@ W:	http://tcp-lp-mod.sourceforge.net/
 S:	Maintained
 F:	net/ipv4/tcp_lp.c
 
+TEAM DRIVER
+M:	Jiri Pirko <jpirko@redhat.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/team/
+F:	include/linux/if_team.h
+
 TEGRA SUPPORT
 M:	Colin Cross <ccross@android.com>
 M:	Erik Gilling <konkers@android.com>
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 583f66c..b3020be 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -125,6 +125,8 @@ config IFB
 	  'ifb1' etc.
 	  Look at the iproute2 documentation directory for usage etc
 
+source "drivers/net/team/Kconfig"
+
 config MACVLAN
 	tristate "MAC-VLAN support (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index fa877cd..4e4ebfe 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET) += Space.o loopback.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
 obj-$(CONFIG_PHYLIB) += phy/
 obj-$(CONFIG_RIONET) += rionet.o
+obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
 obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
diff --git a/drivers/net/team/Kconfig b/drivers/net/team/Kconfig
new file mode 100644
index 0000000..70a43a6
--- /dev/null
+++ b/drivers/net/team/Kconfig
@@ -0,0 +1,38 @@
+menuconfig NET_TEAM
+	tristate "Ethernet team driver support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	---help---
+	  This allows one to create virtual interfaces that team together
+	  multiple ethernet devices.
+
+	  Team devices can be added using the "ip" command from the
+	  iproute2 package:
+
+	  "ip link add link [ address MAC ] [ NAME ] type team"
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called team.
+
+if NET_TEAM
+
+config NET_TEAM_MODE_ROUNDROBIN
+	tristate "Round-robin mode support"
+	depends on NET_TEAM
+	---help---
+	  Basic mode where port used for transmitting packets is selected in
+	  round-robin fashion using packet counter.
+
+	  To compile this team mode as a module, choose M here: the module
+	  will be called team_mode_roundrobin.
+
+config NET_TEAM_MODE_ACTIVEBACKUP
+	tristate "Active-backup mode support"
+	depends on NET_TEAM
+	---help---
+	  Only one port is active at a time and the rest of ports are used
+	  for backup.
+
+	  To compile this team mode as a module, choose M here: the module
+	  will be called team_mode_activebackup.
+
+endif # NET_TEAM
diff --git a/drivers/net/team/Makefile b/drivers/net/team/Makefile
new file mode 100644
index 0000000..85f2028
--- /dev/null
+++ b/drivers/net/team/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the network team driver
+#
+
+obj-$(CONFIG_NET_TEAM) += team.o
+obj-$(CONFIG_NET_TEAM_MODE_ROUNDROBIN) += team_mode_roundrobin.o
+obj-$(CONFIG_NET_TEAM_MODE_ACTIVEBACKUP) += team_mode_activebackup.o
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
new file mode 100644
index 0000000..8004916
--- /dev/null
+++ b/drivers/net/team/team.c
@@ -0,0 +1,1574 @@
+/*
+ * drivers/net/team/team.c - Network team device driver
+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/rcupdate.h>
+#include <linux/errno.h>
+#include <linux/ctype.h>
+#include <linux/notifier.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/socket.h>
+#include <linux/etherdevice.h>
+#include <linux/rtnetlink.h>
+#include <net/rtnetlink.h>
+#include <net/genetlink.h>
+#include <net/netlink.h>
+#include <linux/if_team.h>
+
+#define DRV_NAME "team"
+
+
+/**********
+ * Helpers
+ **********/
+
+#define team_port_exists(dev) (dev->priv_flags & IFF_TEAM_PORT)
+
+static struct team_port *team_port_get_rcu(const struct net_device *dev)
+{
+	struct team_port *port = rcu_dereference(dev->rx_handler_data);
+
+	return team_port_exists(dev) ? port : NULL;
+}
+
+static struct team_port *team_port_get_rtnl(const struct net_device *dev)
+{
+	struct team_port *port = rtnl_dereference(dev->rx_handler_data);
+
+	return team_port_exists(dev) ? port : NULL;
+}
+
+/*
+ * team_port_add() verifies that the port device allows its MAC address to be
+ * changed while it is open, so callers may ignore this function's return value.
+ */
+static int __set_port_mac(struct net_device *port_dev,
+			  const unsigned char *dev_addr)
+{
+	struct sockaddr addr;
+
+	memcpy(addr.sa_data, dev_addr, ETH_ALEN);
+	addr.sa_family = ARPHRD_ETHER;
+	return dev_set_mac_address(port_dev, &addr);
+}
+
+int team_port_set_orig_mac(struct team_port *port)
+{
+	return __set_port_mac(port->dev, port->orig.dev_addr);
+}
+EXPORT_SYMBOL(team_port_set_orig_mac);
+
+int team_port_set_team_mac(struct team_port *port)
+{
+	return __set_port_mac(port->dev, port->team->dev->dev_addr);
+}
+EXPORT_SYMBOL(team_port_set_team_mac);
+
+
+/*******************
+ * Options handling
+ *******************/
+
+void team_options_register(struct team *team, struct team_option *option,
+			   size_t option_count)
+{
+	int i;
+
+	for (i = 0; i < option_count; i++, option++)
+		list_add_tail(&option->list, &team->option_list);
+}
+EXPORT_SYMBOL(team_options_register);
+
+static void __team_options_change_check(struct team *team,
+					struct team_option *changed_option);
+
+static void __team_options_unregister(struct team *team,
+				      struct team_option *option,
+				      size_t option_count)
+{
+	int i;
+
+	for (i = 0; i < option_count; i++, option++)
+		list_del(&option->list);
+}
+
+void team_options_unregister(struct team *team, struct team_option *option,
+			     size_t option_count)
+{
+	__team_options_unregister(team, option, option_count);
+	__team_options_change_check(team, NULL);
+}
+EXPORT_SYMBOL(team_options_unregister);
+
+static int team_option_get(struct team *team, struct team_option *option,
+			   void *arg)
+{
+	return option->getter(team, arg);
+}
+
+static int team_option_set(struct team *team, struct team_option *option,
+			   void *arg)
+{
+	int err;
+
+	err = option->setter(team, arg);
+	if (err)
+		return err;
+
+	__team_options_change_check(team, option);
+	return err;
+}
+
+/****************
+ * Mode handling
+ ****************/
+
+static LIST_HEAD(mode_list);
+static DEFINE_SPINLOCK(mode_list_lock);
+
+static struct team_mode *__find_mode(const char *kind)
+{
+	struct team_mode *mode;
+
+	list_for_each_entry(mode, &mode_list, list) {
+		if (strcmp(mode->kind, kind) == 0)
+			return mode;
+	}
+	return NULL;
+}
+
+static bool is_good_mode_name(const char *name)
+{
+	while (*name != '\0') {
+		if (!isalpha(*name) && !isdigit(*name) && *name != '_')
+			return false;
+		name++;
+	}
+	return true;
+}
+
+int team_mode_register(struct team_mode *mode)
+{
+	int err = 0;
+
+	if (!is_good_mode_name(mode->kind) ||
+	    mode->priv_size > TEAM_MODE_PRIV_SIZE)
+		return -EINVAL;
+	spin_lock(&mode_list_lock);
+	if (__find_mode(mode->kind)) {
+		err = -EEXIST;
+		goto unlock;
+	}
+	list_add_tail(&mode->list, &mode_list);
+unlock:
+	spin_unlock(&mode_list_lock);
+	return err;
+}
+EXPORT_SYMBOL(team_mode_register);
+
+int team_mode_unregister(struct team_mode *mode)
+{
+	spin_lock(&mode_list_lock);
+	list_del_init(&mode->list);
+	spin_unlock(&mode_list_lock);
+	return 0;
+}
+EXPORT_SYMBOL(team_mode_unregister);
+
+static struct team_mode *team_mode_get(const char *kind)
+{
+	struct team_mode *mode;
+
+	spin_lock(&mode_list_lock);
+	mode = __find_mode(kind);
+	if (!mode) {
+		spin_unlock(&mode_list_lock);
+		request_module("team-mode-%s", kind);
+		spin_lock(&mode_list_lock);
+		mode = __find_mode(kind);
+	}
+	if (mode)
+		if (!try_module_get(mode->owner))
+			mode = NULL;
+
+	spin_unlock(&mode_list_lock);
+	return mode;
+}
+
+static void team_mode_put(const char *kind)
+{
+	struct team_mode *mode;
+
+	spin_lock(&mode_list_lock);
+	mode = __find_mode(kind);
+	BUG_ON(!mode);
+	module_put(mode->owner);
+	spin_unlock(&mode_list_lock);
+}
+
+/*
+ * We can benefit from the fact that it's ensured no port is present
+ * at the time of mode change.
+ */
+static int __team_change_mode(struct team *team,
+			      const struct team_mode *new_mode)
+{
+	/* Check if mode was previously set and do cleanup if so */
+	if (team->mode_kind) {
+		void (*exit_op)(struct team *team) = team->mode_ops.exit;
+
+		/* Clear ops area so no callback is called any longer */
+		team_mode_ops_clear(&team->mode_ops);
+
+		synchronize_rcu();
+
+		if (exit_op)
+			exit_op(team);
+		team_mode_put(team->mode_kind);
+		team->mode_kind = NULL;
+		/* zero private data area */
+		memset(&team->mode_priv, 0,
+		       sizeof(struct team) - offsetof(struct team, mode_priv));
+	}
+
+	if (!new_mode)
+		return 0;
+
+	if (new_mode->ops->init) {
+		int err;
+
+		err = new_mode->ops->init(team);
+		if (err)
+			return err;
+	}
+
+	team->mode_kind = new_mode->kind;
+	team_mode_ops_copy(&team->mode_ops, new_mode->ops);
+
+	return 0;
+}
+
+static int team_change_mode(struct team *team, const char *kind)
+{
+	struct team_mode *new_mode;
+	struct net_device *dev = team->dev;
+	int err;
+
+	if (!list_empty(&team->port_list)) {
+		netdev_err(dev, "No ports can be present during mode change\n");
+		return -EBUSY;
+	}
+
+	if (team->mode_kind && strcmp(team->mode_kind, kind) == 0) {
+		netdev_err(dev, "Unable to change to the same mode the team is in\n");
+		return -EINVAL;
+	}
+
+	new_mode = team_mode_get(kind);
+	if (!new_mode) {
+		netdev_err(dev, "Mode \"%s\" not found\n", kind);
+		return -EINVAL;
+	}
+
+	err = __team_change_mode(team, new_mode);
+	if (err) {
+		netdev_err(dev, "Failed to change to mode \"%s\"\n", kind);
+		team_mode_put(kind);
+		return err;
+	}
+
+	netdev_info(dev, "Mode changed to \"%s\"\n", kind);
+	return 0;
+}
+
+
+/************************
+ * Rx path frame handler
+ ************************/
+
+/* note: already called with rcu_read_lock */
+static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct team_port *port;
+	struct team *team;
+	rx_handler_result_t res = RX_HANDLER_ANOTHER;
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb)
+		return RX_HANDLER_CONSUMED;
+
+	*pskb = skb;
+
+	port = team_port_get_rcu(skb->dev);
+	team = port->team;
+
+	if (team->mode_ops.receive)
+		res = team->mode_ops.receive(team, port, skb);
+
+	if (res == RX_HANDLER_ANOTHER) {
+		struct team_pcpu_stats *pcpu_stats;
+
+		pcpu_stats = this_cpu_ptr(team->pcpu_stats);
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->rx_packets++;
+		pcpu_stats->rx_bytes += skb->len;
+		if (skb->pkt_type == PACKET_MULTICAST)
+			pcpu_stats->rx_multicast++;
+		u64_stats_update_end(&pcpu_stats->syncp);
+
+		skb->dev = team->dev;
+	} else {
+		this_cpu_inc(team->pcpu_stats->rx_dropped);
+	}
+
+	return res;
+}
+
+
+/****************
+ * Port handling
+ ****************/
+
+static bool team_port_find(const struct team *team,
+			   const struct team_port *port)
+{
+	struct team_port *cur;
+
+	list_for_each_entry(cur, &team->port_list, list)
+		if (cur == port)
+			return true;
+	return false;
+}
+
+/*
+ * Add/delete port to the team port list. Write guarded by rtnl_lock.
+ * Takes care of correct port->index setup (might be racy).
+ */
+static void team_port_list_add_port(struct team *team,
+				    struct team_port *port)
+{
+	port->index = team->port_count++;
+	hlist_add_head_rcu(&port->hlist,
+			   team_port_index_hash(team, port->index));
+	list_add_tail_rcu(&port->list, &team->port_list);
+}
+
+static void __reconstruct_port_hlist(struct team *team, int rm_index)
+{
+	int i;
+	struct team_port *port;
+
+	for (i = rm_index + 1; i < team->port_count; i++) {
+		port = team_get_port_by_index_rcu(team, i);
+		hlist_del_rcu(&port->hlist);
+		port->index--;
+		hlist_add_head_rcu(&port->hlist,
+				   team_port_index_hash(team, port->index));
+	}
+}
+
+static void team_port_list_del_port(struct team *team,
+				   struct team_port *port)
+{
+	int rm_index = port->index;
+
+	hlist_del_rcu(&port->hlist);
+	list_del_rcu(&port->list);
+	__reconstruct_port_hlist(team, rm_index);
+	team->port_count--;
+}
+
+#define TEAM_VLAN_FEATURES (NETIF_F_ALL_CSUM | NETIF_F_SG | \
+			    NETIF_F_FRAGLIST | NETIF_F_ALL_TSO | \
+			    NETIF_F_HIGHDMA | NETIF_F_LRO)
+
+static void __team_compute_features(struct team *team)
+{
+	struct team_port *port;
+	u32 vlan_features = TEAM_VLAN_FEATURES;
+	unsigned short max_hard_header_len = ETH_HLEN;
+
+	list_for_each_entry(port, &team->port_list, list) {
+		vlan_features = netdev_increment_features(vlan_features,
+					port->dev->vlan_features,
+					TEAM_VLAN_FEATURES);
+
+		if (port->dev->hard_header_len > max_hard_header_len)
+			max_hard_header_len = port->dev->hard_header_len;
+	}
+
+	team->dev->vlan_features = vlan_features;
+	team->dev->hard_header_len = max_hard_header_len;
+
+	netdev_change_features(team->dev);
+}
+
+static void team_compute_features(struct team *team)
+{
+	spin_lock(&team->lock);
+	__team_compute_features(team);
+	spin_unlock(&team->lock);
+}
+
+static int team_port_enter(struct team *team, struct team_port *port)
+{
+	int err = 0;
+
+	dev_hold(team->dev);
+	port->dev->priv_flags |= IFF_TEAM_PORT;
+	if (team->mode_ops.port_enter) {
+		err = team->mode_ops.port_enter(team, port);
+		if (err) {
+			netdev_err(team->dev, "Device %s failed to enter team mode\n",
+				   port->dev->name);
+			goto err_port_enter;
+		}
+	}
+
+	return 0;
+
+err_port_enter:
+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
+	dev_put(team->dev);
+
+	return err;
+}
+
+static void team_port_leave(struct team *team, struct team_port *port)
+{
+	if (team->mode_ops.port_leave)
+		team->mode_ops.port_leave(team, port);
+	port->dev->priv_flags &= ~IFF_TEAM_PORT;
+	dev_put(team->dev);
+}
+
+static void __team_port_change_check(struct team_port *port, bool linkup);
+
+static int team_port_add(struct team *team, struct net_device *port_dev)
+{
+	struct net_device *dev = team->dev;
+	struct team_port *port;
+	char *portname = port_dev->name;
+	char tmp_addr[ETH_ALEN];
+	int err;
+
+	if (port_dev->flags & IFF_LOOPBACK ||
+	    port_dev->type != ARPHRD_ETHER) {
+		netdev_err(dev, "Device %s is of an unsupported type\n",
+			   portname);
+		return -EINVAL;
+	}
+
+	if (team_port_exists(port_dev)) {
+		netdev_err(dev, "Device %s is already a port "
+				"of a team device\n", portname);
+		return -EBUSY;
+	}
+
+	if (port_dev->flags & IFF_UP) {
+		netdev_err(dev, "Device %s is up. Set it down before adding it as a team port\n",
+			   portname);
+		return -EBUSY;
+	}
+
+	port = kzalloc(sizeof(struct team_port), GFP_KERNEL);
+	if (!port)
+		return -ENOMEM;
+
+	port->dev = port_dev;
+	port->team = team;
+
+	port->orig.mtu = port_dev->mtu;
+	err = dev_set_mtu(port_dev, dev->mtu);
+	if (err) {
+		netdev_dbg(dev, "Error %d calling dev_set_mtu\n", err);
+		goto err_set_mtu;
+	}
+
+	memcpy(port->orig.dev_addr, port_dev->dev_addr, ETH_ALEN);
+	random_ether_addr(tmp_addr);
+	err = __set_port_mac(port_dev, tmp_addr);
+	if (err) {
+		netdev_dbg(dev, "Device %s mac addr set failed\n",
+			   portname);
+		goto err_set_mac_rand;
+	}
+
+	err = dev_open(port_dev);
+	if (err) {
+		netdev_dbg(dev, "Device %s opening failed\n",
+			   portname);
+		goto err_dev_open;
+	}
+
+	err = team_port_set_orig_mac(port);
+	if (err) {
+		netdev_dbg(dev, "Device %s mac addr set failed - Device does not support addr change when it's opened\n",
+			   portname);
+		goto err_set_mac_opened;
+	}
+
+	err = team_port_enter(team, port);
+	if (err) {
+		netdev_err(dev, "Device %s failed to enter team mode\n",
+			   portname);
+		goto err_port_enter;
+	}
+
+	err = netdev_set_master(port_dev, dev);
+	if (err) {
+		netdev_err(dev, "Device %s failed to set master\n", portname);
+		goto err_set_master;
+	}
+
+	err = netdev_rx_handler_register(port_dev, team_handle_frame,
+					 port);
+	if (err) {
+		netdev_err(dev, "Device %s failed to register rx_handler\n",
+			   portname);
+		goto err_handler_register;
+	}
+
+	team_port_list_add_port(team, port);
+	__team_compute_features(team);
+	__team_port_change_check(port, !!netif_carrier_ok(port_dev));
+
+	netdev_info(dev, "Port device %s added\n", portname);
+
+	return 0;
+
+err_handler_register:
+	netdev_set_master(port_dev, NULL);
+
+err_set_master:
+	team_port_leave(team, port);
+
+err_port_enter:
+err_set_mac_opened:
+	dev_close(port_dev);
+
+err_dev_open:
+	team_port_set_orig_mac(port);
+
+err_set_mac_rand:
+	dev_set_mtu(port_dev, port->orig.mtu);
+
+err_set_mtu:
+	kfree(port);
+
+	return err;
+}
+
+static int team_port_del(struct team *team, struct net_device *port_dev)
+{
+	struct net_device *dev = team->dev;
+	struct team_port *port;
+	char *portname = port_dev->name;
+
+	port = team_port_get_rtnl(port_dev);
+	if (!port || !team_port_find(team, port)) {
+		netdev_err(dev, "Device %s does not act as a port of this team\n",
+			   portname);
+		return -ENOENT;
+	}
+
+	__team_port_change_check(port, false);
+	team_port_list_del_port(team, port);
+	netdev_rx_handler_unregister(port_dev);
+	netdev_set_master(port_dev, NULL);
+	team_port_leave(team, port);
+	dev_close(port_dev);
+	team_port_set_orig_mac(port);
+	dev_set_mtu(port_dev, port->orig.mtu);
+	synchronize_rcu();
+	kfree(port);
+	netdev_info(dev, "Port device %s removed\n", portname);
+	__team_compute_features(team);
+
+	return 0;
+}
+
+
+/*****************
+ * Net device ops
+ *****************/
+
+static const char team_no_mode_kind[] = "*NOMODE*";
+
+static int team_mode_option_get(struct team *team, void *arg)
+{
+	const char **str = arg;
+
+	*str = team->mode_kind ? team->mode_kind : team_no_mode_kind;
+	return 0;
+}
+
+static int team_mode_option_set(struct team *team, void *arg)
+{
+	const char **str = arg;
+
+	return team_change_mode(team, *str);
+}
+
+static struct team_option team_options[] = {
+	{
+		.name = "mode",
+		.type = TEAM_OPTION_TYPE_STRING,
+		.getter = team_mode_option_get,
+		.setter = team_mode_option_set,
+	},
+};
+
+static int team_init(struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+	int i;
+
+	team->dev = dev;
+	spin_lock_init(&team->lock);
+
+	team->pcpu_stats = alloc_percpu(struct team_pcpu_stats);
+	if (!team->pcpu_stats)
+		return -ENOMEM;
+
+	for (i = 0; i < TEAM_PORT_HASHENTRIES; i++)
+		INIT_HLIST_HEAD(&team->port_hlist[i]);
+	INIT_LIST_HEAD(&team->port_list);
+
+	INIT_LIST_HEAD(&team->option_list);
+	team_options_register(team, team_options, ARRAY_SIZE(team_options));
+	netif_carrier_off(dev);
+
+	return 0;
+}
+
+static void team_uninit(struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+	struct team_port *tmp;
+
+	spin_lock(&team->lock);
+	list_for_each_entry_safe(port, tmp, &team->port_list, list)
+		team_port_del(team, port->dev);
+
+	__team_change_mode(team, NULL); /* cleanup */
+	__team_options_unregister(team, team_options, ARRAY_SIZE(team_options));
+	spin_unlock(&team->lock);
+}
+
+static void team_destructor(struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+
+	free_percpu(team->pcpu_stats);
+	free_netdev(dev);
+}
+
+static int team_open(struct net_device *dev)
+{
+	netif_carrier_on(dev);
+	return 0;
+}
+
+static int team_close(struct net_device *dev)
+{
+	netif_carrier_off(dev);
+	return 0;
+}
+
+/*
+ * note: already called with rcu_read_lock
+ */
+static netdev_tx_t team_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+	bool tx_success = false;
+	unsigned int len = skb->len;
+
+	/*
+	 * Ensure transmit function is called only in case there is at least
+	 * one port present.
+	 */
+	if (likely(!list_empty(&team->port_list) && team->mode_ops.transmit))
+		tx_success = team->mode_ops.transmit(team, skb);
+	if (tx_success) {
+		struct team_pcpu_stats *pcpu_stats;
+
+		pcpu_stats = this_cpu_ptr(team->pcpu_stats);
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->tx_packets++;
+		pcpu_stats->tx_bytes += len;
+		u64_stats_update_end(&pcpu_stats->syncp);
+	} else {
+		this_cpu_inc(team->pcpu_stats->tx_dropped);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static void team_change_rx_flags(struct net_device *dev, int change)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+	int inc;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		if (change & IFF_PROMISC) {
+			inc = dev->flags & IFF_PROMISC ? 1 : -1;
+			dev_set_promiscuity(port->dev, inc);
+		}
+		if (change & IFF_ALLMULTI) {
+			inc = dev->flags & IFF_ALLMULTI ? 1 : -1;
+			dev_set_allmulti(port->dev, inc);
+		}
+	}
+	rcu_read_unlock();
+}
+
+static void team_set_rx_mode(struct net_device *dev)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		dev_uc_sync(port->dev, dev);
+		dev_mc_sync(port->dev, dev);
+	}
+	rcu_read_unlock();
+}
+
+static int team_set_mac_address(struct net_device *dev, void *p)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+	struct sockaddr *addr = p;
+
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list)
+		if (team->mode_ops.port_change_mac)
+			team->mode_ops.port_change_mac(team, port);
+	rcu_read_unlock();
+	return 0;
+}
+
+static int team_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+	int err;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		err = dev_set_mtu(port->dev, new_mtu);
+		if (err) {
+			netdev_err(dev, "Device %s failed to change mtu\n",
+				   port->dev->name);
+			goto unwind;
+		}
+	}
+	rcu_read_unlock();
+
+	dev->mtu = new_mtu;
+
+	return 0;
+
+unwind:
+	list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
+		dev_set_mtu(port->dev, dev->mtu);
+
+	rcu_read_unlock();
+	return err;
+}
+
+static struct rtnl_link_stats64 *
+team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_pcpu_stats *p;
+	u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
+	u32 rx_dropped = 0, tx_dropped = 0;
+	unsigned int start;
+	int i;
+
+	for_each_possible_cpu(i) {
+		p = per_cpu_ptr(team->pcpu_stats, i);
+		do {
+			start = u64_stats_fetch_begin_bh(&p->syncp);
+			rx_packets	= p->rx_packets;
+			rx_bytes	= p->rx_bytes;
+			rx_multicast	= p->rx_multicast;
+			tx_packets	= p->tx_packets;
+			tx_bytes	= p->tx_bytes;
+		} while (u64_stats_fetch_retry_bh(&p->syncp, start));
+
+		stats->rx_packets	+= rx_packets;
+		stats->rx_bytes		+= rx_bytes;
+		stats->multicast	+= rx_multicast;
+		stats->tx_packets	+= tx_packets;
+		stats->tx_bytes		+= tx_bytes;
+		/*
+		 * rx_dropped & tx_dropped are u32, updated
+		 * without syncp protection.
+		 */
+		rx_dropped	+= p->rx_dropped;
+		tx_dropped	+= p->tx_dropped;
+	}
+	stats->rx_dropped	= rx_dropped;
+	stats->tx_dropped	= tx_dropped;
+	return stats;
+}
+
+static void team_vlan_rx_add_vid(struct net_device *dev, uint16_t vid)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		const struct net_device_ops *ops = port->dev->netdev_ops;
+
+		ops->ndo_vlan_rx_add_vid(port->dev, vid);
+	}
+	rcu_read_unlock();
+}
+
+static void team_vlan_rx_kill_vid(struct net_device *dev, uint16_t vid)
+{
+	struct team *team = netdev_priv(dev);
+	struct team_port *port;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		const struct net_device_ops *ops = port->dev->netdev_ops;
+
+		ops->ndo_vlan_rx_kill_vid(port->dev, vid);
+	}
+	rcu_read_unlock();
+}
+
+static int team_add_slave(struct net_device *dev, struct net_device *port_dev)
+{
+	struct team *team = netdev_priv(dev);
+	int err;
+
+	spin_lock(&team->lock);
+	err = team_port_add(team, port_dev);
+	spin_unlock(&team->lock);
+	return err;
+}
+
+static int team_del_slave(struct net_device *dev, struct net_device *port_dev)
+{
+	struct team *team = netdev_priv(dev);
+	int err;
+
+	spin_lock(&team->lock);
+	err = team_port_del(team, port_dev);
+	spin_unlock(&team->lock);
+	return err;
+}
+
+static const struct net_device_ops team_netdev_ops = {
+	.ndo_init		= team_init,
+	.ndo_uninit		= team_uninit,
+	.ndo_open		= team_open,
+	.ndo_stop		= team_close,
+	.ndo_start_xmit		= team_xmit,
+	.ndo_change_rx_flags	= team_change_rx_flags,
+	.ndo_set_rx_mode	= team_set_rx_mode,
+	.ndo_set_mac_address	= team_set_mac_address,
+	.ndo_change_mtu		= team_change_mtu,
+	.ndo_get_stats64	= team_get_stats64,
+	.ndo_vlan_rx_add_vid	= team_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid	= team_vlan_rx_kill_vid,
+	.ndo_add_slave		= team_add_slave,
+	.ndo_del_slave		= team_del_slave,
+};
+
+
+/***********************
+ * rt netlink interface
+ ***********************/
+
+static void team_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	dev->netdev_ops = &team_netdev_ops;
+	dev->destructor	= team_destructor;
+	dev->tx_queue_len = 0;
+	dev->flags |= IFF_MULTICAST;
+	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
+
+	/*
+	 * Indicate we support unicast address filtering. That way the core
+	 * won't put us into promiscuous mode when a unicast address is added.
+	 * Leave that decision to the underlying port drivers.
+	 */
+	dev->priv_flags |= IFF_UNICAST_FLT;
+
+	dev->features |= NETIF_F_LLTX;
+	dev->features |= NETIF_F_GRO;
+	dev->hw_features = NETIF_F_HW_VLAN_TX |
+			   NETIF_F_HW_VLAN_RX |
+			   NETIF_F_HW_VLAN_FILTER;
+
+	dev->features |= dev->hw_features;
+}
+
+static int team_newlink(struct net *src_net, struct net_device *dev,
+			struct nlattr *tb[], struct nlattr *data[])
+{
+	int err;
+
+	if (tb[IFLA_ADDRESS] == NULL)
+		random_ether_addr(dev->dev_addr);
+
+	err = register_netdevice(dev);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int team_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (tb[IFLA_ADDRESS]) {
+		if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN)
+			return -EINVAL;
+		if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
+			return -EADDRNOTAVAIL;
+	}
+	return 0;
+}
+
+static struct rtnl_link_ops team_link_ops __read_mostly = {
+	.kind		= DRV_NAME,
+	.priv_size	= sizeof(struct team),
+	.setup		= team_setup,
+	.newlink	= team_newlink,
+	.validate	= team_validate,
+};
+
+
+/***********************************
+ * Generic netlink custom interface
+ ***********************************/
+
+static struct genl_family team_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= TEAM_GENL_NAME,
+	.version	= TEAM_GENL_VERSION,
+	.maxattr	= TEAM_ATTR_MAX,
+	.netnsok	= true,
+};
+
+static const struct nla_policy team_nl_policy[TEAM_ATTR_MAX + 1] = {
+	[TEAM_ATTR_UNSPEC]			= { .type = NLA_UNSPEC, },
+	[TEAM_ATTR_TEAM_IFINDEX]		= { .type = NLA_U32 },
+	[TEAM_ATTR_LIST_OPTION]			= { .type = NLA_NESTED },
+	[TEAM_ATTR_LIST_PORT]			= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy
+team_nl_option_policy[TEAM_ATTR_OPTION_MAX + 1] = {
+	[TEAM_ATTR_OPTION_UNSPEC]		= { .type = NLA_UNSPEC, },
+	[TEAM_ATTR_OPTION_NAME] = {
+		.type = NLA_STRING,
+		.len = TEAM_STRING_MAX_LEN,
+	},
+	[TEAM_ATTR_OPTION_CHANGED]		= { .type = NLA_FLAG },
+	[TEAM_ATTR_OPTION_TYPE]			= { .type = NLA_U8 },
+	[TEAM_ATTR_OPTION_DATA] = {
+		.type = NLA_BINARY,
+		.len = TEAM_STRING_MAX_LEN,
+	},
+};
+
+static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	void *hdr;
+	int err;
+
+	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, info->snd_pid, info->snd_seq,
+			  &team_nl_family, 0, TEAM_CMD_NOOP);
+	if (!hdr) {	/* genlmsg_put() returns NULL, not an ERR_PTR */
+		err = -EMSGSIZE;
+		goto err_msg_put;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_unicast(genl_info_net(info), msg, info->snd_pid);
+
+err_msg_put:
+	nlmsg_free(msg);
+
+	return err;
+}
+
+/*
+ * Netlink cmd functions must be bracketed by the following two functions.
+ * rcu_read_lock is held for the whole time so that team_uninit cannot be
+ * called in between.
+ */
+static struct team *team_nl_team_get(struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	int ifindex;
+	struct net_device *dev;
+	struct team *team;
+
+	if (!info->attrs[TEAM_ATTR_TEAM_IFINDEX])
+		return NULL;
+
+	ifindex = nla_get_u32(info->attrs[TEAM_ATTR_TEAM_IFINDEX]);
+	rcu_read_lock();
+	dev = dev_get_by_index_rcu(net, ifindex);
+	if (!dev || dev->netdev_ops != &team_netdev_ops) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	team = netdev_priv(dev);
+	spin_lock(&team->lock);
+	return team;
+}
+
+static void team_nl_team_put(struct team *team)
+{
+	spin_unlock(&team->lock);
+	rcu_read_unlock();
+}
+
+static int team_nl_send_generic(struct genl_info *info, struct team *team,
+				int (*fill_func)(struct sk_buff *skb,
+						 struct genl_info *info,
+						 int flags, struct team *team))
+{
+	struct sk_buff *skb;
+	int err;
+
+	skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	err = fill_func(skb, info, NLM_F_ACK, team);
+	if (err < 0)
+		goto err_fill;
+
+	err = genlmsg_unicast(genl_info_net(info), skb, info->snd_pid);
+	return err;
+
+err_fill:
+	nlmsg_free(skb);
+	return err;
+}
+
+static int team_nl_fill_options_get_changed(struct sk_buff *skb,
+					    u32 pid, u32 seq, int flags,
+					    struct team *team,
+					    struct team_option *changed_option)
+{
+	struct nlattr *option_list;
+	void *hdr;
+	struct team_option *option;
+
+	hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
+			  TEAM_CMD_OPTIONS_GET);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
+	option_list = nla_nest_start(skb, TEAM_ATTR_LIST_OPTION);
+	if (!option_list)
+		goto nla_put_failure;
+
+	list_for_each_entry(option, &team->option_list, list) {
+		struct nlattr *option_item;
+		long arg;
+
+		option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
+		if (!option_item)
+			goto nla_put_failure;
+		NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_NAME, option->name);
+		if (option == changed_option)
+			NLA_PUT_FLAG(skb, TEAM_ATTR_OPTION_CHANGED);
+		switch (option->type) {
+		case TEAM_OPTION_TYPE_U32:
+			NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32);
+			team_option_get(team, option, &arg);
+			NLA_PUT_U32(skb, TEAM_ATTR_OPTION_DATA, arg);
+			break;
+		case TEAM_OPTION_TYPE_STRING:
+			NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING);
+			team_option_get(team, option, &arg);
+			NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_DATA,
+				       (char *) arg);
+			break;
+		default:
+			BUG();
+		}
+		nla_nest_end(skb, option_item);
+	}
+
+	nla_nest_end(skb, option_list);
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int team_nl_fill_options_get(struct sk_buff *skb,
+				    struct genl_info *info, int flags,
+				    struct team *team)
+{
+	return team_nl_fill_options_get_changed(skb, info->snd_pid,
+						info->snd_seq, flags,
+						team, NULL);
+}
+
+static int team_nl_cmd_options_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct team *team;
+	int err;
+
+	team = team_nl_team_get(info);
+	if (!team)
+		return -EINVAL;
+
+	err = team_nl_send_generic(info, team, team_nl_fill_options_get);
+
+	team_nl_team_put(team);
+
+	return err;
+}
+
+static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
+{
+	struct team *team;
+	int err = 0;
+	int i;
+	struct nlattr *nl_option;
+
+	team = team_nl_team_get(info);
+	if (!team)
+		return -EINVAL;
+
+	err = -EINVAL;
+	if (!info->attrs[TEAM_ATTR_LIST_OPTION]) {
+		err = -EINVAL;
+		goto team_put;
+	}
+
+	nla_for_each_nested(nl_option, info->attrs[TEAM_ATTR_LIST_OPTION], i) {
+		struct nlattr *mode_attrs[TEAM_ATTR_OPTION_MAX + 1];
+		enum team_option_type opt_type;
+		struct team_option *option;
+		char *opt_name;
+		bool opt_found = false;
+
+		if (nla_type(nl_option) != TEAM_ATTR_ITEM_OPTION) {
+			err = -EINVAL;
+			goto team_put;
+		}
+		err = nla_parse_nested(mode_attrs, TEAM_ATTR_OPTION_MAX,
+				       nl_option, team_nl_option_policy);
+		if (err)
+			goto team_put;
+		/* -EINVAL also covers the unknown-type default case below */
+		err = -EINVAL;
+		if (!mode_attrs[TEAM_ATTR_OPTION_NAME] ||
+		    !mode_attrs[TEAM_ATTR_OPTION_TYPE] ||
+		    !mode_attrs[TEAM_ATTR_OPTION_DATA])
+			goto team_put;
+		switch (nla_get_u8(mode_attrs[TEAM_ATTR_OPTION_TYPE])) {
+		case NLA_U32:
+			opt_type = TEAM_OPTION_TYPE_U32;
+			break;
+		case NLA_STRING:
+			opt_type = TEAM_OPTION_TYPE_STRING;
+			break;
+		default:
+			goto team_put;
+		}
+
+		opt_name = nla_data(mode_attrs[TEAM_ATTR_OPTION_NAME]);
+		list_for_each_entry(option, &team->option_list, list) {
+			long arg;
+			struct nlattr *opt_data_attr;
+
+			if (option->type != opt_type ||
+			    strcmp(option->name, opt_name))
+				continue;
+			opt_found = true;
+			opt_data_attr = mode_attrs[TEAM_ATTR_OPTION_DATA];
+			switch (opt_type) {
+			case TEAM_OPTION_TYPE_U32:
+				arg = nla_get_u32(opt_data_attr);
+				break;
+			case TEAM_OPTION_TYPE_STRING:
+				arg = (long) nla_data(opt_data_attr);
+				break;
+			default:
+				BUG();
+			}
+			err = team_option_set(team, option, &arg);
+			if (err)
+				goto team_put;
+		}
+		if (!opt_found) {
+			err = -ENOENT;
+			goto team_put;
+		}
+	}
+
+team_put:
+	team_nl_team_put(team);
+
+	return err;
+}
+
+static int team_nl_fill_port_list_get_changed(struct sk_buff *skb,
+					      u32 pid, u32 seq, int flags,
+					      struct team *team,
+					      struct team_port *changed_port)
+{
+	struct nlattr *port_list;
+	void *hdr;
+	struct team_port *port;
+
+	hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
+			  TEAM_CMD_PORT_LIST_GET);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
+	port_list = nla_nest_start(skb, TEAM_ATTR_LIST_PORT);
+	if (!port_list)
+		goto nla_put_failure;
+
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		struct nlattr *port_item;
+
+		port_item = nla_nest_start(skb, TEAM_ATTR_ITEM_PORT);
+		if (!port_item)
+			goto nla_put_failure;
+		NLA_PUT_U32(skb, TEAM_ATTR_PORT_IFINDEX, port->dev->ifindex);
+		if (port == changed_port)
+			NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_CHANGED);
+		if (port->linkup)
+			NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_LINKUP);
+		NLA_PUT_U32(skb, TEAM_ATTR_PORT_SPEED, port->speed);
+		NLA_PUT_U8(skb, TEAM_ATTR_PORT_DUPLEX, port->duplex);
+		nla_nest_end(skb, port_item);
+	}
+
+	nla_nest_end(skb, port_list);
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int team_nl_fill_port_list_get(struct sk_buff *skb,
+				      struct genl_info *info, int flags,
+				      struct team *team)
+{
+	return team_nl_fill_port_list_get_changed(skb, info->snd_pid,
+						  info->snd_seq, flags,
+						  team, NULL);
+}
+
+static int team_nl_cmd_port_list_get(struct sk_buff *skb,
+				     struct genl_info *info)
+{
+	struct team *team;
+	int err;
+
+	team = team_nl_team_get(info);
+	if (!team)
+		return -EINVAL;
+
+	err = team_nl_send_generic(info, team, team_nl_fill_port_list_get);
+
+	team_nl_team_put(team);
+
+	return err;
+}
+
+static struct genl_ops team_nl_ops[] = {
+	{
+		.cmd = TEAM_CMD_NOOP,
+		.doit = team_nl_cmd_noop,
+		.policy = team_nl_policy,
+	},
+	{
+		.cmd = TEAM_CMD_OPTIONS_SET,
+		.doit = team_nl_cmd_options_set,
+		.policy = team_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = TEAM_CMD_OPTIONS_GET,
+		.doit = team_nl_cmd_options_get,
+		.policy = team_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = TEAM_CMD_PORT_LIST_GET,
+		.doit = team_nl_cmd_port_list_get,
+		.policy = team_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static struct genl_multicast_group team_change_event_mcgrp = {
+	.name = TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME,
+};
+
+static int team_nl_send_event_options_get(struct team *team,
+					  struct team_option *changed_option)
+{
+	struct sk_buff *skb;
+	int err;
+	struct net *net = dev_net(team->dev);
+
+	skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	err = team_nl_fill_options_get_changed(skb, 0, 0, 0, team,
+					       changed_option);
+	if (err < 0)
+		goto err_fill;
+
+	err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
+				      GFP_KERNEL);
+	return err;
+
+err_fill:
+	nlmsg_free(skb);
+	return err;
+}
+
+static int team_nl_send_event_port_list_get(struct team_port *port)
+{
+	struct sk_buff *skb;
+	int err;
+	struct net *net = dev_net(port->team->dev);
+
+	skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	err = team_nl_fill_port_list_get_changed(skb, 0, 0, 0,
+						 port->team, port);
+	if (err < 0)
+		goto err_fill;
+
+	err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
+				      GFP_KERNEL);
+	return err;
+
+err_fill:
+	nlmsg_free(skb);
+	return err;
+}
+
+static int team_nl_init(void)
+{
+	int err;
+
+	err = genl_register_family_with_ops(&team_nl_family, team_nl_ops,
+					    ARRAY_SIZE(team_nl_ops));
+	if (err)
+		return err;
+
+	err = genl_register_mc_group(&team_nl_family, &team_change_event_mcgrp);
+	if (err)
+		goto err_change_event_grp_reg;
+
+	return 0;
+
+err_change_event_grp_reg:
+	genl_unregister_family(&team_nl_family);
+
+	return err;
+}
+
+static void team_nl_fini(void)
+{
+	genl_unregister_family(&team_nl_family);
+}
+
+
+/******************
+ * Change checkers
+ ******************/
+
+static void __team_options_change_check(struct team *team,
+					struct team_option *changed_option)
+{
+	int err;
+
+	err = team_nl_send_event_options_get(team, changed_option);
+	if (err)
+		netdev_warn(team->dev, "Failed to send options change via netlink\n");
+}
+
+/* rtnl lock is held */
+static void __team_port_change_check(struct team_port *port, bool linkup)
+{
+	int err;
+
+	if (port->linkup == linkup)
+		return;
+
+	port->linkup = linkup;
+	if (linkup) {
+		struct ethtool_cmd ecmd;
+
+		err = __ethtool_get_settings(port->dev, &ecmd);
+		if (!err) {
+			port->speed = ethtool_cmd_speed(&ecmd);
+			port->duplex = ecmd.duplex;
+			goto send_event;
+		}
+	}
+	port->speed = 0;
+	port->duplex = 0;
+
+send_event:
+	err = team_nl_send_event_port_list_get(port);
+	if (err)
+		netdev_warn(port->team->dev, "Failed to send port change of device %s via netlink\n",
+			    port->dev->name);
+
+}
+
+static void team_port_change_check(struct team_port *port, bool linkup)
+{
+	struct team *team = port->team;
+
+	spin_lock(&team->lock);
+	__team_port_change_check(port, linkup);
+	spin_unlock(&team->lock);
+}
+
+/************************************
+ * Net device notifier event handler
+ ************************************/
+
+static int team_device_event(struct notifier_block *unused,
+			     unsigned long event, void *ptr)
+{
+	struct net_device *dev = (struct net_device *) ptr;
+	struct team_port *port;
+
+	port = team_port_get_rtnl(dev);
+	if (!port)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_UP:
+		if (netif_carrier_ok(dev))
+			team_port_change_check(port, true);
+		break;
+	case NETDEV_DOWN:
+		team_port_change_check(port, false);
+		break;
+	case NETDEV_CHANGE:
+		if (netif_running(port->dev))
+			team_port_change_check(port,
+					       !!netif_carrier_ok(port->dev));
+		break;
+	case NETDEV_UNREGISTER:
+		team_del_slave(port->team->dev, dev);
+		break;
+	case NETDEV_FEAT_CHANGE:
+		team_compute_features(port->team);
+		break;
+	case NETDEV_CHANGEMTU:
+	case NETDEV_CHANGEADDR:
+	case NETDEV_PRE_TYPE_CHANGE:
+		/* Forbid changing the MTU, address or type of an
+		 * underlying port device.
+		 */
+		return NOTIFY_BAD;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block team_notifier_block __read_mostly = {
+	.notifier_call = team_device_event,
+};
+
+
+/***********************
+ * Module init and exit
+ ***********************/
+
+static int __init team_module_init(void)
+{
+	int err;
+
+	register_netdevice_notifier(&team_notifier_block);
+
+	err = rtnl_link_register(&team_link_ops);
+	if (err)
+		goto err_rtnl_reg;
+
+	err = team_nl_init();
+	if (err)
+		goto err_nl_init;
+
+	return 0;
+
+err_nl_init:
+	rtnl_link_unregister(&team_link_ops);
+
+err_rtnl_reg:
+	unregister_netdevice_notifier(&team_notifier_block);
+
+	return err;
+}
+
+static void __exit team_module_exit(void)
+{
+	team_nl_fini();
+	rtnl_link_unregister(&team_link_ops);
+	unregister_netdevice_notifier(&team_notifier_block);
+}
+
+module_init(team_module_init);
+module_exit(team_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
+MODULE_DESCRIPTION("Ethernet team device driver");
+MODULE_ALIAS_RTNL_LINK(DRV_NAME);
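For readers following the link-state logic in `__team_port_change_check()` above: it updates `linkup`, refreshes speed and duplex from ethtool only when the link came up and the query succeeded, and sends a netlink event only when the state actually flipped. A minimal userspace model of that decision, assuming a hypothetical `link_query` struct in place of `__ethtool_get_settings()` and leaving out locking and netlink entirely:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the linkup/speed/duplex fields of struct team_port. */
struct port_state {
	bool linkup;
	uint32_t speed;
	uint8_t duplex;
};

/*
 * Hypothetical stand-in for an ethtool query; ok == false models
 * __ethtool_get_settings() returning an error.
 */
struct link_query {
	bool ok;
	uint32_t speed;
	uint8_t duplex;
};

/*
 * Follows the branches of __team_port_change_check(). Returns true when a
 * change event would be sent, which happens only if linkup actually flipped.
 */
static bool port_change_check(struct port_state *port, bool linkup,
			      const struct link_query *q)
{
	if (port->linkup == linkup)
		return false;		/* unchanged: no event */

	port->linkup = linkup;
	if (linkup && q->ok) {
		port->speed = q->speed;
		port->duplex = q->duplex;
	} else {
		/* link went down, or settings could not be read */
		port->speed = 0;
		port->duplex = 0;
	}
	return true;
}
```

The boolean return stands in for the call to `team_nl_send_event_port_list_get()`; the rest follows the branch structure of the patch.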
diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
new file mode 100644
index 0000000..1aa2bfb
--- /dev/null
+++ b/drivers/net/team/team_mode_activebackup.c
@@ -0,0 +1,152 @@
+/*
+ * drivers/net/team/team_mode_activebackup.c - Active-backup mode for team
+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <net/rtnetlink.h>
+#include <linux/if_team.h>
+
+struct ab_priv {
+	struct team_port __rcu *active_port;
+};
+
+static struct ab_priv *ab_priv(struct team *team)
+{
+	return (struct ab_priv *) &team->mode_priv;
+}
+
+static rx_handler_result_t ab_receive(struct team *team, struct team_port *port,
+				      struct sk_buff *skb) {
+	struct team_port *active_port;
+
+	active_port = rcu_dereference(ab_priv(team)->active_port);
+	if (active_port != port)
+		return RX_HANDLER_EXACT;
+	return RX_HANDLER_ANOTHER;
+}
+
+static bool ab_transmit(struct team *team, struct sk_buff *skb)
+{
+	struct team_port *active_port;
+
+	active_port = rcu_dereference(ab_priv(team)->active_port);
+	if (unlikely(!active_port))
+		goto drop;
+	skb->dev = active_port->dev;
+	if (dev_queue_xmit(skb))
+		return false;
+	return true;
+
+drop:
+	dev_kfree_skb(skb);
+	return false;
+}
+
+static void ab_port_leave(struct team *team, struct team_port *port)
+{
+	if (ab_priv(team)->active_port == port)
+		rcu_assign_pointer(ab_priv(team)->active_port, NULL);
+}
+
+static void ab_port_change_mac(struct team *team, struct team_port *port)
+{
+	if (ab_priv(team)->active_port == port)
+		team_port_set_team_mac(port);
+}
+
+static int ab_active_port_get(struct team *team, void *arg)
+{
+	u32 *ifindex = arg;
+
+	*ifindex = 0;
+	if (ab_priv(team)->active_port)
+		*ifindex = ab_priv(team)->active_port->dev->ifindex;
+	return 0;
+}
+
+static int ab_active_port_set(struct team *team, void *arg)
+{
+	u32 *ifindex = arg;
+	struct team_port *port;
+
+	list_for_each_entry_rcu(port, &team->port_list, list) {
+		if (port->dev->ifindex == *ifindex) {
+			struct team_port *ac_port = ab_priv(team)->active_port;
+
+			/* rtnl_lock needs to be held when setting macs */
+			rtnl_lock();
+			if (ac_port)
+				team_port_set_orig_mac(ac_port);
+			rcu_assign_pointer(ab_priv(team)->active_port, port);
+			team_port_set_team_mac(port);
+			rtnl_unlock();
+			return 0;
+		}
+	}
+	return -ENOENT;
+}
+
+static struct team_option ab_options[] = {
+	{
+		.name = "activeport",
+		.type = TEAM_OPTION_TYPE_U32,
+		.getter = ab_active_port_get,
+		.setter = ab_active_port_set,
+	},
+};
+
+static int ab_init(struct team *team)
+{
+	team_options_register(team, ab_options, ARRAY_SIZE(ab_options));
+	return 0;
+}
+
+static void ab_exit(struct team *team)
+{
+	team_options_unregister(team, ab_options, ARRAY_SIZE(ab_options));
+}
+
+static const struct team_mode_ops ab_mode_ops = {
+	.init			= ab_init,
+	.exit			= ab_exit,
+	.receive		= ab_receive,
+	.transmit		= ab_transmit,
+	.port_leave		= ab_port_leave,
+	.port_change_mac	= ab_port_change_mac,
+};
+
+static struct team_mode ab_mode = {
+	.kind		= "activebackup",
+	.owner		= THIS_MODULE,
+	.priv_size	= sizeof(struct ab_priv),
+	.ops		= &ab_mode_ops,
+};
+
+static int __init ab_init_module(void)
+{
+	return team_mode_register(&ab_mode);
+}
+
+static void __exit ab_cleanup_module(void)
+{
+	team_mode_unregister(&ab_mode);
+}
+
+module_init(ab_init_module);
+module_exit(ab_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
+MODULE_DESCRIPTION("Active-backup mode for team");
+MODULE_ALIAS("team-mode-activebackup");
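In active-backup mode only the active port's traffic reaches the team device: `ab_receive()` returns `RX_HANDLER_ANOTHER` for the active port (the frame goes through another receive round, presumably on the team device) and `RX_HANDLER_EXACT` for every other port, while the `activeport` option picks the active port by ifindex. A small userspace model of that bookkeeping, assuming hypothetical `ab_model`, `RX_ANOTHER` and `RX_EXACT` names and ignoring RCU and the MAC address swap done under rtnl_lock:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the two rx handler results ab_receive() uses. */
enum rx_result { RX_ANOTHER, RX_EXACT };

struct ab_port_model {
	int ifindex;
};

struct ab_model {
	struct ab_port_model *ports;
	size_t count;
	struct ab_port_model *active;	/* NULL until "activeport" is set */
};

/* Like ab_active_port_set(): select the port whose ifindex matches. */
static int ab_model_set_active(struct ab_model *ab, int ifindex)
{
	size_t i;

	for (i = 0; i < ab->count; i++) {
		if (ab->ports[i].ifindex == ifindex) {
			ab->active = &ab->ports[i];
			return 0;
		}
	}
	return -ENOENT;		/* unknown ifindex: active port unchanged */
}

/* Like ab_active_port_get(): 0 means "no active port". */
static int ab_model_get_active(const struct ab_model *ab)
{
	return ab->active ? ab->active->ifindex : 0;
}

/* Like ab_receive(): only frames from the active port reach the team. */
static enum rx_result ab_model_receive(const struct ab_model *ab,
				       const struct ab_port_model *port)
{
	return port == ab->active ? RX_ANOTHER : RX_EXACT;
}
```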
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
new file mode 100644
index 0000000..0374052
--- /dev/null
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -0,0 +1,107 @@
+/*
+ * drivers/net/team/team_mode_roundrobin.c - Round-robin mode for team
+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/if_team.h>
+
+struct rr_priv {
+	unsigned int sent_packets;
+};
+
+static struct rr_priv *rr_priv(struct team *team)
+{
+	return (struct rr_priv *) &team->mode_priv;
+}
+
+static struct team_port *__get_first_port_up(struct team *team,
+					     struct team_port *port)
+{
+	struct team_port *cur;
+
+	if (port->linkup)
+		return port;
+	cur = port;
+	list_for_each_entry_continue_rcu(cur, &team->port_list, list)
+		if (cur->linkup)
+			return cur;
+	list_for_each_entry_rcu(cur, &team->port_list, list) {
+		if (cur == port)
+			break;
+		if (cur->linkup)
+			return cur;
+	}
+	return NULL;
+}
+
+static bool rr_transmit(struct team *team, struct sk_buff *skb)
+{
+	struct team_port *port = NULL;
+	int port_count = team->port_count;
+
+	if (likely(port_count))	/* no ports yet: avoid a modulo by zero */
+		port = team_get_port_by_index_rcu(team, rr_priv(team)->sent_packets++ % port_count);
+	port = port ? __get_first_port_up(team, port) : NULL;
+	if (unlikely(!port))
+		goto drop;
+	skb->dev = port->dev;
+	if (dev_queue_xmit(skb))
+		return false;
+	return true;
+
+drop:
+	dev_kfree_skb(skb);
+	return false;
+}
+
+static int rr_port_enter(struct team *team, struct team_port *port)
+{
+	return team_port_set_team_mac(port);
+}
+
+static void rr_port_change_mac(struct team *team, struct team_port *port)
+{
+	team_port_set_team_mac(port);
+}
+
+static const struct team_mode_ops rr_mode_ops = {
+	.transmit		= rr_transmit,
+	.port_enter		= rr_port_enter,
+	.port_change_mac	= rr_port_change_mac,
+};
+
+static struct team_mode rr_mode = {
+	.kind		= "roundrobin",
+	.owner		= THIS_MODULE,
+	.priv_size	= sizeof(struct rr_priv),
+	.ops		= &rr_mode_ops,
+};
+
+static int __init rr_init_module(void)
+{
+	return team_mode_register(&rr_mode);
+}
+
+static void __exit rr_cleanup_module(void)
+{
+	team_mode_unregister(&rr_mode);
+}
+
+module_init(rr_init_module);
+module_exit(rr_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
+MODULE_DESCRIPTION("Round-robin mode for team");
+MODULE_ALIAS("team-mode-roundrobin");
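`rr_transmit()` chooses a starting port from a running packet counter modulo the port count, then `__get_first_port_up()` scans forward from there and wraps around once, so ports without link are skipped. A userspace sketch of that selection, assuming ports live in an array indexed 0..count-1 (the driver looks them up through its index hash instead); the sketch adds an explicit check for an empty team because `x % 0` is undefined in C:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port model: only the link state matters here. */
struct rr_port_model {
	bool linkup;
};

/*
 * Like __get_first_port_up(): starting at start, scan forward and wrap
 * around once; return the index of the first port with link, or -1.
 */
static int rr_first_up(const struct rr_port_model *ports, size_t count,
		       size_t start)
{
	size_t n;

	for (n = 0; n < count; n++) {
		size_t idx = (start + n) % count;

		if (ports[idx].linkup)
			return (int)idx;
	}
	return -1;
}

/* Like rr_transmit(): pick by packet counter, guarding count == 0. */
static int rr_pick(const struct rr_port_model *ports, size_t count,
		   unsigned int *sent_packets)
{
	if (count == 0)
		return -1;	/* nothing to transmit on; avoids % by zero */
	return rr_first_up(ports, count, (*sent_packets)++ % count);
}
```

A port that is down simply hands its turn to the next port with link, so with ports `{up, down, up}` the third port is chosen twice in a row.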
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 619b565..0b091b3 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -185,6 +185,7 @@ header-y += if_pppol2tp.h
 header-y += if_pppox.h
 header-y += if_slip.h
 header-y += if_strip.h
+header-y += if_team.h
 header-y += if_tr.h
 header-y += if_tun.h
 header-y += if_tunnel.h
diff --git a/include/linux/if.h b/include/linux/if.h
index db20bd4..06b6ef6 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -79,6 +79,7 @@
 #define IFF_TX_SKB_SHARING	0x10000	/* The interface supports sharing
 					 * skbs on transmit */
 #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
+#define IFF_TEAM_PORT	0x40000		/* device used as team port */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
new file mode 100644
index 0000000..de395fc
--- /dev/null
+++ b/include/linux/if_team.h
@@ -0,0 +1,254 @@
+/*
+ * include/linux/if_team.h - Network team device driver header
+ * Copyright (c) 2011 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _LINUX_IF_TEAM_H_
+#define _LINUX_IF_TEAM_H_
+
+#ifdef __KERNEL__
+
+struct team_pcpu_stats {
+	u64			rx_packets;
+	u64			rx_bytes;
+	u64			rx_multicast;
+	u64			tx_packets;
+	u64			tx_bytes;
+	struct u64_stats_sync	syncp;
+	u32			rx_dropped;
+	u32			tx_dropped;
+};
+
+struct team;
+
+struct team_port {
+	struct net_device *dev;
+	struct hlist_node hlist; /* node in hash list */
+	struct list_head list; /* node in ordinary list */
+	struct team *team;
+	int index;
+
+	/*
+	 * A place for storing the original values of the device before it
+	 * became a port.
+	 */
+	struct {
+		unsigned char dev_addr[MAX_ADDR_LEN];
+		unsigned int mtu;
+	} orig;
+
+	bool linkup;
+	u32 speed;
+	u8 duplex;
+
+	struct rcu_head rcu;
+};
+
+struct team_mode_ops {
+	int (*init)(struct team *team);
+	void (*exit)(struct team *team);
+	rx_handler_result_t (*receive)(struct team *team,
+				       struct team_port *port,
+				       struct sk_buff *skb);
+	bool (*transmit)(struct team *team, struct sk_buff *skb);
+	int (*port_enter)(struct team *team, struct team_port *port);
+	void (*port_leave)(struct team *team, struct team_port *port);
+	void (*port_change_mac)(struct team *team, struct team_port *port);
+};
+
+static inline void team_mode_ops_copy(struct team_mode_ops *dst,
+				      const struct team_mode_ops *src)
+{
+	dst->init		= src->init;
+	dst->exit		= src->exit;
+	dst->receive		= src->receive;
+	dst->transmit		= src->transmit;
+	dst->port_enter		= src->port_enter;
+	dst->port_leave		= src->port_leave;
+	dst->port_change_mac	= src->port_change_mac;
+}
+
+static inline void team_mode_ops_clear(struct team_mode_ops *dst)
+{
+	dst->init		= NULL;
+	dst->exit		= NULL;
+	dst->receive		= NULL;
+	dst->transmit		= NULL;
+	dst->port_enter		= NULL;
+	dst->port_leave		= NULL;
+	dst->port_change_mac	= NULL;
+}
+
+enum team_option_type {
+	TEAM_OPTION_TYPE_U32,
+	TEAM_OPTION_TYPE_STRING,
+};
+
+struct team_option {
+	struct list_head list;
+	const char *name;
+	enum team_option_type type;
+	int (*getter)(struct team *team, void *arg);
+	int (*setter)(struct team *team, void *arg);
+};
+
+struct team_mode {
+	struct list_head list;
+	const char *kind;
+	struct module *owner;
+	size_t priv_size;
+	const struct team_mode_ops *ops;
+};
+
+#define TEAM_PORT_HASHBITS 4
+#define TEAM_PORT_HASHENTRIES (1 << TEAM_PORT_HASHBITS)
+
+#define TEAM_MODE_PRIV_LONGS 4
+#define TEAM_MODE_PRIV_SIZE (sizeof(long) * TEAM_MODE_PRIV_LONGS)
+
+struct team {
+	struct net_device *dev; /* associated netdevice */
+	struct team_pcpu_stats __percpu *pcpu_stats;
+
+	spinlock_t lock; /* used for overall locking, e.g. port lists write */
+
+	/*
+	 * port lists with port count
+	 */
+	int port_count;
+	struct hlist_head port_hlist[TEAM_PORT_HASHENTRIES];
+	struct list_head port_list;
+
+	struct list_head option_list;
+
+	const char *mode_kind;
+	struct team_mode_ops mode_ops;
+	long mode_priv[TEAM_MODE_PRIV_LONGS];
+};
+
+static inline struct hlist_head *team_port_index_hash(struct team *team,
+						      int port_index)
+{
+	return &team->port_hlist[port_index & (TEAM_PORT_HASHENTRIES - 1)];
+}
+
+static inline struct team_port *team_get_port_by_index_rcu(struct team *team,
+							   int port_index)
+{
+	struct hlist_node *p;
+	struct team_port *port;
+	struct hlist_head *head = team_port_index_hash(team, port_index);
+
+	hlist_for_each_entry_rcu(port, p, head, hlist)
+		if (port->index == port_index)
+			return port;
+	return NULL;
+}
+
+extern int team_port_set_orig_mac(struct team_port *port);
+extern int team_port_set_team_mac(struct team_port *port);
+extern void team_options_register(struct team *team,
+				  struct team_option *option,
+				  size_t option_count);
+extern void team_options_unregister(struct team *team,
+				    struct team_option *option,
+				    size_t option_count);
+extern int team_mode_register(struct team_mode *mode);
+extern int team_mode_unregister(struct team_mode *mode);
+
+#endif /* __KERNEL__ */
+
+#define TEAM_STRING_MAX_LEN 32
+
+/**********************************
+ * NETLINK_GENERIC netlink family.
+ **********************************/
+
+enum {
+	TEAM_CMD_NOOP,
+	TEAM_CMD_OPTIONS_SET,
+	TEAM_CMD_OPTIONS_GET,
+	TEAM_CMD_PORT_LIST_GET,
+
+	__TEAM_CMD_MAX,
+	TEAM_CMD_MAX = (__TEAM_CMD_MAX - 1),
+};
+
+enum {
+	TEAM_ATTR_UNSPEC,
+	TEAM_ATTR_TEAM_IFINDEX,		/* u32 */
+	TEAM_ATTR_LIST_OPTION,		/* nest */
+	TEAM_ATTR_LIST_PORT,		/* nest */
+
+	__TEAM_ATTR_MAX,
+	TEAM_ATTR_MAX = __TEAM_ATTR_MAX - 1,
+};
+
+/* Nested layout of get/set msg:
+ *
+ *	[TEAM_ATTR_LIST_OPTION]
+ *		[TEAM_ATTR_ITEM_OPTION]
+ *			[TEAM_ATTR_OPTION_*], ...
+ *		[TEAM_ATTR_ITEM_OPTION]
+ *			[TEAM_ATTR_OPTION_*], ...
+ *		...
+ *	[TEAM_ATTR_LIST_PORT]
+ *		[TEAM_ATTR_ITEM_PORT]
+ *			[TEAM_ATTR_PORT_*], ...
+ *		[TEAM_ATTR_ITEM_PORT]
+ *			[TEAM_ATTR_PORT_*], ...
+ *		...
+ */
+
+enum {
+	TEAM_ATTR_ITEM_OPTION_UNSPEC,
+	TEAM_ATTR_ITEM_OPTION,		/* nest */
+
+	__TEAM_ATTR_ITEM_OPTION_MAX,
+	TEAM_ATTR_ITEM_OPTION_MAX = __TEAM_ATTR_ITEM_OPTION_MAX - 1,
+};
+
+enum {
+	TEAM_ATTR_OPTION_UNSPEC,
+	TEAM_ATTR_OPTION_NAME,		/* string */
+	TEAM_ATTR_OPTION_CHANGED,	/* flag */
+	TEAM_ATTR_OPTION_TYPE,		/* u8 */
+	TEAM_ATTR_OPTION_DATA,		/* dynamic */
+
+	__TEAM_ATTR_OPTION_MAX,
+	TEAM_ATTR_OPTION_MAX = __TEAM_ATTR_OPTION_MAX - 1,
+};
+
+enum {
+	TEAM_ATTR_ITEM_PORT_UNSPEC,
+	TEAM_ATTR_ITEM_PORT,		/* nest */
+
+	__TEAM_ATTR_ITEM_PORT_MAX,
+	TEAM_ATTR_ITEM_PORT_MAX = __TEAM_ATTR_ITEM_PORT_MAX - 1,
+};
+
+enum {
+	TEAM_ATTR_PORT_UNSPEC,
+	TEAM_ATTR_PORT_IFINDEX,		/* u32 */
+	TEAM_ATTR_PORT_CHANGED,		/* flag */
+	TEAM_ATTR_PORT_LINKUP,		/* flag */
+	TEAM_ATTR_PORT_SPEED,		/* u32 */
+	TEAM_ATTR_PORT_DUPLEX,		/* u8 */
+
+	__TEAM_ATTR_PORT_MAX,
+	TEAM_ATTR_PORT_MAX = __TEAM_ATTR_PORT_MAX - 1,
+};
+
+/*
+ * NETLINK_GENERIC related info
+ */
+#define TEAM_GENL_NAME "team"
+#define TEAM_GENL_VERSION 0x1
+#define TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME "change_event"
+
+#endif /* _LINUX_IF_TEAM_H_ */
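The layout comment in `if_team.h` describes nested attributes: `TEAM_ATTR_LIST_PORT` holds one `TEAM_ATTR_ITEM_PORT` per port, which in turn carries `TEAM_ATTR_PORT_*` attributes. On the wire each netlink attribute is a 4-byte header (16-bit length that includes the header, 16-bit type) followed by its payload padded to a 4-byte boundary, and a nest's length covers all of its children. A self-contained sketch of that byte layout, using hypothetical helpers (`tlv_put()`, `tlv_nest_start()`, `tlv_nest_end()`) that only imitate the kernel's `nla_put()`/`nla_nest_start()`/`nla_nest_end()`, with attribute numbers copied from the enums above:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct nlattr: length first, then type. */
struct tlv_hdr {
	uint16_t len;	/* header + payload, excluding trailing padding */
	uint16_t type;
};

#define TLV_ALIGN(n)	(((n) + 3u) & ~3u)	/* netlink pads to 4 bytes */
#define TLV_HDRLEN	TLV_ALIGN(sizeof(struct tlv_hdr))

/* Hypothetical names; values match TEAM_ATTR_LIST_PORT, _ITEM_PORT, ... */
enum { TA_LIST_PORT = 3, TA_ITEM_PORT = 1, TA_PORT_IFINDEX = 1,
       TA_PORT_LINKUP = 3 };

/* Append one attribute at buf + *off; returns where it starts. */
static size_t tlv_put(unsigned char *buf, size_t *off, uint16_t type,
		      const void *data, uint16_t data_len)
{
	struct tlv_hdr hdr = { (uint16_t)(TLV_HDRLEN + data_len), type };
	size_t start = *off;

	memcpy(buf + start, &hdr, sizeof(hdr));
	if (data_len)
		memcpy(buf + start + TLV_HDRLEN, data, data_len);
	*off = start + TLV_ALIGN(hdr.len);
	return start;
}

/* Open a nest: write an empty header now, patch its length later. */
static size_t tlv_nest_start(unsigned char *buf, size_t *off, uint16_t type)
{
	return tlv_put(buf, off, type, NULL, 0);
}

/* Close a nest: its length spans everything written since it started. */
static void tlv_nest_end(unsigned char *buf, size_t *off, size_t start)
{
	struct tlv_hdr hdr;

	memcpy(&hdr, buf + start, sizeof(hdr));
	hdr.len = (uint16_t)(*off - start);
	memcpy(buf + start, &hdr, sizeof(hdr));
}

/* Encode [LIST_PORT [ITEM_PORT [PORT_IFINDEX u32] [PORT_LINKUP flag]]]. */
static size_t encode_one_port(unsigned char *buf, uint32_t ifindex,
			      int linkup)
{
	size_t off = 0, list, item;

	list = tlv_nest_start(buf, &off, TA_LIST_PORT);
	item = tlv_nest_start(buf, &off, TA_ITEM_PORT);
	tlv_put(buf, &off, TA_PORT_IFINDEX, &ifindex, sizeof(ifindex));
	if (linkup)
		tlv_put(buf, &off, TA_PORT_LINKUP, NULL, 0); /* flag: no payload */
	tlv_nest_end(buf, &off, item);
	tlv_nest_end(buf, &off, list);
	return off;
}
```

The caller must pass a zeroed buffer, since the sketch never clears padding bytes. For one port with link up the list takes 4 (list header) + 4 (item header) + 8 (ifindex) + 4 (linkup flag) = 20 bytes.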
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index d079290..7586b2c 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
 	     pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * Start to iterate over list of given type backwards, continuing after
+ * the current position.
+ */
+#define list_for_each_entry_continue_reverse_rcu(pos, head, member)	\
+	for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
+	     &pos->member != (head);	\
+	     pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
+
+/**
  * hlist_del_rcu - deletes entry from hash list without re-initialization
  * @n: the element to delete from the hash list.
  *
-- 
1.7.6
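
The nested-layout comment in if_team.h describes attributes packed inside attributes. In the kernel, such a message would be built with netlink helpers such as nla_nest_start(), nla_put_u32(), nla_put_flag() and nla_nest_end(). The user-space sketch below does not use those helpers. It hand-rolls a simplified header modeled on struct nlattr (16-bit length, 16-bit type, 4-byte alignment, no NLA_F_NESTED bit) to show how one port entry nests inside TEAM_ATTR_LIST_PORT. The attribute numbers follow from the enums above. attr_hdr, put_attr(), nest_start(), nest_end() and build_port_list() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbers as they fall out of the if_team.h enums. */
enum { TEAM_ATTR_LIST_PORT = 3 };
enum { TEAM_ATTR_ITEM_PORT = 1 };
enum { TEAM_ATTR_PORT_IFINDEX = 1, TEAM_ATTR_PORT_LINKUP = 3,
       TEAM_ATTR_PORT_SPEED = 4 };

/* Hypothetical, simplified stand-in for struct nlattr. */
struct attr_hdr {
	uint16_t len;	/* header + payload, without trailing padding */
	uint16_t type;
};

#define ATTR_ALIGN(x)	(((x) + 3u) & ~3u)
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr_hdr))

/* Append one attribute at offset off; return the next aligned offset. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *data, uint16_t datalen)
{
	struct attr_hdr h;

	assert(ATTR_HDRLEN + datalen <= UINT16_MAX);
	h.len = (uint16_t)(ATTR_HDRLEN + datalen);
	h.type = type;
	memcpy(buf + off, &h, sizeof(h));
	if (datalen)
		memcpy(buf + off + ATTR_HDRLEN, data, datalen);
	/* Padding bytes are left untouched: the caller zeroes buf. */
	return off + ATTR_ALIGN(h.len);
}

/* Open a nest: write an empty header now, remember where it is. */
static size_t nest_start(uint8_t *buf, size_t off, uint16_t type, size_t *start)
{
	*start = off;
	return put_attr(buf, off, type, NULL, 0);
}

/* Close a nest: patch its length to cover everything written since. */
static void nest_end(uint8_t *buf, size_t start, size_t end)
{
	struct attr_hdr h;

	memcpy(&h, buf + start, sizeof(h));
	h.len = (uint16_t)(end - start);
	memcpy(buf + start, &h, sizeof(h));
}

/* [LIST_PORT] -> [ITEM_PORT] -> [PORT_IFINDEX][PORT_LINKUP][PORT_SPEED] */
static size_t build_port_list(uint8_t *buf, uint32_t ifindex, int linkup,
			      uint32_t speed)
{
	size_t list, item, off = 0;

	off = nest_start(buf, off, TEAM_ATTR_LIST_PORT, &list);
	off = nest_start(buf, off, TEAM_ATTR_ITEM_PORT, &item);
	off = put_attr(buf, off, TEAM_ATTR_PORT_IFINDEX, &ifindex, sizeof(ifindex));
	if (linkup)	/* a flag has no payload: its presence means "true" */
		off = put_attr(buf, off, TEAM_ATTR_PORT_LINKUP, NULL, 0);
	off = put_attr(buf, off, TEAM_ATTR_PORT_SPEED, &speed, sizeof(speed));
	nest_end(buf, item, off);
	nest_end(buf, list, off);
	return off;
}
```

With one port, the outer nest is 28 bytes and the ITEM_PORT nest inside it is 24. Further [TEAM_ATTR_ITEM_PORT] entries would simply be appended before the outer nest_end().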

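The new list_for_each_entry_continue_reverse_rcu() walks backwards from pos. It skips pos itself and stops at the list head. Below is a single-threaded user-space sketch of the same loop shape with the RCU dereference dropped, which corresponds to the non-RCU list_for_each_entry_continue_reverse() in list.h. list_entry() is redefined here via offsetof; struct port and collect_before() are hypothetical. One plausible use is unwinding: after a failure at one entry, revisit the entries already processed, in reverse order.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* container_of-style lookup, as in the kernel's list_entry(). */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same loop as the new macro, minus list_entry_rcu()'s rcu_dereference(). */
#define list_for_each_entry_continue_reverse(pos, head, member)		\
	for (pos = list_entry(pos->member.prev, __typeof__(*pos), member);	\
	     &pos->member != (head);						\
	     pos = list_entry(pos->member.prev, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct port {			/* hypothetical list element */
	int id;
	struct list_head list;
};

/* Record the ids of the entries before `from`, nearest first. */
static int collect_before(struct port *from, struct list_head *head, int *out)
{
	struct port *pos = from;
	int n = 0;

	assert(from != NULL);
	list_for_each_entry_continue_reverse(pos, head, list)
		out[n++] = pos->id;
	return n;
}
```

Starting from the third of four entries visits the second and then the first entry. Starting from the first entry visits nothing, because its prev pointer is the head.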

* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Eric Dumazet @ 2011-10-23  8:43 UTC
  To: Jiri Pirko
  Cc: netdev, davem, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <20111023082545.GA15908@minipsycho.orion>

On Sunday, October 23, 2011 at 10:25 +0200, Jiri Pirko wrote:
> Sat, Oct 22, 2011 at 06:51:22PM CEST, eric.dumazet@gmail.com wrote:
> >On Saturday, October 22, 2011 at 17:13 +0200, Jiri Pirko wrote:
> >> >> +
> >> >> +/************************
> >> >> + * Rx path frame handler
> >> >> + ************************/
> >> >> +
> >> >> +/* note: already called with rcu_read_lock */
> >> >> +static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
> >> >> +{
> >> >> +	struct sk_buff *skb = *pskb;
> >> >> +	struct team_port *port;
> >> >> +	struct team *team;
> >> >> +	rx_handler_result_t res = RX_HANDLER_ANOTHER;
> >> >> +
> >> >> +	skb = skb_share_check(skb, GFP_ATOMIC);
> >> >> +	if (!skb)
> >> >> +		return RX_HANDLER_CONSUMED;
> >> >> +
> >> >> +	*pskb = skb;
> >> >> +
> >> >> +	port = team_port_get_rcu(skb->dev);
> >> >> +	team = port->team;
> >> >> +
> >> >> +	if (team->mode_ops.receive)
> >> >
> >> >Hmm, you need ACCESS_ONCE() here or rcu_dereference()
> >> >
> >> >See commit 4d97480b1806e883eb (bonding: use local function pointer of
> >> >bond->recv_probe in bond_handle_frame) for reference
> >> 
> >> I do not think so, because mode_ops.receive changes only in
> >> __team_change_mode(), which can be called only when no ports are in the
> >> team. And team_port_del() calls synchronize_rcu().
> >> 
> >
> >We are used to code following this template:
> >
> >if (ops->handler)
> >	ops->handler(arguments);
> >
> >But this is valid only because ops points to constant memory.
> >
> >In your case, we can clearly see it's not true; don't try to pretend it's safe.
> 
> Please forgive me, it's possible I'm missing something, but I see no way
> team->mode_ops.receive can change during team_handle_frame() (which holds
> rcu_read_lock) for the reason I stated earlier.
> 
> team_port_del() calls netdev_rx_handler_unregister() and after that it
> calls synchronize_rcu(). That ensures that, by the time team_port_del()
> finishes, team_handle_frame() is no longer called for this port.
> 
> Combined with the "if (!list_empty(&team->port_list))" check in
> team_change_mode(), this ensures safety.
> 
> Of course team_port_del() and team_change_mode() are both protected by
> team->lock so they are mutually excluded.

Then why even test whether team->mode_ops.receive is NULL in the first
place, if you are sure no packet in flight can meet this NULL pointer?

Something is flawed in the logic.
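
Eric's reference to commit 4d97480b1806e883eb points at a common pattern: load the function pointer once into a local variable, then test and call that local, so the NULL test and the call cannot see two different values. A minimal user-space sketch follows. It assumes a hand-rolled ACCESS_ONCE() mirroring the kernel's volatile-cast definition of that era (later kernels spell this READ_ONCE()). The struct layouts, double_frame() and the NOT_HANDLED return value are hypothetical stand-ins for the driver's types, a mode callback, and RX_HANDLER_ANOTHER.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copy of the kernel's ACCESS_ONCE() of that era: one real load. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

typedef int (*recv_fn)(int frame);

struct team_mode_ops { recv_fn receive; };	/* hypothetical, simplified */
struct team { struct team_mode_ops mode_ops; };	/* hypothetical, simplified */

#define NOT_HANDLED (-1)	/* stand-in for RX_HANDLER_ANOTHER */

static int handle_frame(struct team *team, int frame)
{
	/* Single load: the NULL test and the call use the same pointer,
	 * even if another CPU rewrites team->mode_ops.receive meanwhile. */
	recv_fn receive = ACCESS_ONCE(team->mode_ops.receive);

	if (receive)
		return receive(frame);
	return NOT_HANDLED;
}

static int double_frame(int frame)	/* hypothetical mode callback */
{
	return frame * 2;
}
```

This only removes the reload hazard (test one value, call another). It does not stop a writer from changing the pointer while readers run; that guarantee is what the RCU and locking argument in the thread is about.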


* Re: [patch net-next V2] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-23  8:52 UTC
  To: Eric Dumazet
  Cc: netdev, davem, bhutchings, shemminger, fubar, andy, tgraf,
	ebiederm, mirqus, kaber, greearb, jesse, fbl, benjamin.poirier,
	jzupka
In-Reply-To: <1319359437.6180.73.camel@edumazet-laptop>

Sun, Oct 23, 2011 at 10:43:57AM CEST, eric.dumazet@gmail.com wrote:
>Then why even test whether team->mode_ops.receive is NULL in the first
>place, if you are sure no packet in flight can meet this NULL pointer?
>
>Something is flawed in the logic.

It's not :) The test is there simply because a mode may not implement this
callback (in fact, the "roundrobin" mode does not implement it).

Jirka
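
Jiri's argument combines two facts. First, receive is an optional callback, so the NULL test covers modes such as "roundrobin" that do not provide one. Second, the pointer is only rewritten when the team has no ports, so no rx handler can be running at that moment. Below is a minimal user-space sketch of that writer-side guard. It assumes a plain port counter in place of list_empty(&team->port_list), leaves team->lock out (the caller is assumed to hold it), and uses -EBUSY as an assumed error code. example_receive() and example_ops are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*recv_fn)(int frame);

/* receive is optional: a mode may leave it NULL. */
struct team_mode_ops { recv_fn receive; };

struct team {
	int port_count;			/* stand-in for team->port_list */
	struct team_mode_ops mode_ops;
};

static int example_receive(int frame)	/* hypothetical mode callback */
{
	return frame;
}

static const struct team_mode_ops roundrobin_ops = { .receive = NULL };
static const struct team_mode_ops example_ops = { .receive = example_receive };

/* Caller is assumed to hold team->lock, which also serializes port add/del. */
static int team_change_mode(struct team *team, const struct team_mode_ops *ops)
{
	/* With ports attached, rx handlers may be running: refuse. */
	if (team->port_count != 0)
		return -EBUSY;
	/* No ports means no registered rx handler; port removal already
	 * waited in synchronize_rcu(), so no reader sees this store. */
	team->mode_ops = *ops;
	return 0;
}
```

Under these assumptions the NULL test in team_handle_frame() is about optional callbacks, not about racing with a mode change.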

