Netdev List
* Re: [PATCH 3/7] flow: allocate hash table for online cpus only
From: Rusty Russell @ 2010-03-31 12:32 UTC
  To: Herbert Xu; +Cc: Timo Teras, netdev
In-Reply-To: <20100330121255.GC5731@gondor.apana.org.au>

On Tue, 30 Mar 2010 10:42:55 pm Herbert Xu wrote:
> On Mon, Mar 29, 2010 at 05:12:40PM +0300, Timo Teras wrote:
> > Instead of unconditionally allocating hash table for all possible
> > cpu's, allocate it only for online cpu's and release related
> > memory if cpu goes down.
> > 
> > Signed-off-by: Timo Teras <timo.teras@iki.fi>
> 
> Hmm that's where we started but then Rusty changed it back in 2004:
> 
> commit 0a32dc4d8e83c48f7535d66731eb35d1916b39a8
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Wed Jan 21 18:14:37 2004 -0800
> 
>     [NET]: Simplify net/flow.c per-cpu handling.
> 
>     The cpu handling in net/core/flow.c is complex: it tries to allocate
>     flow cache as each CPU comes up.  It might as well allocate them for
>     each possible CPU at boot.
> 
> So I'd like to hear his opinion on changing it back again.

It was pretty unique at the time, it no longer is, so the arguments are less
compelling IMHO.

However, we can now use a dynamic percpu variable and get it as a real
per-cpu thing (which currently means it *will* be for every available cpu,
not just online ones).  Haven't thought about it, but that change might be
worth considering instead?

Thanks!
Rusty.
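
The trade-off discussed above (a table for every possible CPU at boot versus a table only while a CPU is online) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `NR_POSSIBLE_CPUS`, `struct flow_table`, `cpu_up()`, `cpu_down()`, `init_all_possible()` and `tables_allocated()` are hypothetical names invented for the sketch, not the net/core/flow.c code and not the kernel's dynamic percpu allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_POSSIBLE_CPUS 8	/* hypothetical upper bound on CPU ids */

/* Hypothetical stand-in for one CPU's flow hash table. */
struct flow_table {
	size_t nbuckets;
	void **buckets;
};

static struct flow_table *tables[NR_POSSIBLE_CPUS];

/* Allocate a table for one CPU; does nothing if it already has one. */
static int cpu_up(int cpu, size_t nbuckets)
{
	struct flow_table *t;

	assert(cpu >= 0 && cpu < NR_POSSIBLE_CPUS);
	if (tables[cpu])
		return 0;
	t = malloc(sizeof(*t));
	if (!t)
		return -1;
	t->buckets = calloc(nbuckets, sizeof(*t->buckets));
	if (!t->buckets) {
		free(t);
		return -1;
	}
	t->nbuckets = nbuckets;
	tables[cpu] = t;
	return 0;
}

/* Release the table when a CPU goes away (the "online only" policy). */
static void cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < NR_POSSIBLE_CPUS);
	if (!tables[cpu])
		return;
	free(tables[cpu]->buckets);
	free(tables[cpu]);
	tables[cpu] = NULL;
}

/* The "every possible CPU" policy: allocate everything once, up front. */
static int init_all_possible(size_t nbuckets)
{
	int cpu;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		if (cpu_up(cpu, nbuckets) < 0)
			return -1;
	return 0;
}

/* Count how many CPUs currently own a table. */
static size_t tables_allocated(void)
{
	size_t n = 0;
	int cpu;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		if (tables[cpu])
			n++;
	return n;
}
```

With the up-front policy there is no hotplug bookkeeping at all, at the cost of memory for CPUs that may never appear; the online-only policy saves that memory but needs the `cpu_up()`/`cpu_down()` pair to be called reliably.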


* [PATCH] net/pcmcia/3c589_cs: using netdev_info and friends where appropriate
From: linux @ 2010-03-31 12:42 UTC
  To: Joe Perches
  Cc: David S. Miller, Ken Kawasaki, Dominik Brodowski, Magnus Damm,
	Ben Hutchings, netdev, linux-kernel, linux
In-Reply-To: <1269969733.4558.171.camel@Joe-Laptop.home>

From: Alexander Kurz <linux@kbdbabel.org>

Signed-off-by: Alexander Kurz <linux@kbdbabel.org>
---
 drivers/net/pcmcia/3c589_cs.c |  286 +++++++++++++++++++++--------------------
 1 files changed, 147 insertions(+), 139 deletions(-)

diff --git a/drivers/net/pcmcia/3c589_cs.c b/drivers/net/pcmcia/3c589_cs.c
index 091e0b0..580977f 100644
--- a/drivers/net/pcmcia/3c589_cs.c
+++ b/drivers/net/pcmcia/3c589_cs.c
@@ -1,20 +1,20 @@
 /*======================================================================
 
     A PCMCIA ethernet driver for the 3com 3c589 card.
-    
+
     Copyright (C) 1999 David A. Hinds -- dahinds@users.sourceforge.net
 
     3c589_cs.c 1.162 2001/10/13 00:08:50
 
     The network driver code is based on Donald Becker's 3c589 code:
-    
+
     Written 1994 by Donald Becker.
     Copyright 1993 United States Government as represented by the
     Director, National Security Agency.  This software may be used and
     distributed according to the terms of the GNU General Public License,
     incorporated herein by reference.
     Donald Becker may be reached at becker@scyld.com
-    
+
     Updated for 2.5.x by Alan Cox <alan@lxorguk.ukuu.org.uk>
 
 ======================================================================*/
@@ -69,31 +69,54 @@
 /* The top five bits written to EL3_CMD are a command, the lower
    11 bits are the parameter, if applicable. */
 enum c509cmd {
-    TotalReset = 0<<11, SelectWindow = 1<<11, StartCoax = 2<<11,
-    RxDisable = 3<<11, RxEnable = 4<<11, RxReset = 5<<11, RxDiscard = 8<<11,
-    TxEnable = 9<<11, TxDisable = 10<<11, TxReset = 11<<11,
-    FakeIntr = 12<<11, AckIntr = 13<<11, SetIntrEnb = 14<<11,
-    SetStatusEnb = 15<<11, SetRxFilter = 16<<11, SetRxThreshold = 17<<11,
-    SetTxThreshold = 18<<11, SetTxStart = 19<<11, StatsEnable = 21<<11,
-    StatsDisable = 22<<11, StopCoax = 23<<11,
+	TotalReset	= 0<<11,
+	SelectWindow	= 1<<11,
+	StartCoax	= 2<<11,
+	RxDisable	= 3<<11,
+	RxEnable	= 4<<11,
+	RxReset		= 5<<11,
+	RxDiscard	= 8<<11,
+	TxEnable	= 9<<11,
+	TxDisable	= 10<<11,
+	TxReset		= 11<<11,
+	FakeIntr	= 12<<11,
+	AckIntr		= 13<<11,
+	SetIntrEnb	= 14<<11,
+	SetStatusEnb	= 15<<11,
+	SetRxFilter	= 16<<11,
+	SetRxThreshold	= 17<<11,
+	SetTxThreshold	= 18<<11,
+	SetTxStart	= 19<<11,
+	StatsEnable	= 21<<11,
+	StatsDisable	= 22<<11,
+	StopCoax	= 23<<11
 };
 
 enum c509status {
-    IntLatch = 0x0001, AdapterFailure = 0x0002, TxComplete = 0x0004,
-    TxAvailable = 0x0008, RxComplete = 0x0010, RxEarly = 0x0020,
-    IntReq = 0x0040, StatsFull = 0x0080, CmdBusy = 0x1000
+	IntLatch	= 0x0001,
+	AdapterFailure	= 0x0002,
+	TxComplete	= 0x0004,
+	TxAvailable	= 0x0008,
+	RxComplete	= 0x0010,
+	RxEarly		= 0x0020,
+	IntReq		= 0x0040,
+	StatsFull	= 0x0080,
+	CmdBusy		= 0x1000
 };
 
 /* The SetRxFilter command accepts the following classes: */
 enum RxFilter {
-    RxStation = 1, RxMulticast = 2, RxBroadcast = 4, RxProm = 8
+	RxStation	= 1,
+	RxMulticast	= 2,
+	RxBroadcast	= 4,
+	RxProm		= 8
 };
 
 /* Register window 1 offsets, the window used in normal operation. */
 #define TX_FIFO		0x00
 #define RX_FIFO		0x00
-#define RX_STATUS 	0x08
-#define TX_STATUS 	0x0B
+#define RX_STATUS	0x08
+#define TX_STATUS	0x0B
 #define TX_FREE		0x0C	/* Remaining free bytes in Tx buffer. */
 
 #define WN0_IRQ		0x08	/* Window 0: Set IRQ line in bits 12-15. */
@@ -106,13 +129,13 @@ enum RxFilter {
 
 struct el3_private {
 	struct pcmcia_device	*p_dev;
-    dev_node_t 		node;
-    /* For transceiver monitoring */
-    struct timer_list	media;
-    u16			media_status;
-    u16			fast_poll;
-    unsigned long	last_irq;
-    spinlock_t		lock;
+	dev_node_t		node;
+	/* For transceiver monitoring */
+	struct timer_list	media;
+	u16			media_status;
+	u16			fast_poll;
+	unsigned long		last_irq;
+	spinlock_t		lock;
 };
 
 static const char *if_names[] = { "auto", "10baseT", "10base2", "AUI" };
@@ -164,15 +187,15 @@ static void tc589_detach(struct pcmcia_device *p_dev);
 ======================================================================*/
 
 static const struct net_device_ops el3_netdev_ops = {
-	.ndo_open 		= el3_open,
-	.ndo_stop 		= el3_close,
+	.ndo_open		= el3_open,
+	.ndo_stop		= el3_close,
 	.ndo_start_xmit		= el3_start_xmit,
-	.ndo_tx_timeout 	= el3_tx_timeout,
+	.ndo_tx_timeout		= el3_tx_timeout,
 	.ndo_set_config		= el3_config,
 	.ndo_get_stats		= el3_get_stats,
 	.ndo_set_multicast_list = set_multicast_list,
 	.ndo_change_mtu		= eth_change_mtu,
-	.ndo_set_mac_address 	= eth_mac_addr,
+	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 };
 
@@ -236,7 +259,7 @@ static void tc589_detach(struct pcmcia_device *link)
     tc589_config() is scheduled to run after a CARD_INSERTION event
     is received, to configure the PCMCIA socket, and to make the
     ethernet device available to the system.
-    
+
 ======================================================================*/
 
 static int tc589_config(struct pcmcia_device *link)
@@ -249,7 +272,7 @@ static int tc589_config(struct pcmcia_device *link)
     char *ram_split[] = {"5:3", "3:1", "1:1", "3:5"};
     u8 *buf;
     size_t len;
-    
+
     dev_dbg(&link->dev, "3c589_config\n");
 
     phys_addr = (__be16 *)dev->dev_addr;
@@ -278,7 +301,7 @@ static int tc589_config(struct pcmcia_device *link)
     ret = pcmcia_request_configuration(link, &link->conf);
     if (ret)
 	    goto failed;
-	
+
     dev->irq = link->irq.AssignedIRQ;
     dev->base_addr = link->io.BasePort1;
     ioaddr = dev->base_addr;
@@ -312,7 +335,7 @@ static int tc589_config(struct pcmcia_device *link)
 	dev->if_port = if_port;
     else
 	printk(KERN_ERR "3c589_cs: invalid if_port requested\n");
-    
+
     link->dev_node = &lp->node;
     SET_NETDEV_DEV(dev, &link->dev);
 
@@ -324,13 +347,12 @@ static int tc589_config(struct pcmcia_device *link)
 
     strcpy(lp->node.dev_name, dev->name);
 
-    printk(KERN_INFO "%s: 3Com 3c%s, io %#3lx, irq %d, "
-	   "hw_addr %pM\n",
-	   dev->name, (multi ? "562" : "589"), dev->base_addr, dev->irq,
-	   dev->dev_addr);
-    printk(KERN_INFO "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
-	   (fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
-	   if_names[dev->if_port]);
+    netdev_info(dev, "3Com 3c%s, io %#3lx, irq %d, hw_addr %pM\n",
+		(multi ? "562" : "589"), dev->base_addr, dev->irq,
+		dev->dev_addr);
+    netdev_info(dev, "  %dK FIFO split %s Rx:Tx, %s xcvr\n",
+		(fifo & 7) ? 32 : 8, ram_split[(fifo >> 16) & 3],
+		if_names[dev->if_port]);
     return 0;
 
 failed:
@@ -343,7 +365,7 @@ failed:
     After a card is removed, tc589_release() will unregister the net
     device, and release the PCMCIA configuration.  If the device is
     still open, this will be postponed until it is closed.
-    
+
 ======================================================================*/
 
 static void tc589_release(struct pcmcia_device *link)
@@ -365,7 +387,7 @@ static int tc589_resume(struct pcmcia_device *link)
 {
 	struct net_device *dev = link->priv;
 
- 	if (link->open) {
+	if (link->open) {
 		tc589_reset(dev);
 		netif_device_attach(dev);
 	}
@@ -385,8 +407,7 @@ static void tc589_wait_for_completion(struct net_device *dev, int cmd)
     while (--i > 0)
 	if (!(inw(dev->base_addr + EL3_STATUS) & 0x1000)) break;
     if (i == 0)
-	printk(KERN_WARNING "%s: command 0x%04x did not complete!\n",
-	       dev->name, cmd);
+	netdev_warn(dev, "command 0x%04x did not complete!\n", cmd);
 }
 
 /*
@@ -412,7 +433,7 @@ static void tc589_set_xcvr(struct net_device *dev, int if_port)
 {
     struct el3_private *lp = netdev_priv(dev);
     unsigned int ioaddr = dev->base_addr;
-    
+
     EL3WINDOW(0);
     switch (if_port) {
     case 0: case 1: outw(0, ioaddr + 6); break;
@@ -435,14 +456,13 @@ static void dump_status(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     EL3WINDOW(1);
-    printk(KERN_INFO "  irq status %04x, rx status %04x, tx status "
-	   "%02x  tx free %04x\n", inw(ioaddr+EL3_STATUS),
-	   inw(ioaddr+RX_STATUS), inb(ioaddr+TX_STATUS),
-	   inw(ioaddr+TX_FREE));
+    netdev_info(dev, "  irq status %04x, rx status %04x, tx status %02x  tx free %04x\n",
+		inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS),
+		inb(ioaddr+TX_STATUS), inw(ioaddr+TX_FREE));
     EL3WINDOW(4);
-    printk(KERN_INFO "  diagnostics: fifo %04x net %04x ethernet %04x"
-	   " media %04x\n", inw(ioaddr+0x04), inw(ioaddr+0x06),
-	   inw(ioaddr+0x08), inw(ioaddr+0x0a));
+    netdev_info(dev, "  diagnostics: fifo %04x net %04x ethernet %04x media %04x\n",
+		inw(ioaddr+0x04), inw(ioaddr+0x06), inw(ioaddr+0x08),
+		inw(ioaddr+0x0a));
     EL3WINDOW(1);
 }
 
@@ -451,18 +471,18 @@ static void tc589_reset(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     int i;
-    
+
     EL3WINDOW(0);
-    outw(0x0001, ioaddr + 4);			/* Activate board. */ 
+    outw(0x0001, ioaddr + 4);			/* Activate board. */
     outw(0x3f00, ioaddr + 8);			/* Set the IRQ line. */
-    
+
     /* Set the station address in window 2. */
     EL3WINDOW(2);
     for (i = 0; i < 6; i++)
 	outb(dev->dev_addr[i], ioaddr + i);
 
     tc589_set_xcvr(dev, dev->if_port);
-    
+
     /* Switch to the stats window, and clear all stats by reading. */
     outw(StatsDisable, ioaddr + EL3_CMD);
     EL3WINDOW(6);
@@ -470,7 +490,7 @@ static void tc589_reset(struct net_device *dev)
 	inb(ioaddr+i);
     inw(ioaddr + 10);
     inw(ioaddr + 12);
-    
+
     /* Switch to register set 1 for normal use. */
     EL3WINDOW(1);
 
@@ -504,8 +524,7 @@ static int el3_config(struct net_device *dev, struct ifmap *map)
     if ((map->port != (u_char)(-1)) && (map->port != dev->if_port)) {
 	if (map->port <= 3) {
 	    dev->if_port = map->port;
-	    printk(KERN_INFO "%s: switched to %s port\n",
-		   dev->name, if_names[dev->if_port]);
+	    netdev_info(dev, "switched to %s port\n", if_names[dev->if_port]);
 	    tc589_set_xcvr(dev, dev->if_port);
 	} else
 	    return -EINVAL;
@@ -517,13 +536,13 @@ static int el3_open(struct net_device *dev)
 {
     struct el3_private *lp = netdev_priv(dev);
     struct pcmcia_device *link = lp->p_dev;
-    
+
     if (!pcmcia_dev_present(link))
 	return -ENODEV;
 
     link->open++;
     netif_start_queue(dev);
-    
+
     tc589_reset(dev);
     init_timer(&lp->media);
     lp->media.function = &media_check;
@@ -533,15 +552,15 @@ static int el3_open(struct net_device *dev)
 
     dev_dbg(&link->dev, "%s: opened, status %4.4x.\n",
 	  dev->name, inw(dev->base_addr + EL3_STATUS));
-    
+
     return 0;
 }
 
 static void el3_tx_timeout(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
-    
-    printk(KERN_WARNING "%s: Transmit timed out!\n", dev->name);
+
+    netdev_warn(dev, "Transmit timed out!\n");
     dump_status(dev);
     dev->stats.tx_errors++;
     dev->trans_start = jiffies;
@@ -555,19 +574,18 @@ static void pop_tx_status(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
     int i;
-    
+
     /* Clear the Tx status stack. */
     for (i = 32; i > 0; i--) {
 	u_char tx_status = inb(ioaddr + TX_STATUS);
 	if (!(tx_status & 0x84)) break;
 	/* reset transmitter on jabber error or underrun */
 	if (tx_status & 0x30)
-	    tc589_wait_for_completion(dev, TxReset);
+		tc589_wait_for_completion(dev, TxReset);
 	if (tx_status & 0x38) {
-	    pr_debug("%s: transmit error: status 0x%02x\n",
-		  dev->name, tx_status);
-	    outw(TxEnable, ioaddr + EL3_CMD);
-	    dev->stats.tx_aborted_errors++;
+		netdev_dbg(dev, "transmit error: status 0x%02x\n", tx_status);
+		outw(TxEnable, ioaddr + EL3_CMD);
+		dev->stats.tx_aborted_errors++;
 	}
 	outb(0x00, ioaddr + TX_STATUS); /* Pop the status stack. */
     }
@@ -580,11 +598,10 @@ static netdev_tx_t el3_start_xmit(struct sk_buff *skb,
     struct el3_private *priv = netdev_priv(dev);
     unsigned long flags;
 
-    pr_debug("%s: el3_start_xmit(length = %ld) called, "
-	  "status %4.4x.\n", dev->name, (long)skb->len,
-	  inw(ioaddr + EL3_STATUS));
+    netdev_dbg(dev, "el3_start_xmit(length = %ld) called, status %4.4x.\n",
+	       (long)skb->len, inw(ioaddr + EL3_STATUS));
 
-    spin_lock_irqsave(&priv->lock, flags);    
+    spin_lock_irqsave(&priv->lock, flags);
 
     dev->stats.tx_bytes += skb->len;
 
@@ -602,9 +619,9 @@ static netdev_tx_t el3_start_xmit(struct sk_buff *skb,
     }
 
     pop_tx_status(dev);
-    spin_unlock_irqrestore(&priv->lock, flags);    
+    spin_unlock_irqrestore(&priv->lock, flags);
     dev_kfree_skb(skb);
-    
+
     return NETDEV_TX_OK;
 }
 
@@ -616,37 +633,32 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
     unsigned int ioaddr;
     __u16 status;
     int i = 0, handled = 1;
-    
+
     if (!netif_device_present(dev))
 	return IRQ_NONE;
 
     ioaddr = dev->base_addr;
 
-    pr_debug("%s: interrupt, status %4.4x.\n",
-	  dev->name, inw(ioaddr + EL3_STATUS));
+    netdev_dbg(dev, "interrupt, status %4.4x.\n", inw(ioaddr + EL3_STATUS));
 
-    spin_lock(&lp->lock);    
+    spin_lock(&lp->lock);
     while ((status = inw(ioaddr + EL3_STATUS)) &
 	(IntLatch | RxComplete | StatsFull)) {
 	if ((status & 0xe000) != 0x2000) {
-	    pr_debug("%s: interrupt from dead card\n", dev->name);
-	    handled = 0;
-	    break;
+		netdev_dbg(dev, "interrupt from dead card\n");
+		handled = 0;
+		break;
 	}
-	
 	if (status & RxComplete)
-	    el3_rx(dev);
-	
+		el3_rx(dev);
 	if (status & TxAvailable) {
-	    pr_debug("    TX room bit was handled.\n");
-	    /* There's room in the FIFO for a full-sized packet. */
-	    outw(AckIntr | TxAvailable, ioaddr + EL3_CMD);
-	    netif_wake_queue(dev);
+		netdev_dbg(dev, "    TX room bit was handled.\n");
+		/* There's room in the FIFO for a full-sized packet. */
+		outw(AckIntr | TxAvailable, ioaddr + EL3_CMD);
+		netif_wake_queue(dev);
 	}
-	
 	if (status & TxComplete)
-	    pop_tx_status(dev);
-
+		pop_tx_status(dev);
 	if (status & (AdapterFailure | RxEarly | StatsFull)) {
 	    /* Handle all uncommon interrupts. */
 	    if (status & StatsFull)		/* Empty statistics. */
@@ -660,8 +672,8 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
 		EL3WINDOW(4);
 		fifo_diag = inw(ioaddr + 4);
 		EL3WINDOW(1);
-		printk(KERN_WARNING "%s: adapter failure, FIFO diagnostic"
-		       " register %04x.\n", dev->name, fifo_diag);
+		netdev_warn(dev, "adapter failure, FIFO diagnostic register %04x.\n",
+			    fifo_diag);
 		if (fifo_diag & 0x0400) {
 		    /* Tx overrun */
 		    tc589_wait_for_completion(dev, TxReset);
@@ -676,22 +688,20 @@ static irqreturn_t el3_interrupt(int irq, void *dev_id)
 		outw(AckIntr | AdapterFailure, ioaddr + EL3_CMD);
 	    }
 	}
-	
 	if (++i > 10) {
-	    printk(KERN_ERR "%s: infinite loop in interrupt, "
-		   "status %4.4x.\n", dev->name, status);
-	    /* Clear all interrupts */
-	    outw(AckIntr | 0xFF, ioaddr + EL3_CMD);
-	    break;
+		netdev_err(dev, "infinite loop in interrupt, status %4.4x.\n",
+			   status);
+		/* Clear all interrupts */
+		outw(AckIntr | 0xFF, ioaddr + EL3_CMD);
+		break;
 	}
 	/* Acknowledge the IRQ. */
 	outw(AckIntr | IntReq | IntLatch, ioaddr + EL3_CMD);
     }
-
     lp->last_irq = jiffies;
-    spin_unlock(&lp->lock);    
-    pr_debug("%s: exiting interrupt, status %4.4x.\n",
-	  dev->name, inw(ioaddr + EL3_STATUS));
+    spin_unlock(&lp->lock);
+    netdev_dbg(dev, "exiting interrupt, status %4.4x.\n",
+	       inw(ioaddr + EL3_STATUS));
     return IRQ_RETVAL(handled);
 }
 
@@ -710,7 +720,7 @@ static void media_check(unsigned long arg)
     if ((inw(ioaddr + EL3_STATUS) & IntLatch) &&
 	(inb(ioaddr + EL3_TIMER) == 0xff)) {
 	if (!lp->fast_poll)
-	    printk(KERN_WARNING "%s: interrupt(s) dropped!\n", dev->name);
+		netdev_warn(dev, "interrupt(s) dropped!\n");
 
 	local_irq_save(flags);
 	el3_interrupt(dev->irq, dev);
@@ -727,7 +737,7 @@ static void media_check(unsigned long arg)
 
     /* lp->lock guards the EL3 window. Window should always be 1 except
        when the lock is held */
-    spin_lock_irqsave(&lp->lock, flags);    
+    spin_lock_irqsave(&lp->lock, flags);
     EL3WINDOW(4);
     media = inw(ioaddr+WN4_MEDIA) & 0xc810;
 
@@ -747,32 +757,30 @@ static void media_check(unsigned long arg)
     if (media != lp->media_status) {
 	if ((media & lp->media_status & 0x8000) &&
 	    ((lp->media_status ^ media) & 0x0800))
-	    printk(KERN_INFO "%s: %s link beat\n", dev->name,
-		   (lp->media_status & 0x0800 ? "lost" : "found"));
+		netdev_info(dev, "%s link beat\n",
+			    (lp->media_status & 0x0800 ? "lost" : "found"));
 	else if ((media & lp->media_status & 0x4000) &&
 		 ((lp->media_status ^ media) & 0x0010))
-	    printk(KERN_INFO "%s: coax cable %s\n", dev->name,
-		   (lp->media_status & 0x0010 ? "ok" : "problem"));
+		netdev_info(dev, "coax cable %s\n",
+			    (lp->media_status & 0x0010 ? "ok" : "problem"));
 	if (dev->if_port == 0) {
 	    if (media & 0x8000) {
 		if (media & 0x0800)
-		    printk(KERN_INFO "%s: flipped to 10baseT\n",
-			   dev->name);
+			netdev_info(dev, "flipped to 10baseT\n");
 		else
-		    tc589_set_xcvr(dev, 2);
+			tc589_set_xcvr(dev, 2);
 	    } else if (media & 0x4000) {
 		if (media & 0x0010)
 		    tc589_set_xcvr(dev, 1);
 		else
-		    printk(KERN_INFO "%s: flipped to 10base2\n",
-			   dev->name);
+		    netdev_info(dev, "flipped to 10base2\n");
 	    }
 	}
 	lp->media_status = media;
     }
-    
+
     EL3WINDOW(1);
-    spin_unlock_irqrestore(&lp->lock, flags);    
+    spin_unlock_irqrestore(&lp->lock, flags);
 
 reschedule:
     lp->media.expires = jiffies + HZ;
@@ -786,7 +794,7 @@ static struct net_device_stats *el3_get_stats(struct net_device *dev)
     struct pcmcia_device *link = lp->p_dev;
 
     if (pcmcia_dev_present(link)) {
-    	spin_lock_irqsave(&lp->lock, flags);
+	spin_lock_irqsave(&lp->lock, flags);
 	update_stats(dev);
 	spin_unlock_irqrestore(&lp->lock, flags);
     }
@@ -798,21 +806,21 @@ static struct net_device_stats *el3_get_stats(struct net_device *dev)
   single-threaded if the device is active. This is expected to be a rare
   operation, and it's simpler for the rest of the driver to assume that
   window 1 is always valid rather than use a special window-state variable.
-  
+
   Caller must hold the lock for this
 */
 static void update_stats(struct net_device *dev)
 {
     unsigned int ioaddr = dev->base_addr;
 
-    pr_debug("%s: updating the statistics.\n", dev->name);
+    netdev_dbg(dev, "updating the statistics.\n");
     /* Turn off statistics updates while reading. */
     outw(StatsDisable, ioaddr + EL3_CMD);
     /* Switch to the stats window, and read everything. */
     EL3WINDOW(6);
-    dev->stats.tx_carrier_errors 	+= inb(ioaddr + 0);
+    dev->stats.tx_carrier_errors	+= inb(ioaddr + 0);
     dev->stats.tx_heartbeat_errors	+= inb(ioaddr + 1);
-    /* Multiple collisions. */	   	inb(ioaddr + 2);
+    /* Multiple collisions. */		inb(ioaddr + 2);
     dev->stats.collisions		+= inb(ioaddr + 3);
     dev->stats.tx_window_errors		+= inb(ioaddr + 4);
     dev->stats.rx_fifo_errors		+= inb(ioaddr + 5);
@@ -821,7 +829,7 @@ static void update_stats(struct net_device *dev)
     /* Tx deferrals */			inb(ioaddr + 8);
     /* Rx octets */			inw(ioaddr + 10);
     /* Tx octets */			inw(ioaddr + 12);
-    
+
     /* Back to window 1, and turn statistics back on. */
     EL3WINDOW(1);
     outw(StatsEnable, ioaddr + EL3_CMD);
@@ -832,9 +840,9 @@ static int el3_rx(struct net_device *dev)
     unsigned int ioaddr = dev->base_addr;
     int worklimit = 32;
     short rx_status;
-    
-    pr_debug("%s: in rx_packet(), status %4.4x, rx_status %4.4x.\n",
-	  dev->name, inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS));
+
+    netdev_dbg(dev, "in rx_packet(), status %4.4x, rx_status %4.4x.\n",
+	       inw(ioaddr+EL3_STATUS), inw(ioaddr+RX_STATUS));
     while (!((rx_status = inw(ioaddr + RX_STATUS)) & 0x8000) &&
 		    worklimit > 0) {
 	worklimit--;
@@ -852,11 +860,11 @@ static int el3_rx(struct net_device *dev)
 	} else {
 	    short pkt_len = rx_status & 0x7ff;
 	    struct sk_buff *skb;
-	    
+
 	    skb = dev_alloc_skb(pkt_len+5);
-	    
-	    pr_debug("    Receiving packet size %d status %4.4x.\n",
-		  pkt_len, rx_status);
+
+	    netdev_dbg(dev, "    Receiving packet size %d status %4.4x.\n",
+		       pkt_len, rx_status);
 	    if (skb != NULL) {
 		skb_reserve(skb, 2);
 		insl(ioaddr+RX_FIFO, skb_put(skb, pkt_len),
@@ -866,8 +874,8 @@ static int el3_rx(struct net_device *dev)
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += pkt_len;
 	    } else {
-		pr_debug("%s: couldn't allocate a sk_buff of"
-		      " size %d.\n", dev->name, pkt_len);
+		netdev_dbg(dev, "couldn't allocate a sk_buff of size %d.\n",
+			   pkt_len);
 		dev->stats.rx_dropped++;
 	    }
 	}
@@ -875,7 +883,7 @@ static int el3_rx(struct net_device *dev)
 	tc589_wait_for_completion(dev, RxDiscard);
     }
     if (worklimit == 0)
-	printk(KERN_WARNING "%s: too much work in el3_rx!\n", dev->name);
+	netdev_warn(dev, "too much work in el3_rx!\n");
     return 0;
 }
 
@@ -906,17 +914,17 @@ static int el3_close(struct net_device *dev)
     struct el3_private *lp = netdev_priv(dev);
     struct pcmcia_device *link = lp->p_dev;
     unsigned int ioaddr = dev->base_addr;
-    
+
     dev_dbg(&link->dev, "%s: shutting down ethercard.\n", dev->name);
 
     if (pcmcia_dev_present(link)) {
 	/* Turn off statistics ASAP.  We update dev->stats below. */
 	outw(StatsDisable, ioaddr + EL3_CMD);
-	
+
 	/* Disable the receiver and transmitter. */
 	outw(RxDisable, ioaddr + EL3_CMD);
 	outw(TxDisable, ioaddr + EL3_CMD);
-	
+
 	if (dev->if_port == 2)
 	    /* Turn off thinnet power.  Green! */
 	    outw(StopCoax, ioaddr + EL3_CMD);
@@ -925,12 +933,12 @@ static int el3_close(struct net_device *dev)
 	    EL3WINDOW(4);
 	    outw(0, ioaddr + WN4_MEDIA);
 	}
-	
+
 	/* Switching back to window 0 disables the IRQ. */
 	EL3WINDOW(0);
 	/* But we explicitly zero the IRQ line select anyway. */
 	outw(0x0f00, ioaddr + WN0_IRQ);
-        
+
 	/* Check if the card still exists */
 	if ((inw(ioaddr+EL3_STATUS) & 0xe000) == 0x2000)
 	    update_stats(dev);
@@ -939,7 +947,7 @@ static int el3_close(struct net_device *dev)
     link->open--;
     netif_stop_queue(dev);
     del_timer_sync(&lp->media);
-    
+
     return 0;
 }
 
@@ -961,7 +969,7 @@ static struct pcmcia_driver tc589_driver = {
 	},
 	.probe		= tc589_probe,
 	.remove		= tc589_detach,
-        .id_table       = tc589_ids,
+	.id_table	= tc589_ids,
 	.suspend	= tc589_suspend,
 	.resume		= tc589_resume,
 };
-- 
1.7.0
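
The conversion in this patch follows one pattern throughout: the caller stops passing `dev->name` into a `"%s: ..."` format string and lets the logging helper add the interface-name prefix. A rough userspace sketch of that idea follows. `struct fake_netdev`, `fake_netdev_msg()` and `compare_styles()` are hypothetical names made up for illustration; the real kernel helpers also carry a log level and information about the parent device, which this sketch leaves out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device; only the name matters here. */
struct fake_netdev {
	const char *name;
};

/*
 * Format "<ifname>: <message>" into buf, so callers no longer pass
 * dev->name by hand. Returns the untruncated length, like snprintf().
 */
static int fake_netdev_msg(char *buf, size_t len,
			   const struct fake_netdev *dev, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "%s: ", dev->name);
	if (n < 0)
		return n;
	va_start(ap, fmt);
	if ((size_t)n < len)
		m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	else
		m = vsnprintf(NULL, 0, fmt, ap);	/* only measure the rest */
	va_end(ap);
	return m < 0 ? m : n + m;
}

/* Old and new call styles should produce the same text. */
static void compare_styles(void)
{
	struct fake_netdev dev = { .name = "eth0" };
	char old_style[64], new_style[64];

	/* Old style: the caller spells out the "%s: " prefix itself. */
	snprintf(old_style, sizeof(old_style),
		 "%s: command 0x%04x did not complete!\n", dev.name, 0x5800u);
	/* New style: the helper supplies the prefix. */
	fake_netdev_msg(new_style, sizeof(new_style), &dev,
			"command 0x%04x did not complete!\n", 0x5800u);
	assert(strcmp(old_style, new_style) == 0);
}
```

Keeping the prefix in one place is the main design benefit: the per-call format strings get shorter, and a driver cannot forget or misformat the device name.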


* [PATCH] virtio-net: move sg off stack
From: Michael S. Tsirkin @ 2010-03-31 12:41 UTC
  To: David S. Miller, Rusty Russell, Jiri Pirko, Michael S. Tsirkin,
	Shirley Ma, ne
In-Reply-To: <20100331.031636.04083770.davem@davemloft.net>

Move sg structure off stack and into virtnet_info structure.
This helps remove extra sg_init_table calls as well as reduce
stack usage.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Compile-tested only for now.
Shirley, Rusty Russell, does this fix CONFIG_DEBUG_SG=y for you?

 drivers/net/virtio_net.c |   52 ++++++++++++++++++++++-----------------------
 1 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3f5be35..93dcde2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -39,8 +39,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 
-struct virtnet_info
-{
+struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *rvq, *svq, *cvq;
 	struct net_device *dev;
@@ -61,6 +60,10 @@ struct virtnet_info
 
 	/* Chain pages by the private ptr. */
 	struct page *pages;
+
+	/* fragments + linear part + virtio header */
+	struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
+	struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -323,10 +326,8 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 {
 	struct sk_buff *skb;
 	struct skb_vnet_hdr *hdr;
-	struct scatterlist sg[2];
 	int err;
 
-	sg_init_table(sg, 2);
 	skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
 	if (unlikely(!skb))
 		return -ENOMEM;
@@ -334,11 +335,11 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 	skb_put(skb, MAX_PACKET_LEN);
 
 	hdr = skb_vnet_hdr(skb);
-	sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
+	sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-	skb_to_sgvec(skb, sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
 
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, 2, skb);
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, vi->rx_sg, 0, 2, skb);
 	if (err < 0)
 		dev_kfree_skb(skb);
 
@@ -347,13 +348,11 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 
 static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 {
-	struct scatterlist sg[MAX_SKB_FRAGS + 2];
 	struct page *first, *list = NULL;
 	char *p;
 	int i, err, offset;
 
-	sg_init_table(sg, MAX_SKB_FRAGS + 2);
-	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
+	/* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
 	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
 		first = get_a_page(vi, gfp);
 		if (!first) {
@@ -361,7 +360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 				give_pages(vi, list);
 			return -ENOMEM;
 		}
-		sg_set_buf(&sg[i], page_address(first), PAGE_SIZE);
+		sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
 		first->private = (unsigned long)list;
@@ -375,17 +374,17 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 	}
 	p = page_address(first);
 
-	/* sg[0], sg[1] share the same page */
-	/* a separated sg[0] for  virtio_net_hdr only during to QEMU bug*/
-	sg_set_buf(&sg[0], p, sizeof(struct virtio_net_hdr));
+	/* vi->rx_sg[0], vi->rx_sg[1] share the same page */
+	/* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
+	sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
 
-	/* sg[1] for data packet, from offset */
+	/* vi->rx_sg[1] for data packet, from offset */
 	offset = sizeof(struct padded_vnet_hdr);
-	sg_set_buf(&sg[1], p + offset, PAGE_SIZE - offset);
+	sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, sg, 0, MAX_SKB_FRAGS + 2,
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
 				       first);
 	if (err < 0)
 		give_pages(vi, first);
@@ -396,16 +395,15 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
 {
 	struct page *page;
-	struct scatterlist sg;
 	int err;
 
 	page = get_a_page(vi, gfp);
 	if (!page)
 		return -ENOMEM;
 
-	sg_init_one(&sg, page_address(page), PAGE_SIZE);
+	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
 
-	err = vi->rvq->vq_ops->add_buf(vi->rvq, &sg, 0, 1, page);
+	err = vi->rvq->vq_ops->add_buf(vi->rvq, vi->rx_sg, 0, 1, page);
 	if (err < 0)
 		give_pages(vi, page);
 
@@ -514,12 +512,9 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 {
-	struct scatterlist sg[2+MAX_SKB_FRAGS];
 	struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 
-	sg_init_table(sg, 2+MAX_SKB_FRAGS);
-
 	pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -553,12 +548,13 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 
 	/* Encode metadata header at front. */
 	if (vi->mergeable_rx_bufs)
-		sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
+		sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
 	else
-		sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
+		sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-	hdr->num_sg = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
-	return vi->svq->vq_ops->add_buf(vi->svq, sg, hdr->num_sg, 0, skb);
+	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+	return vi->svq->vq_ops->add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+					0, skb);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -941,6 +937,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	vdev->priv = vi;
 	vi->pages = NULL;
 	INIT_DELAYED_WORK(&vi->refill, refill_work);
+	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
+	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
 
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-- 
1.7.0.2.280.gc6f05
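
The idea of this patch, initializing the scatterlist table once per device instead of once per call, can be sketched in userspace C. Everything here is a simplified assumption for illustration: `struct fake_sg`, `struct fake_dev`, `fake_sg_init_table()`, `fake_probe()` and `fake_xmit()` are hypothetical and are not the kernel scatterlist API. The sketch also assumes that calls on one device are serialized, since a single shared table cannot be filled by two callers at once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_MAX_FRAGS 18	/* hypothetical stand-in for MAX_SKB_FRAGS */
#define FAKE_SG_LEN (FAKE_MAX_FRAGS + 2)

/* Hypothetical, much simplified scatterlist entry. */
struct fake_sg {
	const void *buf;
	size_t len;
	int is_last;	/* end-of-table marker, set at init time */
};

struct fake_dev {
	struct fake_sg tx_sg[FAKE_SG_LEN];	/* lives with the device */
	unsigned int init_calls;		/* instrumentation only */
};

/* Zero the table and mark its final entry. */
static void fake_sg_init_table(struct fake_sg *sg, size_t n)
{
	assert(n > 0);
	memset(sg, 0, n * sizeof(*sg));
	sg[n - 1].is_last = 1;
}

/* Done once, at probe time. */
static void fake_probe(struct fake_dev *dev)
{
	dev->init_calls = 0;
	fake_sg_init_table(dev->tx_sg, FAKE_SG_LEN);
	dev->init_calls++;
}

/* Per-packet path: fill entries in place; no init, no large local array. */
static size_t fake_xmit(struct fake_dev *dev, const void *hdr, size_t hdr_len,
			const void *data, size_t data_len)
{
	dev->tx_sg[0].buf = hdr;
	dev->tx_sg[0].len = hdr_len;
	dev->tx_sg[1].buf = data;
	dev->tx_sg[1].len = data_len;
	return 2;	/* number of entries used */
}
```

The per-packet function no longer carries a `FAKE_SG_LEN`-entry array in its stack frame, and the end marker written at probe time survives across calls because only the buffer and length fields are rewritten.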


* [net-next PATCH] be2net: Adding PCI SRIOV support
From: Sarveshwar Bandi @ 2010-03-31 12:56 UTC
  To: davem; +Cc: netdev

- Add support for enabling PCI SRIOV in the driver, along with the changes
  needed to initialize PCI virtual functions.
- Add a function handler to change the MAC address of a VF from its
  corresponding PF.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be.h      |    9 ++
 drivers/net/benet/be_cmds.c |    4 +
 drivers/net/benet/be_cmds.h |    2 
 drivers/net/benet/be_hw.h   |    3 +
 drivers/net/benet/be_main.c |  222 ++++++++++++++++++++++++++++++++++---------
 5 files changed, 191 insertions(+), 49 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 8f07525..20842c5 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -83,6 +83,8 @@ #define RX_FRAGS_REFILL_WM	(RX_Q_LEN - M
 
 #define FW_VER_LEN		32
 
+#define BE_MAX_VF		32
+
 struct be_dma_mem {
 	void *va;
 	dma_addr_t dma;
@@ -280,8 +282,15 @@ struct be_adapter {
 	u8 port_type;
 	u8 transceiver;
 	u8 generation;		/* BladeEngine ASIC generation */
+
+	bool sriov_enabled;
+	u32 vf_if_handle[BE_MAX_VF];
+	u32 vf_pmac_id[BE_MAX_VF];
+	u8 base_eq_id;
 };
 
+#define be_physfn(adapter) (!adapter->pdev->is_virtfn)
+
 /* BladeEngine Generation numbers */
 #define BE_GEN2 2
 #define BE_GEN3 3
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 50e6259..9f53d9e 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -843,7 +843,8 @@ int be_cmd_q_destroy(struct be_adapter *
  * Uses mbox
  */
 int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags, u32 en_flags,
-		u8 *mac, bool pmac_invalid, u32 *if_handle, u32 *pmac_id)
+		u8 *mac, bool pmac_invalid, u32 *if_handle, u32 *pmac_id,
+		u32 domain)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_if_create *req;
@@ -860,6 +861,7 @@ int be_cmd_if_create(struct be_adapter *
 	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
 		OPCODE_COMMON_NTWK_INTERFACE_CREATE, sizeof(*req));
 
+	req->hdr.domain = domain;
 	req->capability_flags = cpu_to_le32(cap_flags);
 	req->enable_flags = cpu_to_le32(en_flags);
 	req->pmac_invalid = pmac_invalid;
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index cce61f9..763dc19 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -878,7 +878,7 @@ extern int be_cmd_pmac_add(struct be_ada
 extern int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, u32 pmac_id);
 extern int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags,
 			u32 en_flags, u8 *mac, bool pmac_invalid,
-			u32 *if_handle, u32 *pmac_id);
+			u32 *if_handle, u32 *pmac_id, u32 domain);
 extern int be_cmd_if_destroy(struct be_adapter *adapter, u32 if_handle);
 extern int be_cmd_eq_create(struct be_adapter *adapter,
 			struct be_queue_info *eq, int eq_delay);
diff --git a/drivers/net/benet/be_hw.h b/drivers/net/benet/be_hw.h
index 2d4a4b8..063026d 100644
--- a/drivers/net/benet/be_hw.h
+++ b/drivers/net/benet/be_hw.h
@@ -99,6 +99,9 @@ #define DB_MCCQ_RING_ID_MASK		0x7FF	/* b
 /* Number of entries posted */
 #define DB_MCCQ_NUM_POSTED_SHIFT	(16)	/* bits 16 - 29 */
 
+/********** SRIOV VF PCICFG OFFSET ********/
+#define SRIOV_VF_PCICFG_OFFSET		(4096)
+
 /* Flashrom related descriptors */
 #define IMAGE_TYPE_FIRMWARE		160
 #define IMAGE_TYPE_BOOTCODE		224
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 50ea056..881c87d 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -26,8 +26,11 @@ MODULE_AUTHOR("ServerEngines Corporation
 MODULE_LICENSE("GPL");
 
 static unsigned int rx_frag_size = 2048;
+static unsigned int num_vfs;
 module_param(rx_frag_size, uint, S_IRUGO);
+module_param(num_vfs, uint, S_IRUGO);
 MODULE_PARM_DESC(rx_frag_size, "Size of a fragment that holds rcvd data.");
+MODULE_PARM_DESC(num_vfs, "Number of PCI VFs to initialize");
 
 static DEFINE_PCI_DEVICE_TABLE(be_dev_ids) = {
 	{ PCI_DEVICE(BE_VENDOR_ID, BE_DEVICE_ID1) },
@@ -138,12 +141,19 @@ static int be_mac_addr_set(struct net_de
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
 
+	/* MAC addr configuration will be done in hardware for VFs
+	 * by their corresponding PFs. Just copy to netdev addr here
+	 */
+	if (!be_physfn(adapter))
+		goto netdev_addr;
+
 	status = be_cmd_pmac_del(adapter, adapter->if_handle, adapter->pmac_id);
 	if (status)
 		return status;
 
 	status = be_cmd_pmac_add(adapter, (u8 *)addr->sa_data,
 			adapter->if_handle, &adapter->pmac_id);
+netdev_addr:
 	if (!status)
 		memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
 
@@ -576,6 +586,9 @@ static void be_vlan_add_vid(struct net_d
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
+	if (!be_physfn(adapter))
+		return;
+
 	adapter->vlan_tag[vid] = 1;
 	adapter->vlans_added++;
 	if (adapter->vlans_added <= (adapter->max_vlans + 1))
@@ -586,6 +599,9 @@ static void be_vlan_rem_vid(struct net_d
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
+	if (!be_physfn(adapter))
+		return;
+
 	adapter->vlan_tag[vid] = 0;
 	vlan_group_set_device(adapter->vlan_grp, vid, NULL);
 	adapter->vlans_added--;
@@ -623,6 +639,28 @@ done:
 	return;
 }
 
+static int be_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+	int status;
+
+	if (!adapter->sriov_enabled)
+		return -EPERM;
+
+	if (!is_valid_ether_addr(mac) || (vf >= num_vfs))
+		return -EINVAL;
+
+	status = be_cmd_pmac_del(adapter, adapter->vf_if_handle[vf],
+				adapter->vf_pmac_id[vf]);
+
+	status = be_cmd_pmac_add(adapter, mac, adapter->vf_if_handle[vf],
+				&adapter->vf_pmac_id[vf]);
+	if (status)
+		dev_err(&adapter->pdev->dev, "MAC %pM set on VF %d Failed\n",
+				mac, vf);
+	return status;
+}
+
 static void be_rx_rate_update(struct be_adapter *adapter)
 {
 	struct be_drvr_stats *stats = drvr_stats(adapter);
@@ -1281,6 +1319,8 @@ static int be_tx_queues_create(struct be
 	/* Ask BE to create Tx Event queue */
 	if (be_cmd_eq_create(adapter, eq, adapter->tx_eq.cur_eqd))
 		goto tx_eq_free;
+	adapter->base_eq_id = adapter->tx_eq.q.id;
+
 	/* Alloc TX eth compl queue */
 	cq = &adapter->tx_obj.cq;
 	if (be_queue_alloc(adapter, cq, TX_CQ_LEN,
@@ -1408,7 +1448,7 @@ rx_eq_free:
 /* There are 8 evt ids per func. Retruns the evt id's bit number */
 static inline int be_evt_bit_get(struct be_adapter *adapter, u32 eq_id)
 {
-	return eq_id % 8;
+	return eq_id - adapter->base_eq_id;
 }
 
 static irqreturn_t be_intx(int irq, void *dev)
@@ -1586,6 +1626,28 @@ static void be_msix_enable(struct be_ada
 	return;
 }
 
+static void be_sriov_enable(struct be_adapter *adapter)
+{
+#ifdef CONFIG_PCI_IOV
+	int status;
+	if (be_physfn(adapter) && num_vfs) {
+		status = pci_enable_sriov(adapter->pdev, num_vfs);
+		adapter->sriov_enabled = status ? false : true;
+	}
+#endif
+	return;
+}
+
+static void be_sriov_disable(struct be_adapter *adapter)
+{
+#ifdef CONFIG_PCI_IOV
+	if (adapter->sriov_enabled) {
+		pci_disable_sriov(adapter->pdev);
+		adapter->sriov_enabled = false;
+	}
+#endif
+}
+
 static inline int be_msix_vec_get(struct be_adapter *adapter, u32 eq_id)
 {
 	return adapter->msix_entries[
@@ -1643,6 +1705,9 @@ static int be_irq_register(struct be_ada
 		status = be_msix_register(adapter);
 		if (status == 0)
 			goto done;
+		/* INTx is not supported for VF */
+		if (!be_physfn(adapter))
+			return status;
 	}
 
 	/* INTx */
@@ -1716,14 +1781,17 @@ static int be_open(struct net_device *ne
 		goto ret_sts;
 	be_link_status_update(adapter, link_up);
 
-	status = be_vid_config(adapter);
+	if (be_physfn(adapter))
+		status = be_vid_config(adapter);
 	if (status)
 		goto ret_sts;
 
-	status = be_cmd_set_flow_control(adapter,
-					adapter->tx_fc, adapter->rx_fc);
-	if (status)
-		goto ret_sts;
+	if (be_physfn(adapter)) {
+		status = be_cmd_set_flow_control(adapter,
+				adapter->tx_fc, adapter->rx_fc);
+		if (status)
+			goto ret_sts;
+	}
 
 	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
 ret_sts:
@@ -1771,22 +1839,48 @@ static int be_setup_wol(struct be_adapte
 static int be_setup(struct be_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	u32 cap_flags, en_flags;
+	u32 cap_flags, en_flags, vf = 0;
 	int status;
+	u8 mac[ETH_ALEN];
+
+	cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST;
 
-	cap_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST |
-			BE_IF_FLAGS_MCAST_PROMISCUOUS |
-			BE_IF_FLAGS_PROMISCUOUS |
-			BE_IF_FLAGS_PASS_L3L4_ERRORS;
-	en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST |
-			BE_IF_FLAGS_PASS_L3L4_ERRORS;
+	if (be_physfn(adapter)) {
+		cap_flags |= BE_IF_FLAGS_MCAST_PROMISCUOUS |
+				BE_IF_FLAGS_PROMISCUOUS |
+				BE_IF_FLAGS_PASS_L3L4_ERRORS;
+		en_flags |= BE_IF_FLAGS_PASS_L3L4_ERRORS;
+	}
 
 	status = be_cmd_if_create(adapter, cap_flags, en_flags,
 			netdev->dev_addr, false/* pmac_invalid */,
-			&adapter->if_handle, &adapter->pmac_id);
+			&adapter->if_handle, &adapter->pmac_id, 0);
 	if (status != 0)
 		goto do_none;
 
+	if (be_physfn(adapter)) {
+		while (vf < num_vfs) {
+			cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED
+					| BE_IF_FLAGS_BROADCAST;
+			status = be_cmd_if_create(adapter, cap_flags, en_flags,
+					mac, true, &adapter->vf_if_handle[vf],
+					NULL, vf+1);
+			if (status) {
+				dev_err(&adapter->pdev->dev,
+				"Interface Create failed for VF %d\n", vf);
+				goto if_destroy;
+			}
+			vf++;
+		}
+	} else {
+		status = be_cmd_mac_addr_query(adapter, mac,
+			MAC_ADDRESS_TYPE_NETWORK, false, adapter->if_handle);
+		if (!status) {
+			memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
+			memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+		}
+	}
+
 	status = be_tx_queues_create(adapter);
 	if (status != 0)
 		goto if_destroy;
@@ -1808,6 +1902,9 @@ rx_qs_destroy:
 tx_qs_destroy:
 	be_tx_queues_destroy(adapter);
 if_destroy:
+	for (vf = 0; vf < num_vfs; vf++)
+		if (adapter->vf_if_handle[vf])
+			be_cmd_if_destroy(adapter, adapter->vf_if_handle[vf]);
 	be_cmd_if_destroy(adapter, adapter->if_handle);
 do_none:
 	return status;
@@ -2088,6 +2185,7 @@ static struct net_device_ops be_netdev_o
 	.ndo_vlan_rx_register	= be_vlan_register,
 	.ndo_vlan_rx_add_vid	= be_vlan_add_vid,
 	.ndo_vlan_rx_kill_vid	= be_vlan_rem_vid,
+	.ndo_set_vf_mac		= be_set_vf_mac
 };
 
 static void be_netdev_init(struct net_device *netdev)
@@ -2129,37 +2227,48 @@ static void be_unmap_pci_bars(struct be_
 		iounmap(adapter->csr);
 	if (adapter->db)
 		iounmap(adapter->db);
-	if (adapter->pcicfg)
+	if (adapter->pcicfg && be_physfn(adapter))
 		iounmap(adapter->pcicfg);
 }
 
 static int be_map_pci_bars(struct be_adapter *adapter)
 {
 	u8 __iomem *addr;
-	int pcicfg_reg;
+	int pcicfg_reg, db_reg;
 
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, 2),
-			pci_resource_len(adapter->pdev, 2));
-	if (addr == NULL)
-		return -ENOMEM;
-	adapter->csr = addr;
-
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, 4),
-			128 * 1024);
-	if (addr == NULL)
-		goto pci_map_err;
-	adapter->db = addr;
+	if (be_physfn(adapter)) {
+		addr = ioremap_nocache(pci_resource_start(adapter->pdev, 2),
+				pci_resource_len(adapter->pdev, 2));
+		if (addr == NULL)
+			return -ENOMEM;
+		adapter->csr = addr;
+	}
 
-	if (adapter->generation == BE_GEN2)
+	if (adapter->generation == BE_GEN2) {
 		pcicfg_reg = 1;
-	else
+		db_reg = 4;
+	} else {
 		pcicfg_reg = 0;
-
-	addr = ioremap_nocache(pci_resource_start(adapter->pdev, pcicfg_reg),
-			pci_resource_len(adapter->pdev, pcicfg_reg));
+		if (be_physfn(adapter))
+			db_reg = 4;
+		else
+			db_reg = 0;
+	}
+	addr = ioremap_nocache(pci_resource_start(adapter->pdev, db_reg),
+				pci_resource_len(adapter->pdev, db_reg));
 	if (addr == NULL)
 		goto pci_map_err;
-	adapter->pcicfg = addr;
+	adapter->db = addr;
+
+	if (be_physfn(adapter)) {
+		addr = ioremap_nocache(
+				pci_resource_start(adapter->pdev, pcicfg_reg),
+				pci_resource_len(adapter->pdev, pcicfg_reg));
+		if (addr == NULL)
+			goto pci_map_err;
+		adapter->pcicfg = addr;
+	} else
+		adapter->pcicfg = adapter->db + SRIOV_VF_PCICFG_OFFSET;
 
 	return 0;
 pci_map_err:
@@ -2273,6 +2382,8 @@ static void __devexit be_remove(struct p
 
 	be_ctrl_cleanup(adapter);
 
+	be_sriov_disable(adapter);
+
 	be_msix_disable(adapter);
 
 	pci_set_drvdata(pdev, NULL);
@@ -2297,16 +2408,20 @@ static int be_get_config(struct be_adapt
 		return status;
 
 	memset(mac, 0, ETH_ALEN);
-	status = be_cmd_mac_addr_query(adapter, mac,
+
+	if (be_physfn(adapter)) {
+		status = be_cmd_mac_addr_query(adapter, mac,
 			MAC_ADDRESS_TYPE_NETWORK, true /*permanent */, 0);
-	if (status)
-		return status;
 
-	if (!is_valid_ether_addr(mac))
-		return -EADDRNOTAVAIL;
+		if (status)
+			return status;
 
-	memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
-	memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+		if (!is_valid_ether_addr(mac))
+			return -EADDRNOTAVAIL;
+
+		memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
+		memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+	}
 
 	if (adapter->cap & 0x400)
 		adapter->max_vlans = BE_NUM_VLANS_SUPPORTED/4;
@@ -2323,6 +2438,7 @@ static int __devinit be_probe(struct pci
 	struct be_adapter *adapter;
 	struct net_device *netdev;
 
+
 	status = pci_enable_device(pdev);
 	if (status)
 		goto do_none;
@@ -2371,24 +2487,28 @@ static int __devinit be_probe(struct pci
 		}
 	}
 
+	be_sriov_enable(adapter);
+
 	status = be_ctrl_init(adapter);
 	if (status)
 		goto free_netdev;
 
 	/* sync up with fw's ready state */
-	status = be_cmd_POST(adapter);
-	if (status)
-		goto ctrl_clean;
+	if (be_physfn(adapter)) {
+		status = be_cmd_POST(adapter);
+		if (status)
+			goto ctrl_clean;
+
+		status = be_cmd_reset_function(adapter);
+		if (status)
+			goto ctrl_clean;
+	}
 
 	/* tell fw we're ready to fire cmds */
 	status = be_cmd_fw_init(adapter);
 	if (status)
 		goto ctrl_clean;
 
-	status = be_cmd_reset_function(adapter);
-	if (status)
-		goto ctrl_clean;
-
 	status = be_stats_init(adapter);
 	if (status)
 		goto ctrl_clean;
@@ -2418,6 +2538,7 @@ ctrl_clean:
 	be_ctrl_cleanup(adapter);
 free_netdev:
 	be_msix_disable(adapter);
+	be_sriov_disable(adapter);
 	free_netdev(adapter->netdev);
 	pci_set_drvdata(pdev, NULL);
 rel_reg:
@@ -2614,6 +2735,13 @@ static int __init be_init_module(void)
 		rx_frag_size = 2048;
 	}
 
+	if (num_vfs > 32) {
+		printk(KERN_WARNING DRV_NAME
+			" : Module param num_vfs must not be greater than 32. "
+			"Using 32\n");
+		num_vfs = 32;
+	}
+
 	return pci_register_driver(&be_driver);
 }
 module_init(be_init_module);
-- 
1.4.0


^ permalink raw reply related

* Severe regression in bnx2 driver with bonding in post 2.6.30 kernels
From: Stuart Shelton @ 2010-03-31 12:55 UTC (permalink / raw)
  To: mchan, netdev


Hi all,

The Broadcom NetXtreme II driver appears to have a severe regression in 
all kernels after 2.6.30 - I've observed problems with 2.6.31, 2.6.32, 
and 2.6.33.

The hardware impacted is an IBM Bladecenter LS21 Blade, model 7971.  We 
have a large number of these, and all are affected.

We use generic channel-bonding, with the following options in modprobe.conf:

alias bond0 bonding
options bond0 mode=0 miimon=100

With any kernel prior to 2.6.31, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
...
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on cpu 0 node 0
   alloc kstat_irqs on cpu 0 node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bonding: bond0: enslaving eg0 as an active interface with a down link.
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg1: using MSI
bonding: bond0: enslaving eg1 as an active interface with a down link.
bonding: bond0: link status definitely up for interface eg0.
bonding: bond0: link status definitely up for interface eg1.
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON


... however, with kernels from 2.6.31 and later, the dmesg output reads:

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
   alloc irq_desc for 17 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
   alloc irq_desc for 18 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth1 to eg1
udev: renamed network interface eth0 to eg0
...
   alloc irq_desc for 32 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: eg0: using MSI
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
   alloc irq_desc for 33 on node 0
   alloc kstat_irqs on node 0
bnx2 0000:02:05.0: irq 33 for MSI/MSI-X
bnx2: eg0 NIC SerDes Link is Down
bnx2: eg0 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
NET: Registered protocol family 17
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth1: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg0
udev: renamed network interface eth1 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: eg1: using MSI
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: eg1 NIC SerDes Link is Down
bnx2: eg1 NIC SerDes Link is Up, 1000 Mbps full duplex, receive & 
transmit flow control ON
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 33 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2: Chip reset did not complete
bnx2: fw sync timeout, reset code = 4040005
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:05.0: PCI INT A disabled
bnx2 0000:02:04.0: PCI INT A disabled
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.1 (May 6, 2009)
bnx2 0000:02:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:04.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e2000000, IRQ 17, node addr 00:1a:64:bd:21:04
bnx2 0000:02:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-mips-06-4.6.16.fw
bnx2 0000:02:05.0: firmware: requesting bnx2/bnx2-rv2p-06-4.6.16.fw
udev: renamed network interface eth0 to eg0
eth0: Broadcom NetXtreme II BCM5706 1000Base-SX (A2) PCI-X 64-bit 133MHz 
found at mem e4000000, IRQ 18, node addr 00:1a:64:be:20:80
udev: renamed network interface eth0 to eg1
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete
bnx2 0000:02:04.0: irq 32 for MSI/MSI-X
bnx2: Chip reset did not complete

... (this later output shows the initial attempt to raise the 
interfaces at boot, followed by me manually removing and re-inserting the 
bnx2 driver).  Alongside this, the console outputs "SIOCSIFFLAGS: Device 
or resource busy".

On these more recent kernels, the SIOCSIFFLAGS line is always output, 
but about 50% of the time the network interface is raised.  When this 
fails, then sometimes removing and re-inserting the bnx2 driver can 
result in usable non-bonded interfaces - but as often as not the NICs 
won't be usable even in a standard non-bonded configuration.

With a simple reboot back to a 2.6.30 or earlier kernel, the problem 
goes away (even though the firmware file on disk is the same as that 
used with the later kernels).  Every blade we have is affected, so this 
is not a hardware problem (or at least, if it is, then it's a very 
common one!).  I thought that the problem might only occur when bonding 
is used - but I can't now recall what made me think this, and I've not 
been able to get the server down-time to extensively test the issue further.

Any advice/guidance greatly appreciated,

Stuart

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331110345.GC12845@gondor.apana.org.au>

On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote:

> This seems to be bogus to me.  Just because the DB was empty
> before the flush doesn't mean that the flush didn't happen.

Herbert, If a tree falls in a forest and no one is around to hear it,
does it make a sound? ;->
A flush event is meant to be a signal to user space that what
was once a non-empty table is now empty. 
This is a consistent definition of the semantics everywhere tables
are flushed (not just in Linux)..

What makes the SPD and SAD special?

cheers,
jamal




^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:11 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270040773.26743.12.camel@bigi>

On Wed, Mar 31, 2010 at 09:06:13AM -0400, jamal wrote:
> On Wed, 2010-03-31 at 19:03 +0800, Herbert Xu wrote:
> 
> > This seems to be bogus to me.  Just because the DB was empty
> > before the flush doesn't mean that the flush didn't happen.
> 
> Herbert, If a tree falls in a forest and no one is around to hear it,
> does it make a sound? ;->
> A flush event is meant to be a signal to user space that what
> was once a non-empty table is now empty. 

I disagree.  A flush event is a signal that someone has sent a
flush command.  In any case we've had this semantics for years
and I haven't heard a good reason why this should be changed.

> This is a consistent definition of the semantics everywhere tables
> are flushed (not just in Linux)..

Please give specific examples in the kernel.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:26 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331131131.GA13793@gondor.apana.org.au>

On Wed, Mar 31, 2010 at 09:11:31PM +0800, Herbert Xu wrote:
> 
> > This is a consistent definition of the semantics everywhere tables
> > are flushed (not just in Linux)..
> 
> Please give specific examples in the kernel.

In fact the previous behaviour is also consistent with RFC2367:

3.1.9 SADB_FLUSH

   The SADB_FLUSH message causes the kernel to delete all entries in its
   key table for a certain sadb_msg_satype.  Only the base header is
   required for a flush message.  If sadb_msg_satype is filled in with a
   specific value, only associations of that type are deleted.  If it is
   filled in with SADB_SATYPE_UNSPEC, ALL associations are deleted.

     The messaging behavior for SADB_FLUSH is:

           Send an SADB_FLUSH message from a user process to the kernel.

           <base>

           The kernel will return an SADB_FLUSH message to all listening
           sockets.

           <base>

           The reply message happens only after the actual flushing
           of security associations has been attempted.

There is no special treatment for an empty DB.
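
To make the quoted exchange concrete, here is a minimal sketch of the
base-header-only SADB_FLUSH request that 3.1.9 describes. The demo_ and
DEMO_ names are hypothetical stand-ins that mirror the RFC 2367 layout so
the snippet compiles on its own; real code would presumably use struct
sadb_msg and the constants from <linux/pfkeyv2.h> and write the message
to a PF_KEY socket, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of the RFC 2367 base message header. */
struct demo_sadb_msg {
	uint8_t  sadb_msg_version;
	uint8_t  sadb_msg_type;
	uint8_t  sadb_msg_errno;
	uint8_t  sadb_msg_satype;
	uint16_t sadb_msg_len;       /* total length in 64-bit words */
	uint16_t sadb_msg_reserved;
	uint32_t sadb_msg_seq;
	uint32_t sadb_msg_pid;
};

/* Values as assigned in RFC 2367; DEMO_ prefix avoids header clashes. */
#define DEMO_PF_KEY_V2          2
#define DEMO_SADB_FLUSH         9
#define DEMO_SADB_SATYPE_UNSPEC 0

/*
 * Fill in a flush request. Only the base header is needed; passing
 * DEMO_SADB_SATYPE_UNSPEC asks for ALL associations to be deleted.
 */
static void demo_build_flush(struct demo_sadb_msg *msg, uint8_t satype,
			     uint32_t seq, uint32_t pid)
{
	assert(msg != NULL);
	memset(msg, 0, sizeof(*msg));
	msg->sadb_msg_version = DEMO_PF_KEY_V2;
	msg->sadb_msg_type = DEMO_SADB_FLUSH;
	msg->sadb_msg_satype = satype;
	msg->sadb_msg_len = sizeof(*msg) / 8;
	msg->sadb_msg_seq = seq;
	msg->sadb_msg_pid = pid;
}
```

Nothing in the request says anything about the current table contents,
which is the point being made: per the quoted text, the reply goes to all
listening sockets once the flush has been attempted.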

Please send a revert.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/7] flow: allocate hash table for online cpus only
From: Timo Teräs @ 2010-03-31 13:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Herbert Xu, netdev
In-Reply-To: <201003312302.51683.rusty@rustcorp.com.au>

Rusty Russell wrote:
> On Tue, 30 Mar 2010 10:42:55 pm Herbert Xu wrote:
>> On Mon, Mar 29, 2010 at 05:12:40PM +0300, Timo Teras wrote:
>>> Instead of unconditionally allocating hash table for all possible
>>> cpu's, allocate it only for online cpu's and release related
>>> memory if cpu goes down.
>>>
>>> Signed-off-by: Timo Teras <timo.teras@iki.fi>
>> Hmm that's where we started but then Rusty changed it back in 2004:
>>
>> So I'd like to hear his opinion on changing it back again.
> 
> It was pretty unique at the time, it no longer is, so the arguments are less
> compelling IMHO.
> 
> However, we can now use a dynamic percpu variable and get it as a real
> per-cpu thing (which currently means it *will* be for every available cpu,
> not just online ones).  Haven't thought about it, but that change might be
> worth considering instead?

I did convert most of the static percpu variables to a struct which
is allocated dynamically using alloc_percpu. See:
 http://marc.info/?l=linux-netdev&m=127003066905912&w=2

This patch is on top of that, to avoid allocating the larger hash
table unconditionally, as the number of possible CPUs can be large.
If you take a look at the actual patch to add back the hash allocation
for only 'online' CPUs, it's not that complicated IMHO:
 http://marc.info/?l=linux-netdev&m=126987200927472&w=2


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:28 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331131131.GA13793@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:11 +0800, Herbert Xu wrote:

> I disagree.  A flush event is a signal that someone has sent a
> flush command.  

Sorry - I respectfully disagree.

> In any case we've had this semantics for years
> and I haven't heard a good reason why this should be changed.

It generates unnecessary noise, and it is a deviation, as I mentioned.

> > This is a consistent definition of the semantics everywhere tables
> > are flushed (not just in Linux)..
> 
> Please give specific examples in the kernel.

Something I can do safely right now without messing up my connection: 
issue iproute commands in one window and observe events in another.

-sudo ip route add 192.168.11.100 dev eth0 table 15
generates an event
-sudo ip route flush table 15
generates an event
-sudo ip route flush table 15
No event

But pick anything else in the other netlink-aware subsystems
and you'd see similar behavior.
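
The two positions in this thread can be put side by side in a toy model.
This is only a sketch: struct demo_table and the demo_ functions are
hypothetical and model no real netlink code. A flag selects between "an
event per flush command" and "an event only when the flush actually
removed something", the latter matching the ip route sequence above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table with a counter of notifications sent to listeners. */
struct demo_table {
	size_t nentries;      /* entries currently in the table */
	unsigned int events;  /* notifications emitted so far */
};

/* Adding an entry always generates an event. */
static void demo_add(struct demo_table *t)
{
	assert(t != NULL);
	t->nentries++;
	t->events++;
}

/*
 * Flush the table. With notify_always set, every flush command emits an
 * event; otherwise an event is emitted only if entries were deleted.
 * Returns true if an event was sent.
 */
static bool demo_flush(struct demo_table *t, bool notify_always)
{
	size_t deleted;

	assert(t != NULL);
	deleted = t->nentries;
	t->nentries = 0;
	if (deleted == 0 && !notify_always)
		return false;
	t->events++;
	return true;
}
```

With notify_always false, add / flush / flush yields event, event, no
event, as in the transcript; with it true, the second flush would also
notify listeners.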

If there was an app depending on this behavior - that's a separate reason
(but that's not the argument you are making).

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:32 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331132622.GA13908@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:26 +0800, Herbert Xu wrote:


> In fact the previous behaviour is also consistent with RFC2367:

I did not touch pfkey. That behavior remains there. 

cheers,
jamal



^ permalink raw reply

* Re: [PATCH] can: Add driver for SJA1000 based PCI CAN interface cards by esd
From: Wolfgang Grandegger @ 2010-03-31 13:35 UTC (permalink / raw)
  To: Matthias Fuchs
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201003311428.05749.matthias.fuchs-iOnpLzIbIdM@public.gmane.org>

Hi Matthias,

Matthias Fuchs wrote:
> This patch adds support for SJA1000 based PCI CAN interface cards
> from electronic system design gmbh.
> 
> These boards are supported:
> 
>         CAN-PCI/200 (PCI)
>         CAN-PCI/266 (PCI)
>         CAN-PMC266 (PMC module)
>         CAN-PCIe/2000 (PCI Express)
>         CAN-CPCI/200 (Compact PCI, 3U)
>         CAN-PCI104 (PCI104)
> 
> This driver is part of the SocketCAN SVN repository since
> April 2009.
> 
> Signed-off-by: Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>

For a while now, we have had a generic PCI PLX driver in the mainline kernel,
which could support the esd cards as well, I believe.

Wolfgang.

^ permalink raw reply

* Re: [r8169] WARNING: at net/sched/sch_generic.c
From: Neil Horman @ 2010-03-31 13:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sergey Senozhatsky, netdev, Francois Romieu, David S. Miller,
	linux-kernel
In-Reply-To: <1270038569.2103.24.camel@edumazet-laptop>

On Wed, Mar 31, 2010 at 02:29:29PM +0200, Eric Dumazet wrote:
> Le mercredi 31 mars 2010 à 15:14 +0300, Sergey Senozhatsky a écrit :
> 
> 
> > PKT_SIZE="pkt_size 2048"
> > 
> 
> If you use 1024 bytes pktgen messages, do you still have the problem ?
> 
+1. I wouldn't be surprised if using something over the nominal 1522-byte frame
length on r8169 caused tx errors.  The driver doesn't seem to support Jumbo
frames, so my guess is you have to keep the packet size below 1522 bytes.
Neil
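
For reference, the 1522 figure is the usual sum for a standard Ethernet
frame carrying one VLAN tag. A small sketch of the arithmetic, with
hypothetical DEMO_ names for the standard sizes (how pktgen's pkt_size
accounts for the FCS is not addressed here):

```c
#include <assert.h>
#include <stdbool.h>

/* Standard Ethernet sizes in bytes; the DEMO_ names are hypothetical. */
#define DEMO_ETH_HLEN       14   /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN       4   /* one 802.1Q tag */
#define DEMO_ETH_DATA_LEN 1500   /* default MTU payload */
#define DEMO_ETH_FCS_LEN     4   /* trailing CRC */

/* Largest non-jumbo frame: header + VLAN tag + payload + FCS. */
static unsigned int demo_max_frame_len(void)
{
	return DEMO_ETH_HLEN + DEMO_VLAN_HLEN +
	       DEMO_ETH_DATA_LEN + DEMO_ETH_FCS_LEN;
}

/* True if a frame of frame_len bytes stays within the nominal limit. */
static bool demo_fits_nominal_frame(unsigned int frame_len)
{
	assert(frame_len > 0);  /* a zero-length frame is a caller bug */
	return frame_len <= demo_max_frame_len();
}
```

By this measure the 1024-byte test suggested above fits, while the
2048-byte pkt_size from the report does not.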

> 

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:39 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042356.26743.30.camel@bigi>



On Wed, 2010-03-31 at 09:32 -0400, jamal wrote:

> I did not touch pfkey. That behavior remains there. 

Sorry - I lied. I did touch pfkey. Here was my reasoning.

RFC 2367 says flushing behavior should be:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush event to ALL listeners

This is not realistic today in the presence of SELinux policies,
which may reject the flush, etc. So the sequence becomes:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush response to the originator from #1
4) if there were no errors then:
kernel -> user space: flush event to ALL listeners

This was in the logs.
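
The four-step sequence above can be sketched as a tiny decision
function. This is an illustration only: struct demo_flush_result and
demo_handle_flush are hypothetical names, and flush_err simply stands in
for whatever step 2 returned (for example EPERM when a security policy
rejects the flush).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the kernel sends back after a flush. */
struct demo_flush_result {
	int  reply_errno;  /* step 3: status returned to the originator */
	bool broadcast;    /* step 4: flush event sent to ALL listeners */
};

/*
 * flush_err is the outcome of step 2, given as 0 or a positive errno
 * value. The originator always gets a response; the broadcast to all
 * listeners happens only when the flush succeeded.
 */
static struct demo_flush_result demo_handle_flush(int flush_err)
{
	struct demo_flush_result r;

	assert(flush_err >= 0);
	r.reply_errno = flush_err;
	r.broadcast = (flush_err == 0);
	return r;
}
```

So a rejected flush is visible only to the process that asked for it,
and the other listeners never see an event for a flush that did not
happen.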

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 13:41 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042795.26743.31.camel@bigi>


On Wed, 2010-03-31 at 09:39 -0400, jamal wrote:
> 
> On Wed, 2010-03-31 at 09:32 -0400, jamal wrote:
> 
> > I did not touch pfkey. That behavior remains there. 
> 
> Sorry - I lied. I did touch pfkey. Here was my reasoning.

And what I meant by not touching is that "the behavior there
remains as it was before"

cheers,
jamal

> RFC 2367 says flushing behavior should be:
> 1) user space -> kernel: flush
> 2) kernel: flush
> 3) kernel -> user space: flush event to ALL listeners
> 
> This is not realistic today in the presence of selinux policies
> which may reject the flush etc. So we make the sequence become:
> 1) user space -> kernel: flush
> 2) kernel: flush
> 3) kernel -> user space: flush response to originator from #1
> 4) if there were no errors then:
> kernel -> user space: flush event to ALL listeners
> 
> This was in the logs.
> 
> cheers,
> jamal


^ permalink raw reply

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Serge E. Hallyn @ 2010-03-31 13:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tejun Heo, Greg Kroah-Hartman, Kay Sievers, linux-kernel,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev
In-Reply-To: <m1634d82e0.fsf@fess.ebiederm.org>

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Tejun Heo <tj@kernel.org> writes:
> >> index 30f5a44..030a39d 100644
> >> --- a/fs/sysfs/sysfs.h
> >> +++ b/fs/sysfs/sysfs.h
> >> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
> >>  /*
> >>   * mount.c
> >>   */
> >> +struct sysfs_super_info {
> >> +};
> >> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
> >
> > Another nit picking.  It would be better to wrap SB in the macro
> > definition.  Also, wouldn't an inline function be better?
> 
> Good spotting.  That doesn't bite today but it will certainly bite
> someday if it isn't fixed.
> 
> I wonder how that has slipped through the review all of this time.

(let me demonstrate how: )

WTH are you talking about?  Unless you mean doing (SB) inside
the definition?

I actually was going to suggest dropping the #define as it obscures
the code, but I figured it would get more complicated later.

-serge

^ permalink raw reply

* Re: [r8169] WARNING: at net/sched/sch_generic.c
From: Sergey Senozhatsky @ 2010-03-31 13:49 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Sergey Senozhatsky, netdev, Francois Romieu,
	David S. Miller, linux-kernel
In-Reply-To: <20100331133533.GD13963@hmsreliant.think-freely.org>

On (03/31/10 09:35), Neil Horman wrote:
> On Wed, Mar 31, 2010 at 02:29:29PM +0200, Eric Dumazet wrote:
> > Le mercredi 31 mars 2010 à 15:14 +0300, Sergey Senozhatsky a écrit :
> > 
> > 
> > > PKT_SIZE="pkt_size 2048"
> > > 
> > 
> > If you use 1024 bytes pktgen messages, do you still have the problem ?
> > 
> +1 I wouldn't be surprised if using something over the nominal 1522 byte frame
> length on r8169 caused tx errors.  The driver doesn't seem to support Jumbo
> frames, so my guess is you have to keep the packet size below 1522 bytes.
> Neil

Hm. Good point. I'll test a bit later today.


	Sergey

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:53 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042092.26743.26.camel@bigi>

On Wed, Mar 31, 2010 at 09:28:12AM -0400, jamal wrote:
> 
> -sudo ip route add 192.168.11.100 dev eth0 table 15
> generates an event
> -sudo ip route flush table 15
> generates an event
> -sudo ip route flush table 15
> No event

That's completely different.  We don't have a route flush event,
instead we're sending route delete events.  That's why when the
table is empty you get no events.

If we had a route flush event then it would behave exactly the
same.
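
The distinction between the two notification styles can be sketched as a toy model. The function names are illustrative assumptions, not kernel code; the model only counts how many events each style would emit for a flush.

```c
#include <stddef.h>

/* Toy model; the function names are illustrative, not kernel code. */

/* Route-style flush: one delete event per removed entry. */
static size_t events_per_entry_delete(size_t nentries)
{
	return nentries; /* an empty table yields zero events */
}

/* Flush-event style: one event per flush request, whatever the table held. */
static size_t events_single_flush(size_t nentries)
{
	(void)nentries; /* table size does not affect the count */
	return 1;
}
```

Under the first style the second `ip route flush table 15` is silent because nothing is left to delete; under the second style every flush request would still produce one event.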

BTW you've also made xfrm_state_flush inconsistent with respect
to xfrm_policy_flush.

Dave, please revert this patch.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:55 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042795.26743.31.camel@bigi>

On Wed, Mar 31, 2010 at 09:39:55AM -0400, jamal wrote:
> 
> This is not realistic today in the presence of selinux policies
> which may reject the flush etc. So we make the sequence become:
> 1) user space -> kernel: flush
> 2) kernel: flush
> 3) kernel -> user space: flush response to originator from #1
> 4) if there were no errors then:
> kernel -> user space: flush event to ALL listeners

Eliding the notification if SELinux says so is fine, but eliding
it because the table is empty is wrong.

The flush did not fail just because the table was empty to begin
with.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 13:56 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270042883.26743.32.camel@bigi>

On Wed, Mar 31, 2010 at 09:41:23AM -0400, jamal wrote:
> 
> On Wed, 2010-03-31 at 09:39 -0400, jamal wrote:
> > 
> > On Wed, 2010-03-31 at 09:32 -0400, jamal wrote:
> > 
> > > I did not touch pfkey. That behavior remains there. 
> > 
> > Sorry - I lied. I did touch pfkey. Here was my reasoning.
> 
> And what I meant by not touching is that "the behavior there
> remains as it was before"

No you've changed it.  PF_KEY will no longer notify if the policy
table is empty.

This is inconsistent with the behaviour of SADB flushes, and the
spirit of RFC2367.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Eric W. Biederman @ 2010-03-31 14:02 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Tejun Heo, Greg Kroah-Hartman, Kay Sievers, linux-kernel,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev
In-Reply-To: <20100331134757.GA6132@us.ibm.com>

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Tejun Heo <tj@kernel.org> writes:
>> >> index 30f5a44..030a39d 100644
>> >> --- a/fs/sysfs/sysfs.h
>> >> +++ b/fs/sysfs/sysfs.h
>> >> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>> >>  /*
>> >>   * mount.c
>> >>   */
>> >> +struct sysfs_super_info {
>> >> +};
>> >> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
>> >
>> > Another nit picking.  It would be better to wrap SB in the macro
>> > definition.  Also, wouldn't an inline function be better?
>> 
>> Good spotting.  That doesn't bite today but it will certainly bite
>> someday if it isn't fixed.
>> 
>> I wonder how that has slipped through the review all of this time.
>
> (let me demonstrate how: )
>
> WTH are you talking about?  Unless you mean doing (SB) inside
> the definition?
>
> I actually was going to suggest dropping the #define as it obscures
> the code, but I figured it would get more complicated later.

I believe the discussed change was to make the define:
#define sysfs_info(SB) ((struct sysfs_super_info *)((SB)->s_fs_info))

As for dropping the define and using s_fs_info raw.  I rather like
a light weight type safe wrapper.  Maybe I just think s_fs_info
is an ugly name.

In practice I never call sysfs_info() with any expression that has
a side effect, so it doesn't matter.
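
A minimal sketch of why the extra parentheses matter, and of the inline-function alternative Tejun mentioned. The two structs are user-space stand-ins for the kernel types, the `id` field exists only so the example has something to read, and `sysfs_info_fn()` is a hypothetical name.

```c
/* Stand-ins for the kernel types; only what the example needs. */
struct sysfs_super_info { int id; };
struct super_block { void *s_fs_info; };

/* Parenthesized form of the macro, as given above. */
#define sysfs_info(SB) ((struct sysfs_super_info *)((SB)->s_fs_info))

/*
 * Hypothetical inline alternative: the argument is type-checked as a
 * struct super_block pointer and no macro expansion is involved.
 */
static inline struct sysfs_super_info *sysfs_info_fn(struct super_block *sb)
{
	return sb->s_fs_info;
}
```

With the original `(SB->s_fs_info)` form, a call such as `sysfs_info(sb + 1)` would expand to `sb + 1->s_fs_info` and fail to compile, because `->` binds more tightly than `+`. The parenthesized macro and the inline function both accept such an argument.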

Eric




^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 14:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331135505.GC14082@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:55 +0800, Herbert Xu wrote:

> Eliding the notification if SELinux says so is fine, but eliding
> it because the table is empty is wrong.
> 
> The flush did not fail just because the table was empty to begin
> with.

Like I said, I didn't touch the behavior except for the SELinux case
(which sounds very reasonable). I believe there may be historical legacy
reasons for that semantic in pfkey.
Can you point to something in the kernel (or anywhere else) that behaves
like this on table flushing? Actually, if there were an app that depended
on the netlink flush being exposed on an empty table, then I think there's
reason for a revert.
Other than that I will say again: I respectfully disagree.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: jamal @ 2010-03-31 14:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <20100331135629.GD14082@gondor.apana.org.au>

On Wed, 2010-03-31 at 21:56 +0800, Herbert Xu wrote:

> 
> No you've changed it.  PF_KEY will no longer notify if the policy
> table is empty.
> 
> This is inconsistent with the behaviour of SADB flushes, and the
> spirit of RFC2367.

I did not mean to change it for pfkey. I do believe there are apps that
need it.
I will run some tests and if it breaks - I will send a patch.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 3/4] xfrm: remove policy lock when accessing policy->walk.dead
From: Herbert Xu @ 2010-03-31 14:15 UTC (permalink / raw)
  To: jamal; +Cc: Timo Teras, netdev, David S. Miller
In-Reply-To: <1270044768.26743.58.camel@bigi>

On Wed, Mar 31, 2010 at 10:12:48AM -0400, jamal wrote:
> On Wed, 2010-03-31 at 21:55 +0800, Herbert Xu wrote:
> 
> > Eliding the notification if SELinux says so is fine, but eliding
> > it because the table is empty is wrong.
> > 
> > The flush did not fail just because the table was empty to begin
> > with.
> 
> Like I said, I didn't touch the behavior except for the SELinux case
> (which sounds very reasonable). I believe there may be historical legacy
> reasons for that semantic in pfkey.
> Can you point to something in the kernel (or anywhere else) that behaves
> like this on table flushing? Actually, if there were an app that depended
> on the netlink flush being exposed on an empty table, then I think there's
> reason for a revert.
> Other than that I will say again: I respectfully disagree.

OK I give up.

Dave can keep or revert this as he likes.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] can: Add driver for SJA1000 based PCI CAN interface cards by esd
From: Matthias Fuchs @ 2010-03-31 14:17 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BB34F92.5010308-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>

Hi Wolfgang,

I am aware of the PCI PLX driver. It should be possible to
adapt that one. It lacks PLX9056 support so it's not done by just
adding some PCI device IDs.

But if that's how it should be done I will do so.

Matthias

On Wednesday 31 March 2010 15:35, Wolfgang Grandegger wrote:
> Hi Matthias,
> 
> Matthias Fuchs wrote:
> > This patch adds support for SJA1000 based PCI CAN interface cards
> > from electronic system design gmbh.
> > 
> > These boards are supported:
> > 
> >         CAN-PCI/200 (PCI)
> >         CAN-PCI/266 (PCI)
> >         CAN-PMC266 (PMC module)
> >         CAN-PCIe/2000 (PCI Express)
> >         CAN-CPCI/200 (Compact PCI, 3U)
> >         CAN-PCI104 (PCI104)
> > 
> > This driver is part of the SocketCAN SVN repository since
> > April 2009.
> > 
> > Signed-off-by: Matthias Fuchs <matthias.fuchs-iOnpLzIbIdM@public.gmane.org>
> 
> Since a while we have a generic PCI PLX driver in the mainline kernel,
> which could support the esd cards as well, I believe.
> 
> Wolfgang.
> 
> 

-- 

^ permalink raw reply

