Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next-2.6 2/7] caif: Rename functions in cfcnfg and caif_dev
From: David Miller @ 2010-04-28 20:03 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-2-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:35 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>  o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
>  o Fixed typo cfcfg to cfcnfg
>  o Renamed linkid to channel_id
>  o Updated documentation in caif_dev.h
>  o Minor formatting changes
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/7] caif: Ldisc add permission check and mem-alloc error check
From: David Miller @ 2010-04-28 20:02 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, marcel, daniel.martensson, sjurbr, linus.walleij
In-Reply-To: <1272480880-30672-1-git-send-email-sjur.brandeland@stericsson.com>

From: sjur.brandeland@stericsson.com
Date: Wed, 28 Apr 2010 20:54:34 +0200

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Changes:
>    o Added permission checks for installing. CAP_SYS_ADMIN and
>      CAP_SYS_TTY_CONFIG can install the ldisc.
>    o Check if allocation of skb was successful.
> 
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>

Applied.

^ permalink raw reply

* [PATCH v2] net/sb1250: register mdio bus in probe
From: Sebastian Andrzej Siewior @ 2010-04-28 19:57 UTC (permalink / raw)
  To: David Miller; +Cc: ralf, netdev
In-Reply-To: <20100427.155215.228945283.davem@davemloft.net>

"ifconfig eth0 up && ifconfig eth0 down" triggers:
| kobject (a8000000cfa5a480): tried to init an initialized object, something is seriously wrong.
| Call Trace:
| [<ffffffff8010aabc>] dump_stack+0x8/0x34
| [<ffffffff80293128>] kobject_init+0xe8/0xf0
| [<ffffffff802d922c>] device_initialize+0x2c/0x98
| [<ffffffff802d9cfc>] device_register+0x14/0x28
| [<ffffffff80312cd4>] mdiobus_register+0xdc/0x1e0
| [<ffffffff80314cf0>] sbmac_open+0x58/0x220
| [<ffffffff803519bc>] __dev_open+0x11c/0x180
| [<ffffffff8034d578>] __dev_change_flags+0x120/0x180
| [<ffffffff80351848>] dev_change_flags+0x20/0x78
| [<ffffffff803a753c>] devinet_ioctl+0x7cc/0x820
| [<ffffffff80339ac8>] sock_do_ioctl+0x38/0x90
| [<ffffffff8033a258>] compat_sock_ioctl_trans+0x408/0x1030
| [<ffffffff8033af30>] compat_sock_ioctl+0xb0/0xd0
| [<ffffffff80208b08>] compat_sys_ioctl+0xa0/0x18b8
| [<ffffffff80102f94>] handle_sys+0x114/0x130
|
| sb1250-mac-mdio: probed

mdiobus_register() calls device_register() which initializes the kobj of
the device. mdiobus_unregister() calls only device_del() so we have one
reference left. That one is leaving with mdiobus_free() which is only
called on remove.
Since I don't see any reason why mdiobus_register()/mdiobus_unregister()
should happen in ->open()/->close() I move them to probe & exit.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
v2: register mdio bus before registering netdev device

 drivers/net/sb1250-mac.c |   67 ++++++++++++++++++++++-----------------------
 1 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index 3f882d5..b0161b4 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -2256,17 +2256,36 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 
 	sc->mii_bus = mdiobus_alloc();
 	if (sc->mii_bus == NULL) {
-		sbmac_uninitctx(sc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto uninit_ctx;
 	}
 
+	sc->mii_bus->name = sbmac_mdio_string;
+	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
+	sc->mii_bus->priv = sc;
+	sc->mii_bus->read = sbmac_mii_read;
+	sc->mii_bus->write = sbmac_mii_write;
+	sc->mii_bus->irq = sc->phy_irq;
+	for (i = 0; i < PHY_MAX_ADDR; ++i)
+		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
+
+	sc->mii_bus->parent = &pldev->dev;
+	/*
+	 * Probe PHY address
+	 */
+	err = mdiobus_register(sc->mii_bus);
+	if (err) {
+		printk(KERN_ERR "%s: unable to register MDIO bus\n",
+		       dev->name);
+		goto free_mdio;
+	}
+	dev_set_drvdata(&pldev->dev, sc->mii_bus);
+
 	err = register_netdev(dev);
 	if (err) {
 		printk(KERN_ERR "%s.%d: unable to register netdev\n",
 		       sbmac_string, idx);
-		mdiobus_free(sc->mii_bus);
-		sbmac_uninitctx(sc);
-		return err;
+		goto unreg_mdio;
 	}
 
 	pr_info("%s.%d: registered as %s\n", sbmac_string, idx, dev->name);
@@ -2282,19 +2301,15 @@ static int sbmac_init(struct platform_device *pldev, long long base)
 	pr_info("%s: SiByte Ethernet at 0x%08Lx, address: %pM\n",
 	       dev->name, base, eaddr);
 
-	sc->mii_bus->name = sbmac_mdio_string;
-	snprintf(sc->mii_bus->id, MII_BUS_ID_SIZE, "%x", idx);
-	sc->mii_bus->priv = sc;
-	sc->mii_bus->read = sbmac_mii_read;
-	sc->mii_bus->write = sbmac_mii_write;
-	sc->mii_bus->irq = sc->phy_irq;
-	for (i = 0; i < PHY_MAX_ADDR; ++i)
-		sc->mii_bus->irq[i] = SBMAC_PHY_INT;
-
-	sc->mii_bus->parent = &pldev->dev;
-	dev_set_drvdata(&pldev->dev, sc->mii_bus);
-
 	return 0;
+unreg_mdio:
+	mdiobus_unregister(sc->mii_bus);
+	dev_set_drvdata(&pldev->dev, NULL);
+free_mdio:
+	mdiobus_free(sc->mii_bus);
+uninit_ctx:
+	sbmac_uninitctx(sc);
+	return err;
 }
 
 
@@ -2320,16 +2335,6 @@ static int sbmac_open(struct net_device *dev)
 		goto out_err;
 	}
 
-	/*
-	 * Probe PHY address
-	 */
-	err = mdiobus_register(sc->mii_bus);
-	if (err) {
-		printk(KERN_ERR "%s: unable to register MDIO bus\n",
-		       dev->name);
-		goto out_unirq;
-	}
-
 	sc->sbm_speed = sbmac_speed_none;
 	sc->sbm_duplex = sbmac_duplex_none;
 	sc->sbm_fc = sbmac_fc_none;
@@ -2360,11 +2365,7 @@ static int sbmac_open(struct net_device *dev)
 	return 0;
 
 out_unregister:
-	mdiobus_unregister(sc->mii_bus);
-
-out_unirq:
 	free_irq(dev->irq, dev);
-
 out_err:
 	return err;
 }
@@ -2553,9 +2554,6 @@ static int sbmac_close(struct net_device *dev)
 
 	phy_disconnect(sc->phy_dev);
 	sc->phy_dev = NULL;
-
-	mdiobus_unregister(sc->mii_bus);
-
 	free_irq(dev->irq, dev);
 
 	sbdma_emptyring(&(sc->sbm_txdma));
@@ -2663,6 +2661,7 @@ static int __exit sbmac_remove(struct platform_device *pldev)
 
 	unregister_netdev(dev);
 	sbmac_uninitctx(sc);
+	mdiobus_unregister(sc->mii_bus);
 	mdiobus_free(sc->mii_bus);
 	iounmap(sc->sbm_base);
 	free_netdev(dev);
-- 
1.6.6.1

^ permalink raw reply related

* Re: [PATCH 17/17] sfc: Create multiple TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483043.2549.33.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:43 +0100

> Create a core TX queue and 2 hardware TX queues for each channel.
> If separate_tx_channels is set, create equal numbers of RX and TX
> channels instead.
> 
> Rewrite the channel and queue iteration macros accordingly.
> Eliminate efx_channel::used_flags as redundant.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 16/17] sfc: Test only the first pair of TX queues
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483030.2549.32.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:30 +0100

> This makes no immediate difference, but we definitely do not want
> to test all TX queues once we allocate a pair of TX queues to each
> channel.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 15/17] sfc: Add Siena PHY BIST and cable diagnostic support
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483022.2549.31.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 14/17] sfc: Clean up efx_nic::irq_zero_count
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272483000.2549.30.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:30:00 +0100

> There is no need for this to be unsigned long; make it unsigned int.
> It does need a line in kernel-doc, so add that.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 13/17] sfc: Add necessary parentheses to macro definitions in net_driver.h
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482990.2549.29.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:50 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 12/17] sfc: Break NAPI processing after one ring-full of TX completions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482982.2549.28.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:42 +0100

> Currently TX completions do not count towards the NAPI budget.  This
> means a continuous stream of TX completions can cause the polling
> function to loop indefinitely with scheduling disabled.  To avoid
> this, follow the common practice of reporting the budget spent after
> processing one ring-full of TX completions.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 11/17] sfc: Set PERIODIC_NOEVENT flag for MC_CMD_MAC_STATS
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482972.2549.27.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:32 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> When set, an event is not sent whenever periodic MAC statistics are
> raised.  This avoids unnecessary wake-ups.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 10/17] sfc: Update MCDI protocol definitions
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482954.2549.26.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:14 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 09/17] sfc: Enable IPv6 RSS using random key for Toeplitz hash
From: David Miller @ 2010-04-28 19:46 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482942.2549.25.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:29:02 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 08/17] sfc: Read MEM_STAT for SRM_PERR as well as MEM_PERR errors
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482932.2549.24.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:52 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Parity errors in different blocks of SRAM may set one of two different
> interrupt flags.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 07/17] sfc: Log specific message for failure of NVRAM self-test
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482916.2549.23.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:36 +0100

> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 06/17] sfc: Extend the legacy interrupt workarounds
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482907.2549.22.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:27 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Siena has two problems with legacy interrupts:
>   1. There is no synchronisation between the ISR read completion,
>      and the interrupt deassert message.
>   2. A downstream read at the "wrong" moment can return 0, and
>      suppress generating the next interrupt.
> 
> Falcon should suffer from both of these, and it appears it does.
> Enable EFX_WORKAROUND_15783 on Falcon as well.
> 
> Also, when we see queues == 0, ensure we always schedule or rearm
> every event queue.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 05/17] sfc: Reconfigure the XAUI serdes after an EM reset
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482890.2549.21.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:28:10 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Fix a regression introduced in d3245b28ef2a45ec4e115062a38100bd06229289
> "sfc: Refactor link configuration".
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 04/17] sfc: Stop masking out XGMII faults over reconfigures
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482874.2549.20.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:54 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> The aim of this code was to avoid a spurious XGMII fault over a MAC
> reconfigure. It's less relevant now that the PHY reconfigure isn't
> called from the MAC reconfigure.
> 
> After applying this patch, our link stress test passed 48 hours of
> testing without ever resetting the PHY.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 03/17] sfc: Handle serious errors in exactly one interrupt handler
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482856.2549.19.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:36 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> 'Fatal' errors set an interrupt flag associated with a specific event
> queue; only read the syndrome vector if we see that queue's flag set
> (legacy interrupts) or in the interrupt handler for that queue (MSI).
> 
> Do not ignore an interrupt if the fatal error flag is set but specific
> error flags are all zero.  Even if we don't schedule a reset, we must
> respect the queue mask and rearm the appropriate event queues.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 02/17] sfc: Consistently report short MCDI responses as EIO
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482834.2549.18.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:27:14 +0100

> In some cases failing functions were returning 0 which is obviously wrong.
> In other cases they were returning inappropriate error codes.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH 01/17] sfc: Ignore parity errors in the other port's SRAM
From: David Miller @ 2010-04-28 19:45 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272482722.2549.17.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 28 Apr 2010 20:25:22 +0100

> From: Steve Hodgson <shodgson@solarflare.com>
> 
> Siena has a separate SRAM bank for each port.  On single-port boards
> these can be merged together, so each port has an interrupt flag for
> parity errors in the other port's SRAM.  Currently we do not enable
> such merging and should mask this interrupt source.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
From: Eric Dumazet @ 2010-04-28 19:38 UTC (permalink / raw)
  To: paulmck
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, netdev, Jens Axboe, Gui Jianfeng,
	Li Zefan, Johannes Berg, shemminger
In-Reply-To: <20100428175426.GK2540@linux.vnet.ibm.com>

Le mercredi 28 avril 2010 à 10:54 -0700, Paul E. McKenney a écrit :
> On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > This one occurred during the wakeup from suspend to RAM.
> > 
> > [  984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > [  984.724700] ---------------------------------------------------
> > [  984.724703] include/linux/fdtable.h:88 invoked
> > rcu_dereference_check() without protection!
> > [  984.724706]
> > [  984.724707] other info that might help us debug this:
> > [  984.724708]
> > [  984.724711]
> > [  984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > [  984.724714] no locks held by dbus-daemon/4680.
> > [  984.724717]
> > [  984.724717] stack backtrace:
> > [  984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > [  984.724724] Call Trace:
> > [  984.724734]  [<ffffffff81074556>] lockdep_rcu_dereference+0x9d/0xa6
> > [  984.724740]  [<ffffffff810fc785>] fcheck_files+0xb1/0xc9
> > [  984.724745]  [<ffffffff810fc7f5>] fget_light+0x35/0xab
> > [  984.724751]  [<ffffffff81433e1b>] ? sock_poll_wait+0x13/0x18
> > [  984.724755]  [<ffffffff81433e39>] ? unix_poll+0x19/0x95
> > [  984.724762]  [<ffffffff8110aa95>] do_sys_poll+0x1ff/0x3e5
> > [  984.724766]  [<ffffffff8110a19e>] ? __pollwait+0x0/0xc7
> > [  984.724771]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724776]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724780]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724784]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724788]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724793]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724797]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724802]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724806]  [<ffffffff8110a265>] ? pollwake+0x0/0x4f
> > [  984.724812]  [<ffffffff8110ae0f>] sys_poll+0x50/0xbb
> > [  984.724818]  [<ffffffff81009d82>] system_call_fastpath+0x16/0x1b
> 
> Hmmm...  I am not convinced that this is a false positive.  Couldn't
> there be a multi-threaded process where one thread is invoking poll()
> on a UNIX socket just as another thread is calling close() on it?
> 
> The current fcheck_files() logic requires that the caller either (1) be in
> an RCU read-side critical section, (2) hold ->files_lock, or (3) passing
> in a files_struct with ->count equal to 1 (initialization or cleanup).
> 
> So I don't feel comfortable just slapping an RCU read-side critical
> section around this one, at least not unless someone who understands
> the locking says that doing so is OK.
> 
> 		

Its a single threaded program.

So fget_light() calls fcheck_files(files, fd); without rcu lock,
but some /proc/pid/fd/... user temporarly raised files->count just
before we perform the condition check.

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v3)
From: Neil Horman @ 2010-04-28 19:37 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428174711.GC4818@hmsreliant.think-freely.org>

Ok, version 3.

Change notes:

1) As per our discussion, uses a fixed size alloation stragetgy, allocating only
a mtu sized chunk (up to 1500 bytes).

2) add sctp_init_cause_fixed and sctp_addto_chunk_fixed helpers that, instead of
oopsing on insufficient data in the skb, instead simple fail to preform the
copy, and return an appropriate error code.

Summary:
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d 
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490 
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004 
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

This fix solves the problem by allowing our implementation to only report a
fixed number of errors.  When we encounter an error in parameter processing we
allocate a chunk that is min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT), limiting
our error reporting to a single mtu sized chunk.  Parameter errors that grow
beyond that value are discarded.

I've tested this fix using the reproducer that I was provided (send an init chunk to a
listening sctp socket with a few dozen parameters of type 0xc001 and length 4),
and it solves the problem nicely.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/net/sctp/structs.h |    1 
 net/sctp/sm_make_chunk.c   |   70 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 62 insertions(+), 9 deletions(-)


diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ff30177..597f8e2 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -778,6 +778,7 @@ int sctp_user_addto_chunk(struct sctp_chunk *chunk, int off, int len,
 			  struct iovec *data);
 void sctp_chunk_free(struct sctp_chunk *);
 void  *sctp_addto_chunk(struct sctp_chunk *, int len, const void *data);
+void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len, const void *data);
 struct sctp_chunk *sctp_chunkify(struct sk_buff *,
 				 const struct sctp_association *,
 				 struct sock *);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..9623112 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -107,7 +107,7 @@ static const struct sctp_paramhdr prsctp_param = {
 	cpu_to_be16(sizeof(struct sctp_paramhdr)),
 };
 
-/* A helper to initialize to initialize an op error inside a
+/* A helper to initialize an op error inside a
  * provided chunk, as most cause codes will be embedded inside an
  * abort chunk.
  */
@@ -124,6 +124,29 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
 	chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(sctp_errhdr_t), &err);
 }
 
+/* A helper to initialize an op error inside a
+ * provided chunk, as most cause codes will be embedded inside an
+ * abort chunk.  Differs from sctp_init_cause in that it won't oops
+ * if there isn't enough space in the op error chunk
+ */
+int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+		      size_t paylen)
+{
+	sctp_errhdr_t err;
+	__u16 len;
+
+	/* Cause code constants are now defined in network order.  */
+	err.cause = cause_code;
+	len = sizeof(sctp_errhdr_t) + paylen;
+	err.length  = htons(len);
+
+	if (skb_tailroom(chunk->skb) >  len)
+		return -ENOSPC;
+	chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk,
+						     sizeof(sctp_errhdr_t),
+						     &err);
+	return 0;
+}
 /* 3.3.2 Initiation (INIT) (1)
  *
  * This chunk is used to initiate a SCTP association between two
@@ -1131,6 +1154,24 @@ nodata:
 	return retval;
 }
 
+/* Create an Operation Error chunk of a fixed size,
+ * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
+ * This is a helper function to allocate an error chunk for
+ * for those invalid parameter codes in which we may not want
+ * to report all the errors, if the incomming chunk is large
+ */
+static inline struct sctp_chunk *sctp_make_op_error_fixed(
+	const struct sctp_association *asoc,
+	const struct sctp_chunk *chunk)
+{
+	size_t size = asoc ? asoc->pathmtu : 0;
+
+	if (size > SCTP_DEFAULT_MAXSEGMENT)
+		size = SCTP_DEFAULT_MAXSEGMENT;
+
+	return sctp_make_op_error_space(asoc, chunk, size);
+}
+
 /* Create an Operation Error chunk.  */
 struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 				 const struct sctp_chunk *chunk,
@@ -1373,6 +1414,18 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 	return target;
 }
 
+/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
+ * space in the chunk
+ */
+void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
+			     int len, const void *data)
+{
+	if (skb_tailroom(chunk->skb) > len)
+		return sctp_addto_chunk(chunk, len, data);
+	else
+		return NULL;
+}
+
 /* Append bytes from user space to the end of a chunk.  Will panic if
  * chunk is not big enough.
  * Returns a kernel err value.
@@ -1793,9 +1846,9 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
 	if (*errp) {
 		report.num_missing = htonl(1);
 		report.type = paramtype;
-		sctp_init_cause(*errp, SCTP_ERROR_MISS_PARAM,
-				sizeof(report));
-		sctp_addto_chunk(*errp, sizeof(report), &report);
+		sctp_init_cause_fixed(*errp, SCTP_ERROR_MISS_PARAM,
+				      sizeof(report));
+		sctp_addto_chunk_fixed(*errp, sizeof(report), &report);
 	}
 
 	/* Stop processing this chunk. */
@@ -1813,7 +1866,7 @@ static int sctp_process_inv_mandatory(const struct sctp_association *asoc,
 		*errp = sctp_make_op_error_space(asoc, chunk, 0);
 
 	if (*errp)
-		sctp_init_cause(*errp, SCTP_ERROR_INV_PARAM, 0);
+		sctp_init_cause_fixed(*errp, SCTP_ERROR_INV_PARAM, 0);
 
 	/* Stop processing this chunk. */
 	return 0;
@@ -1976,13 +2029,12 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
 		 * returning multiple unknown parameters.
 		 */
 		if (NULL == *errp)
-			*errp = sctp_make_op_error_space(asoc, chunk,
-					ntohs(chunk->chunk_hdr->length));
+			*errp = sctp_make_op_error_fixed(asoc, chunk);
 
 		if (*errp) {
-			sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+			sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
 					WORD_ROUND(ntohs(param.p->length)));
-			sctp_addto_chunk(*errp,
+			sctp_addto_chunk_fixed(*errp,
 					WORD_ROUND(ntohs(param.p->length)),
 					param.v);
 		} else {

^ permalink raw reply related

* Re: [net-next-2.6 PATCH 1/2] Add netdev port-profile support (take III, was iovnl)
From: Arnd Bergmann @ 2010-04-28 19:37 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw, Jens Osterkamp
In-Reply-To: <C7FDD272.2C86D%scofeldm@cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> On 4/28/10 6:13 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> > What I could imagine to unify this is something like
> > 
> >    ip port_profile set DEVICE [ { pre_associate | pre_associate_rr } ]
> >                               { name PORT-PROFILE | vsi MGR:VTID:VER }
> >                                     mac LLADDR
> >    [ vlan VID ]
> >                                     [ host_uuid HOST_UUID ]
> >                                     [ client_uuid CLIENT_UUID ]
> >                                     [ client_name CLIENT_NAME ]
> >    ip port_profile del DEVICE [ mac LLADDR [ vlan VID ] ]
> >    ip port_profile show DEVICE
> 
> Arnd, can someone test this with VDP today?  I don't have access to that
> equipment so it's difficult for me to fully test the unified patch.  I can
> test the previous patch with enic easily because I have access to production
> systems.  I'd like to make sure someone can test this with VDP before I
> respin the patch one more time.

Sorry, but I don't have access to production hardware at this time. Jens wants to
implement both sides so we can test this in simulation mode, but it's not done
yet.

	Arnd

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2010-04-28 19:37 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Mainly I'm pushing this to revert the TCP bind() bug fix, it's
causing too much fallout to try and sort out this late in the
-rc series.

Bunch of OOPS'ers for SCTP, Intel chip fixes (lack of PHY powerdown in
ixgbe lets crap into chip FIFO wedging the chip completely, etc.), a
regression fix for tg3 on systems that do run out of MSI vectors, bnx2
driver fixes (lost MSI-X vectors, 'scheduling while atomic' fix) and
hopefully r8169 MAC address programming is solid now.

Please pull, thanks a lot!

The following changes since commit b91ce4d14a21fc04d165be30319541e0f9204f15:
  Linus Torvalds (1):
        Merge git://git.kernel.org/.../davem/net-2.6

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Andre Detsch (2):
      tg3: Fix INTx fallback when MSI fails
      cxgb3: Wait longer for control packets on initialization

Andreas Hartmann (1):
      drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"

Andy Fleming (1):
      gianfar: Wait for both RX and TX to stop

Ben Hutchings (3):
      sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
      sfc: Always close net device at the end of a disabling reset
      sfc: Change falcon_probe_board() to fail for unsupported boards

Bruce Allan (1):
      e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata

Dan Carpenter (2):
      ipheth: potential null dereferences on error path
      bluetooth: handle l2cap_create_connless_pdu() errors

David S. Miller (1):
      Revert "tcp: bind() fix when many ports are bound"

Elina Pasheva (1):
      net/usb: add sierra_net.c driver

Ken Kawasaki (1):
      smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()

Michael Chan (3):
      bnx2: Fix lost MSI-X problem on 5709 NICs.
      bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
      bnx2: Update version to 2.0.9.

Peter Waskiewicz (1):
      ixgbe: Power down PHY during driver resets

Stefan Schmidt (1):
      ieee802154: Fix oops during ieee802154_sock_ioctl

Torgny Johansson (1):
      cdc_ether: fix autosuspend for mbm devices

Vlad Yasevich (3):
      sctp: fix potential reference of a freed pointer
      sctp: per_cpu variables should be in bh_disabled section
      sctp: Fix oops when sending queued ASCONF chunks

Wei Yongjun (2):
      sctp: avoid irq lock inversion while call sk->sk_data_ready()
      sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set

YOSHIFUJI Hideaki / 吉藤英明 (1):
      bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.

françois romieu (2):
      r8169: failure to enable mwi should not be fatal
      r8169: more broken register writes workaround

 drivers/net/bnx2.c               |   48 +-
 drivers/net/cxgb3/cxgb3_main.c   |    2 +-
 drivers/net/e1000e/82571.c       |   20 +-
 drivers/net/e1000e/e1000.h       |    5 +-
 drivers/net/e1000e/netdev.c      |   70 ++-
 drivers/net/gianfar.c            |    6 +-
 drivers/net/ixgbe/ixgbe_82599.c  |   62 ++-
 drivers/net/ixgbe/ixgbe_main.c   |   22 +-
 drivers/net/ixgbe/ixgbe_type.h   |    2 +
 drivers/net/pcmcia/smc91c92_cs.c |   31 +-
 drivers/net/r8169.c              |   32 +-
 drivers/net/sfc/efx.c            |    4 +-
 drivers/net/sfc/falcon.c         |    4 +-
 drivers/net/sfc/falcon_boards.c  |   13 +-
 drivers/net/sfc/nic.h            |    2 +-
 drivers/net/sfc/siena.c          |   13 +-
 drivers/net/tg3.c                |    1 +
 drivers/net/usb/Kconfig          |   10 +
 drivers/net/usb/Makefile         |    1 +
 drivers/net/usb/cdc_ether.c      |    1 +
 drivers/net/usb/ipheth.c         |   15 +-
 drivers/net/usb/kaweth.c         |    1 +
 drivers/net/usb/sierra_net.c     | 1001 ++++++++++++++++++++++++++++++++++++++
 include/net/sctp/command.h       |    1 +
 include/net/sctp/sctp.h          |    1 +
 net/bluetooth/l2cap.c            |    5 +-
 net/bridge/br_multicast.c        |    6 +-
 net/ieee802154/af_ieee802154.c   |    3 +
 net/ipv4/inet_connection_sock.c  |   16 +-
 net/ipv6/inet6_connection_sock.c |   15 +-
 net/sctp/associola.c             |    6 +-
 net/sctp/endpointola.c           |    1 +
 net/sctp/sm_make_chunk.c         |   32 +-
 net/sctp/sm_sideeffect.c         |   26 +
 net/sctp/sm_statefuns.c          |    8 +-
 net/sctp/socket.c                |   14 +-
 36 files changed, 1312 insertions(+), 188 deletions(-)
 create mode 100644 drivers/net/usb/sierra_net.c

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/2] Add netdev port-profile support (take III, was iovnl)
From: Arnd Bergmann @ 2010-04-28 19:33 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <C7FDC3B8.2C7EB%scofeldm@cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> On 4/28/10 6:13 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > 
> > We will need a few more options to cover draft VDP in addition to the protocol
> > your NIC is using. I still think it's possible to use the same interface
> > for both, but the differences are obviously showing.
> > 
> > The missing bits that I can see so far are:
> > 
> > - You only have 'get' and 'set'. We will also need a 'unset' or 'delete'
> >   option in order to get rid of a port profile association.
> 
> That's there in my patch.  If you don't specify anything after port-profile
> keyword, it's an unset.  See extra "[" and "]" above.  Will that work for
> you?

Well, this won't work if you want to have multiple slave interfaces connected
to a single master and record the port profile in the master.

> > - VDP has three different ways to 'set' a port profile: 'associate',
> >   'pre-associate with resource reservation' and 'pre-associate without
> >   resource reservation'. This could become an extra option flag.
> 
> Ok, let's add an option flag bit field.  I'm not sure how that looks from
> the iproute2 cmd line.  Would you take a stab at defining these?
> 
> > - Instead of a port profile name, VDP specifies a tuple like
> >    struct vsi_associate {
> > unsigned char VSI_Mgr_ID;       /* VSI manager ID */
> > unsigned char VSI_Type_ID[3];   /* 24 bit VSI Type ID */
> > unsigned char VSI_Type_Version; /* VSI Type version */
> >    };
> >    I'm not sure how to deal with that best, but there needs to be
> >    some parsing of these numbers.
> 
> PORT-PROFILE above is a u8* array.  You could decide to encode the tuple in
> a string, e.g. "1.12345.1", and let the receiver parse it?  Or pack it in as
> binary.  PORT-PROFILE for us is just a string identifier, e.g. "corp-net-10"
> or "joes-garage".

I guess either one would work, but I'd prefer to do the parsing at
the front-end and passing it as binary to avoid libvirt encoding
it as ascii and lldpad (or the kernel) parsing the data again, which
would be more error-prone.

Another alternative would be to make this a distinct argument (or even
three of them) and only allow passing one or the other. That would
also implicitly choose the protocol (VDP or port extender).
 
> > - VDP requires a vlan ID to be part of the association, in addition to
> >   the MAC address. In theory, we could have multiple tuples of MAC+VLAN
> >   addresses, but we can probably just associate each tuple separately
> >   and ignore that part of the standard.
> 
> I don't think I have enough information to spec this out for this item.
> Would you take a stab at how this would look in the struct and how it would
> look from the iproute2 cmd line?  (Note I'm using iproute2 cmd line for
> illustrative purposes, but the sender of the msg could be something like
> libvirt).

Just adding the VID would be done I wrote at the end of my last mail.
Adding multiple VLAN/MAC pairs is probably not necessary if we can
do multiple associations.

> > - we have a set of possible error conditions that can be returned by
> >   the switch (invalid format, insufficient resources, unknown VTID,
> >   VTID violation, VTID verison violation, out of sync). It should be
> >   possible to return each of these to the user with 'get'.
> 
> There is a status code in the get cmd as defined in my patch.  I have it as
> a u8 with some enum codes.  Can we add to the enum code list?  Or do you
> want to return a full string?  Our requirements are we return one of:
> {success, error, in-progress}.

enum is fine, but I think it would be good to use the same numbers
as the VDP standard where possible. Maybe we could use two bytes,
the first one for the overall status (success, error, in progress)
and the second one for the specific error (as above)?

> >> Since we're using netlink sockets, the receiver of the RTM_SETLINK msg can
> >> be in kernel- or user-space.  For kernel-space recipient, rtnetlink.c, the
> >> new ndo_set_port_profile netdev op is called to set the port-profile.
> >> User-space recipients can decide how they propagate the msg to the switch.
> >> There is also a RTM_GETLINK cmd to to return port-profile setting of an
> >> interface and to also return the status of the last port-profile.
> > 
> > More on a stylistic note, I'm not convinced that using RTM_SETLINK/GETLINK
> > is the right interface for this, unlike the 'ip link set DEV vf ...' stuff,
> > because it seems to suggest that this is an option of the adapter itself.
> > I actually liked the iovnl family better in this regard, because it kept
> > the protocols separate.
> 
> Wait a second...I abandoned iovnl and moved to if_link based on suggestions
> from you and others.  On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de>
> wrote:
>
>    > Right. My preference would probably be make these a subcategory of
>    > the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> 
> My latest with if_link fits best with what we're trying to do with enic.

Sorry for the misunderstanding, I was probably not clear enough. This
was a side-discussion about how we should create and destroy virtual
interfaces on IOV capable adapters. My point was that this should be
_separate_ from the port profile association. For creating a
macvlan/macvtap device, we already use RTM_NEWLINK, which does not
associate the device with the switch, so I suggested that for creating
a virtual interface with hardware support we should do the same thing,
but leave the association somewhere else. iovnl sounds like a good
place for that.

> Note the current interface is per netdev interface (a link), where the
> netdev interface could be the adapter itself or any other netdev such as
> macvlan, bond, etc.

We certainly missed each others arguments here, see my other mail.
In order to do the port profile association from software, we
definitely need the master device (nic, PF, bond, bridge, ...) so
we have a way to communicate to the switch, not the slave device
(VF, tap, macvtap, ...) that we are trying to associate.

> > What I could imagine to unify this is something like
> > 
> >    ip port_profile set DEVICE [ { pre_associate | pre_associate_rr } ]
> >                               { name PORT-PROFILE | vsi MGR:VTID:VER }
> >                                     mac LLADDR
> >    [ vlan VID ]
> >                                     [ host_uuid HOST_UUID ]
> >                                     [ client_uuid CLIENT_UUID ]
> >                                     [ client_name CLIENT_NAME ]
> >    ip port_profile del DEVICE [ mac LLADDR [ vlan VID ] ]
> >    ip port_profile show DEVICE
> 
> If we want to break port_profile out into it's own ip cmd, I'm ok with that.
> What you have above would work for enic.  The netdev would have these ops:
> 
>     ndo_set_port_profile
>     ndo_get_port_profile
>     ndo_del_port_profile
> 
> Sounds OK?

That sounds good.

	Arnd

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox