Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] smsc911x: Set Ethernet EEPROM size to supported device's size
From: John Faith @ 2010-11-01 21:30 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, John Faith

The SMSC911x supports 128 x 8-bit EEPROMs.  Increase the EEPROM size
so more than just the MAC address can be stored.

Signed-off-by: John Faith <jfaith7@gmail.com>
---
 drivers/net/smsc911x.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/smsc911x.h b/drivers/net/smsc911x.h
index 016360c..8a79585 100644
--- a/drivers/net/smsc911x.h
+++ b/drivers/net/smsc911x.h
@@ -22,7 +22,7 @@
 #define __SMSC911X_H__
 
 #define TX_FIFO_LOW_THRESHOLD	((u32)1600)
-#define SMSC911X_EEPROM_SIZE	((u32)7)
+#define SMSC911X_EEPROM_SIZE	((u32)128)
 #define USE_DEBUG		0
 
 /* This is the maximum number of packets to be received every
-- 
1.7.0.4


^ permalink raw reply related

* Re: Routing over multiple interfaces
From: Eric Dumazet @ 2010-11-01 21:35 UTC (permalink / raw)
  To: David Miller; +Cc: dwmw2, netdev, uweber
In-Reply-To: <20101101.141638.116372747.davem@davemloft.net>

Le lundi 01 novembre 2010 à 14:16 -0700, David Miller a écrit :
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Mon, 01 Nov 2010 17:12:02 -0400
> 
> > But when I do a large upload, I find that the kernel is only ever using
> > a *single* link at a time, rather than both. How can I make it use
> > *both* links? It's fine to confine each flow to a single link if it
> > doesn't saturate that link... but once the queue is full, it should
> > overflow onto the other device.
> 
> Once a TCP socket gets a routing cache entry, that's what it uses
> for the rest of the life of the connection.
> 
> The multi-pathing decision happens at the time the routing
> cache entry is created.
> 
> What you want is multi-path routing support in the routing cache.
> 
> We used to have that, but the guy who implemented it (after bugging
> me to integrate it for 4 months straight, non-stop) just did a code
> dump and then disappeared and fixed none of the serious fundamental
> problems which existed in his code.
> 
> After a year of no action, I simply tore out all of his code.
> 
> More recently Ulrich Weber gave a presentation at netfilter workshop
> on some uplink load balancing work he is doing, you might have
> a look at his talk and get in contact with him:
> 
> http://people.astaro.com/uweber/uplink_balancing.pdf
> --

Astaro case is a bit different, since links have different IP addresses,
and a single upload uses a single link anyway because of hashing or
policy that selects one source IP address.

David W. probably wants to use teql or some bonding ?

# tc qdisc add dev ppp0 root teql0
# tc qdisc add dev ppp1 root teql0
# ip link set dev teql0 up
# ip route add default src 90.155.92.214 dev teql0





^ permalink raw reply

* Re: [PATCH 4/4] Ethtool: convert get_tx_csum/set_tx_csum calls to hw_features flags
From: Ben Hutchings @ 2010-11-01 21:38 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <904bc8c98062fc793573ee6bc120ed0b8af292d2.1288496405.git.mirq-linux@rere.qmqm.pl>

On Sun, 2010-10-31 at 02:09 +0200, Michał Mirosław wrote:
[...]
> diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
> index 9f6aeef..5ba4535 100644
> --- a/drivers/net/dm9000.c
> +++ b/drivers/net/dm9000.c
> @@ -132,7 +132,6 @@ typedef struct board_info {
>  	u32		wake_state;
>  
>  	int		rx_csum;
> -	int		can_csum;
>  	int		ip_summed;
>  } board_info_t;
>  
> @@ -480,7 +479,7 @@ static int dm9000_set_rx_csum_unlocked(struct net_device *dev, uint32_t data)
>  {
>  	board_info_t *dm = to_dm9000_board(dev);
>  
> -	if (dm->can_csum) {
> +	if (dev->hw_features & NETIF_F_IP_CSUM) {

This is a bit ugly because can_csum is being used to indicate RX
checksum offload capability whereas the checksum feature flags logically
indicate TX checksum offload capability.  Of course, the two hardware
features are highly correlated!

[...]
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 9b0e598..95e8b7a 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
[...]
> @@ -1077,12 +1039,18 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
>  	return 0;
>  }
>  
> +static u32 ethtool_get_tx_csum(struct net_device *dev)
> +{
> +	return (dev->features & (NETIF_F_ALL_CSUM|NETIF_F_SCTP_CSUM)) != 0;
> +}
> +
[...]

It seems to me that NETIF_F_SCTP_CSUM should be added to
NETIF_F_ALL_CSUM, though that may cause problems in some other places
NETIF_F_ALL_CSUM is used.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: Routing over multiple interfaces
From: Benjamin LaHaise @ 2010-11-01 21:21 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev
In-Reply-To: <1288645922.5977.41.camel@macbook.infradead.org>

On Mon, Nov 01, 2010 at 05:12:02PM -0400, David Woodhouse wrote:
> I worked around this earlier today by splitting the large file I needed
> to upload into two parts, shifting one of them to a *different* machine,
> connecting that to the VPN too and then uploading the two parts
> separately. But that isn't really a viable option in the general case.

Have a look at eql.

		-ben

^ permalink raw reply

* [PATCH 1/2] caif: Bugfix for socket priority, bindtodev and dbg channel.
From: Sjur Braendeland @ 2010-11-01 21:52 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: André Carvalho de Matos, Sjur Braendeland

From: André Carvalho de Matos <andre.carvalho.matos@stericsson.com>

Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute priority instead.

o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute ifindex instead.

o Wrong assert statement for RFM layer segmentation.

o CAIF Debug channels was not working over SPI, caif_payload_info
  containing padding info must be initialized.

o Check on pointer before dereferencing when unregister dev in caif_dev.c

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 include/net/caif/caif_dev.h |    4 +-
 include/net/caif/cfcnfg.h   |    8 +++---
 net/caif/caif_config_util.c |   13 +++++++++--
 net/caif/caif_dev.c         |    2 +
 net/caif/caif_socket.c      |   45 ++++++++++++++----------------------------
 net/caif/cfcnfg.c           |   17 ++++++---------
 net/caif/cfdbgl.c           |   14 +++++++++++++
 net/caif/cfrfml.c           |    2 +-
 8 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/include/net/caif/caif_dev.h b/include/net/caif/caif_dev.h
index 6da573c..8eff83b 100644
--- a/include/net/caif/caif_dev.h
+++ b/include/net/caif/caif_dev.h
@@ -28,7 +28,7 @@ struct caif_param {
  * @sockaddr:		Socket address to connect.
  * @priority:		Priority of the connection.
  * @link_selector:	Link selector (high bandwidth or low latency)
- * @link_name:		Name of the CAIF Link Layer to use.
+ * @ifindex:		kernel index of the interface.
  * @param:		Connect Request parameters (CAIF_SO_REQ_PARAM).
  *
  * This struct is used when connecting a CAIF channel.
@@ -39,7 +39,7 @@ struct caif_connect_request {
 	struct sockaddr_caif sockaddr;
 	enum caif_channel_priority priority;
 	enum caif_link_selector link_selector;
-	char link_name[16];
+	int ifindex;
 	struct caif_param param;
 };
 
diff --git a/include/net/caif/cfcnfg.h b/include/net/caif/cfcnfg.h
index bd646fa..f688478 100644
--- a/include/net/caif/cfcnfg.h
+++ b/include/net/caif/cfcnfg.h
@@ -139,10 +139,10 @@ struct dev_info *cfcnfg_get_phyid(struct cfcnfg *cnfg,
 		     enum cfcnfg_phy_preference phy_pref);
 
 /**
- * cfcnfg_get_named() - Get the Physical Identifier of CAIF Link Layer
+ * cfcnfg_get_id_from_ifi() - Get the Physical Identifier of ifindex,
+ * 			it matches caif physical id with the kernel interface id.
  * @cnfg:	Configuration object
- * @name:	Name of the Physical Layer (Caif Link Layer)
+ * @ifi:	ifindex obtained from socket.c bindtodevice.
  */
-int cfcnfg_get_named(struct cfcnfg *cnfg, char *name);
-
+int cfcnfg_get_id_from_ifi(struct cfcnfg *cnfg, int ifi);
 #endif				/* CFCNFG_H_ */
diff --git a/net/caif/caif_config_util.c b/net/caif/caif_config_util.c
index 76ae683..d522d8c 100644
--- a/net/caif/caif_config_util.c
+++ b/net/caif/caif_config_util.c
@@ -16,11 +16,18 @@ int connect_req_to_link_param(struct cfcnfg *cnfg,
 {
 	struct dev_info *dev_info;
 	enum cfcnfg_phy_preference pref;
+	int res;
+
 	memset(l, 0, sizeof(*l));
-	l->priority = s->priority;
+	/* In caif protocol low value is high priority */
+	l->priority = CAIF_PRIO_MAX - s->priority + 1;
 
-	if (s->link_name[0] != '\0')
-		l->phyid = cfcnfg_get_named(cnfg, s->link_name);
+	if (s->ifindex != 0){
+		res = cfcnfg_get_id_from_ifi(cnfg, s->ifindex);
+		if (res < 0)
+			return res;
+		l->phyid = res;
+	}
 	else {
 		switch (s->link_selector) {
 		case CAIF_LINK_HIGH_BANDW:
diff --git a/net/caif/caif_dev.c b/net/caif/caif_dev.c
index b99369a..a42a408 100644
--- a/net/caif/caif_dev.c
+++ b/net/caif/caif_dev.c
@@ -307,6 +307,8 @@ static int caif_device_notify(struct notifier_block *me, unsigned long what,
 
 	case NETDEV_UNREGISTER:
 		caifd = caif_get(dev);
+		if (caifd == NULL)
+			break;
 		netdev_info(dev, "unregister\n");
 		atomic_set(&caifd->state, what);
 		caif_device_destroy(dev);
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 2eca2dd..1bf0cf5 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -716,8 +716,7 @@ static int setsockopt(struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-	int prio, linksel;
-	struct ifreq ifreq;
+	int linksel;
 
 	if (cf_sk->sk.sk_socket->state != SS_UNCONNECTED)
 		return -ENOPROTOOPT;
@@ -735,33 +734,6 @@ static int setsockopt(struct socket *sock,
 		release_sock(&cf_sk->sk);
 		return 0;
 
-	case SO_PRIORITY:
-		if (lvl != SOL_SOCKET)
-			goto bad_sol;
-		if (ol < sizeof(int))
-			return -EINVAL;
-		if (copy_from_user(&prio, ov, sizeof(int)))
-			return -EINVAL;
-		lock_sock(&(cf_sk->sk));
-		cf_sk->conn_req.priority = prio;
-		release_sock(&cf_sk->sk);
-		return 0;
-
-	case SO_BINDTODEVICE:
-		if (lvl != SOL_SOCKET)
-			goto bad_sol;
-		if (ol < sizeof(struct ifreq))
-			return -EINVAL;
-		if (copy_from_user(&ifreq, ov, sizeof(ifreq)))
-			return -EFAULT;
-		lock_sock(&(cf_sk->sk));
-		strncpy(cf_sk->conn_req.link_name, ifreq.ifr_name,
-			sizeof(cf_sk->conn_req.link_name));
-		cf_sk->conn_req.link_name
-			[sizeof(cf_sk->conn_req.link_name)-1] = 0;
-		release_sock(&cf_sk->sk);
-		return 0;
-
 	case CAIFSO_REQ_PARAM:
 		if (lvl != SOL_CAIF)
 			goto bad_sol;
@@ -880,6 +852,18 @@ static int caif_connect(struct socket *sock, struct sockaddr *uaddr,
 	sock->state = SS_CONNECTING;
 	sk->sk_state = CAIF_CONNECTING;
 
+	/* Check priority value comming from socket */
+	/* if priority value is out of range it will be ajusted */
+	if (cf_sk->sk.sk_priority > CAIF_PRIO_MAX)
+		cf_sk->conn_req.priority = CAIF_PRIO_MAX;
+	else if (cf_sk->sk.sk_priority < CAIF_PRIO_MIN)
+		cf_sk->conn_req.priority = CAIF_PRIO_MIN;
+	else
+		cf_sk->conn_req.priority = cf_sk->sk.sk_priority;
+
+	/*ifindex = id of the interface.*/
+	cf_sk->conn_req.ifindex = cf_sk->sk.sk_bound_dev_if;
+
 	dbfs_atomic_inc(&cnt.num_connect_req);
 	cf_sk->layer.receive = caif_sktrecv_cb;
 	err = caif_connect_client(&cf_sk->conn_req,
@@ -905,6 +889,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uaddr,
 	cf_sk->maxframe = mtu - (headroom + tailroom);
 	if (cf_sk->maxframe < 1) {
 		pr_warn("CAIF Interface MTU too small (%d)\n", dev->mtu);
+		err = -ENODEV;
 		goto out;
 	}
 
@@ -1142,7 +1127,7 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
 	set_rx_flow_on(cf_sk);
 
 	/* Set default options on configuration */
-	cf_sk->conn_req.priority = CAIF_PRIO_NORMAL;
+	cf_sk->sk.sk_priority= CAIF_PRIO_NORMAL;
 	cf_sk->conn_req.link_selector = CAIF_LINK_LOW_LATENCY;
 	cf_sk->conn_req.protocol = protocol;
 	/* Increase the number of sockets created. */
diff --git a/net/caif/cfcnfg.c b/net/caif/cfcnfg.c
index 41adafd..21ede14 100644
--- a/net/caif/cfcnfg.c
+++ b/net/caif/cfcnfg.c
@@ -173,18 +173,15 @@ static struct cfcnfg_phyinfo *cfcnfg_get_phyinfo(struct cfcnfg *cnfg,
 	return NULL;
 }
 
-int cfcnfg_get_named(struct cfcnfg *cnfg, char *name)
+
+int cfcnfg_get_id_from_ifi(struct cfcnfg *cnfg, int ifi)
 {
 	int i;
-
-	/* Try to match with specified name */
-	for (i = 0; i < MAX_PHY_LAYERS; i++) {
-		if (cnfg->phy_layers[i].frm_layer != NULL
-		    && strcmp(cnfg->phy_layers[i].phy_layer->name,
-			      name) == 0)
-			return cnfg->phy_layers[i].frm_layer->id;
-	}
-	return 0;
+	for (i = 0; i < MAX_PHY_LAYERS; i++)
+		if (cnfg->phy_layers[i].frm_layer != NULL &&
+				cnfg->phy_layers[i].ifindex == ifi)
+			return i;
+	return -ENODEV;
 }
 
 int cfcnfg_disconn_adapt_layer(struct cfcnfg *cnfg, struct cflayer *adap_layer)
diff --git a/net/caif/cfdbgl.c b/net/caif/cfdbgl.c
index 496fda9..11a2af4 100644
--- a/net/caif/cfdbgl.c
+++ b/net/caif/cfdbgl.c
@@ -12,6 +12,8 @@
 #include <net/caif/cfsrvl.h>
 #include <net/caif/cfpkt.h>
 
+#define container_obj(layr) ((struct cfsrvl *) layr)
+
 static int cfdbgl_receive(struct cflayer *layr, struct cfpkt *pkt);
 static int cfdbgl_transmit(struct cflayer *layr, struct cfpkt *pkt);
 
@@ -38,5 +40,17 @@ static int cfdbgl_receive(struct cflayer *layr, struct cfpkt *pkt)
 
 static int cfdbgl_transmit(struct cflayer *layr, struct cfpkt *pkt)
 {
+	struct cfsrvl *service = container_obj(layr);
+	struct caif_payload_info *info;
+	int ret;
+
+	if (!cfsrvl_ready(service, &ret))
+		return ret;
+
+	/* Add info for MUX-layer to route the packet out */
+	info = cfpkt_info(pkt);
+	info->channel_id = service->layer.id;
+	info->dev_info = &service->dev_info;
+
 	return layr->dn->transmit(layr->dn, pkt);
 }
diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index bde8481..e2fb5fa 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -193,7 +193,7 @@ out:
 
 static int cfrfml_transmit_segment(struct cfrfml *rfml, struct cfpkt *pkt)
 {
-	caif_assert(cfpkt_getlen(pkt) >= rfml->fragment_size);
+	caif_assert(cfpkt_getlen(pkt) < rfml->fragment_size);
 
 	/* Add info for MUX-layer to route the packet out. */
 	cfpkt_info(pkt)->channel_id = rfml->serv.layer.id;
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 2/2] caif: SPI-driver bugfix - incorrect padding.
From: Sjur Braendeland @ 2010-11-01 21:52 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Sjur Brændeland
In-Reply-To: <1288648368-9062-1-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Brændeland <sjur.brandeland@stericsson.com>


Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 drivers/net/caif/caif_spi.c       |   57 +++++++++++++++++++++++++++----------
 drivers/net/caif/caif_spi_slave.c |   13 ++++++--
 include/net/caif/caif_spi.h       |    2 +
 3 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/drivers/net/caif/caif_spi.c b/drivers/net/caif/caif_spi.c
index 8427533..8b4cea5 100644
--- a/drivers/net/caif/caif_spi.c
+++ b/drivers/net/caif/caif_spi.c
@@ -33,6 +33,9 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Daniel Martensson<daniel.martensson@stericsson.com>");
 MODULE_DESCRIPTION("CAIF SPI driver");
 
+/* Returns the number of padding bytes for alignment. */
+#define PAD_POW2(x, pow) ((((x)&((pow)-1))==0) ? 0 : (((pow)-((x)&((pow)-1)))))
+
 static int spi_loop;
 module_param(spi_loop, bool, S_IRUGO);
 MODULE_PARM_DESC(spi_loop, "SPI running in loopback mode.");
@@ -41,7 +44,10 @@ MODULE_PARM_DESC(spi_loop, "SPI running in loopback mode.");
 module_param(spi_frm_align, int, S_IRUGO);
 MODULE_PARM_DESC(spi_frm_align, "SPI frame alignment.");
 
-/* SPI padding options. */
+/*
+ * SPI padding options.
+ * Warning: must be a base of 2 (& operation used) and can not be zero !
+ */
 module_param(spi_up_head_align, int, S_IRUGO);
 MODULE_PARM_DESC(spi_up_head_align, "SPI uplink head alignment.");
 
@@ -240,15 +246,13 @@ static ssize_t dbgfs_frame(struct file *file, char __user *user_buf,
 static const struct file_operations dbgfs_state_fops = {
 	.open = dbgfs_open,
 	.read = dbgfs_state,
-	.owner = THIS_MODULE,
-	.llseek = default_llseek,
+	.owner = THIS_MODULE
 };
 
 static const struct file_operations dbgfs_frame_fops = {
 	.open = dbgfs_open,
 	.read = dbgfs_frame,
-	.owner = THIS_MODULE,
-	.llseek = default_llseek,
+	.owner = THIS_MODULE
 };
 
 static inline void dev_debugfs_add(struct cfspi *cfspi)
@@ -337,6 +341,9 @@ int cfspi_xmitfrm(struct cfspi *cfspi, u8 *buf, size_t len)
 	u8 *dst = buf;
 	caif_assert(buf);
 
+	if (cfspi->slave && !cfspi->slave_talked)
+		cfspi->slave_talked = true;
+
 	do {
 		struct sk_buff *skb;
 		struct caif_payload_info *info;
@@ -357,8 +364,8 @@ int cfspi_xmitfrm(struct cfspi *cfspi, u8 *buf, size_t len)
 		 * Compute head offset i.e. number of bytes to add to
 		 * get the start of the payload aligned.
 		 */
-		if (spi_up_head_align) {
-			spad = 1 + ((info->hdr_len + 1) & spi_up_head_align);
+		if (spi_up_head_align > 1) {
+			spad = 1 + PAD_POW2((info->hdr_len + 1), spi_up_head_align);
 			*dst = (u8)(spad - 1);
 			dst += spad;
 		}
@@ -373,7 +380,7 @@ int cfspi_xmitfrm(struct cfspi *cfspi, u8 *buf, size_t len)
 		 * Compute tail offset i.e. number of bytes to add to
 		 * get the complete CAIF frame aligned.
 		 */
-		epad = (skb->len + spad) & spi_up_tail_align;
+		epad = PAD_POW2((skb->len + spad), spi_up_tail_align);
 		dst += epad;
 
 		dev_kfree_skb(skb);
@@ -417,14 +424,14 @@ int cfspi_xmitlen(struct cfspi *cfspi)
 		 * Compute head offset i.e. number of bytes to add to
 		 * get the start of the payload aligned.
 		 */
-		if (spi_up_head_align)
-			spad = 1 + ((info->hdr_len + 1) & spi_up_head_align);
+		if (spi_up_head_align > 1)
+			spad = 1 + PAD_POW2((info->hdr_len + 1), spi_up_head_align);
 
 		/*
 		 * Compute tail offset i.e. number of bytes to add to
 		 * get the complete CAIF frame aligned.
 		 */
-		epad = (skb->len + spad) & spi_up_tail_align;
+		epad = PAD_POW2((skb->len + spad), spi_up_tail_align);
 
 		if ((skb->len + spad + epad + frm_len) <= CAIF_MAX_SPI_FRAME) {
 			skb_queue_tail(&cfspi->chead, skb);
@@ -433,6 +440,7 @@ int cfspi_xmitlen(struct cfspi *cfspi)
 		} else {
 			/* Put back packet. */
 			skb_queue_head(&cfspi->qhead, skb);
+			break;
 		}
 	} while (pkts <= CAIF_MAX_SPI_PKTS);
 
@@ -453,6 +461,15 @@ static void cfspi_ss_cb(bool assert, struct cfspi_ifc *ifc)
 {
 	struct cfspi *cfspi = (struct cfspi *)ifc->priv;
 
+	/*
+	 * The slave device is the master on the link. Interrupts before the
+	 * slave has transmitted are considered spurious.
+	 */
+	if (cfspi->slave && !cfspi->slave_talked) {
+		printk(KERN_WARNING "CFSPI: Spurious SS interrupt.\n");
+		return;
+	}
+
 	if (!in_interrupt())
 		spin_lock(&cfspi->lock);
 	if (assert) {
@@ -465,7 +482,8 @@ static void cfspi_ss_cb(bool assert, struct cfspi_ifc *ifc)
 		spin_unlock(&cfspi->lock);
 
 	/* Wake up the xfer thread. */
-	wake_up_interruptible(&cfspi->wait);
+	if (assert)
+		wake_up_interruptible(&cfspi->wait);
 }
 
 static void cfspi_xfer_done_cb(struct cfspi_ifc *ifc)
@@ -523,7 +541,7 @@ int cfspi_rxfrm(struct cfspi *cfspi, u8 *buf, size_t len)
 		 * Compute head offset i.e. number of bytes added to
 		 * get the start of the payload aligned.
 		 */
-		if (spi_down_head_align) {
+		if (spi_down_head_align > 1) {
 			spad = 1 + *src;
 			src += spad;
 		}
@@ -564,7 +582,7 @@ int cfspi_rxfrm(struct cfspi *cfspi, u8 *buf, size_t len)
 		 * Compute tail offset i.e. number of bytes added to
 		 * get the complete CAIF frame aligned.
 		 */
-		epad = (pkt_len + spad) & spi_down_tail_align;
+		epad = PAD_POW2((pkt_len + spad), spi_down_tail_align);
 		src += epad;
 	} while ((src - buf) < len);
 
@@ -625,11 +643,20 @@ int cfspi_spi_probe(struct platform_device *pdev)
 	cfspi->ndev = ndev;
 	cfspi->pdev = pdev;
 
-	/* Set flow info */
+	/* Set flow info. */
 	cfspi->flow_off_sent = 0;
 	cfspi->qd_low_mark = LOW_WATER_MARK;
 	cfspi->qd_high_mark = HIGH_WATER_MARK;
 
+	/* Set slave info. */
+	if (!strncmp(cfspi_spi_driver.driver.name, "cfspi_sspi", 10)) {
+		cfspi->slave = true;
+		cfspi->slave_talked = false;
+	} else {
+		cfspi->slave = false;
+		cfspi->slave_talked = false;
+	}
+
 	/* Assign the SPI device. */
 	cfspi->dev = dev;
 	/* Assign the device ifc to this SPI interface. */
diff --git a/drivers/net/caif/caif_spi_slave.c b/drivers/net/caif/caif_spi_slave.c
index 2111dbf..1b9943a 100644
--- a/drivers/net/caif/caif_spi_slave.c
+++ b/drivers/net/caif/caif_spi_slave.c
@@ -36,10 +36,15 @@ static inline int forward_to_spi_cmd(struct cfspi *cfspi)
 #endif
 
 int spi_frm_align = 2;
-int spi_up_head_align = 1;
-int spi_up_tail_align;
-int spi_down_head_align = 3;
-int spi_down_tail_align = 1;
+
+/*
+ * SPI padding options.
+ * Warning: must be a base of 2 (& operation used) and can not be zero !
+ */
+int spi_up_head_align   = 1 << 1;
+int spi_up_tail_align   = 1 << 0;
+int spi_down_head_align = 1 << 2;
+int spi_down_tail_align = 1 << 1;
 
 #ifdef CONFIG_DEBUG_FS
 static inline void debugfs_store_prev(struct cfspi *cfspi)
diff --git a/include/net/caif/caif_spi.h b/include/net/caif/caif_spi.h
index ce4570d..87c3d11 100644
--- a/include/net/caif/caif_spi.h
+++ b/include/net/caif/caif_spi.h
@@ -121,6 +121,8 @@ struct cfspi {
 	wait_queue_head_t wait;
 	spinlock_t lock;
 	bool flow_stop;
+	bool slave;
+	bool slave_talked;
 #ifdef CONFIG_DEBUG_FS
 	enum cfspi_state dbg_state;
 	u16 pcmd;
-- 
1.7.0.4


^ permalink raw reply related

* Re: Routing over multiple interfaces
From: David Woodhouse @ 2010-11-01 22:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, uweber
In-Reply-To: <1288647330.2660.116.camel@edumazet-laptop>

On Mon, 2010-11-01 at 22:35 +0100, Eric Dumazet wrote:
> David W. probably wants to use teql or some bonding ?
> 
> # tc qdisc add dev ppp0 root teql0
> # tc qdisc add dev ppp1 root teql0
> # ip link set dev teql0 up
> # ip route add default src 90.155.92.214 dev teql0 

That looks very likely; thanks. I'll try it... but when I get home next
week, not from the wrong side of the Atlantic ;)

-- 
dwmw2


^ permalink raw reply

* Re: [PATCH 2.6.35-rc8] net-next: Add multiqueue support to vmxnet3 v2driver
From: Shreyas Bhatewara @ 2010-11-01 22:42 UTC (permalink / raw)
  To: David Miller
  Cc: bhutchings@solarflare.com, shemminger@vyatta.com,
	netdev@vger.kernel.org, pv-drivers@vmware.com,
	linux-kernel@vger.kernel.org
In-Reply-To: <20101015.092327.193701886.davem@davemloft.net>



On Fri, 15 Oct 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara@vmware.com>
> Date: Thu, 14 Oct 2010 16:31:35 -0700
> 
> > Okay. It would be best to keep module parameters to dictate number
> > of queues till ethtool commands to do so become available/easy to
> > use (command to change number of tx queues do not exist).
> 
> No, because then you can never remove these knobs, they must stay
> forever.
> 
> And also then every other driver developer can make the same
> argument.  And then we have private knobs in every driver and
> the user experience is a complete disaster.
> 
> Instead, the onus is on you to help get the ethtool interfaces
> completed so your driver can provide the functionality it
> wants.
> 
> Not the other way around (add crap first, use the proper interface
> later whenever someone else gets around to it).
> 
> Thanks.
> 


Add multiqueue support to vmxnet3 driver

This change adds Multiqueue and thus receive side scaling support  
to vmxnet3 device driver. Number of rx queues is limited to 1 in cases 
where
- MSI is not configured or
- One MSIx vector is not available per rx queue

By default multiqueue capability is turned off and hence only 1 tx and 1 rx
queue will be initialized. enable_mq module param should be set to 
configure number of tx and rx queues equal to number of online CPUs. A 
maximum of 8 tx/rx queues are allowed for any adapter.

Signed-off-by: Shreyas Bhatewara <sbhatewara@vmware.com>

---

2nd revision of the patch.

In this revision, module params which are not strictly required have been
removed and ethtool callback handlers have been implemented instead. 
Handlers to provide # rx queues and to get/set RSS indirection table are added.
Information like Number of queues and how they share irqs is required at 
driver attach time. Adding ethtool interfaces cannot help in this regards.
Hence two module params have been introduced : enable_mq (to configure if
multiple queues should be used) and irq_share_mode to configure the way in
which irqs will be shared among queues. 


diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 3f60e0e..3ed4be6 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -44,6 +44,26 @@ MODULE_DEVICE_TABLE(pci, vmxnet3_pciid_table);
 
 static atomic_t devices_found;
 
+#define VMXNET3_MAX_DEVICES 10
+static int enable_mq[VMXNET3_MAX_DEVICES + 1] = {
+	[0 ... VMXNET3_MAX_DEVICES] = 0 };
+static int irq_share_mode[VMXNET3_MAX_DEVICES + 1] = {
+	[0 ... VMXNET3_MAX_DEVICES] = VMXNET3_INTR_BUDDYSHARE };
+
+static unsigned int num_adapters;
+module_param_array(irq_share_mode, int, &num_adapters, 0400);
+MODULE_PARM_DESC(irq_share_mode, "Comma separated list of ints, configuring "
+		 "mode in which irqs should be shared by tx and rx queues. When"
+		 " set to 0, no irqs are shared, each tx and rx queue allocate"
+		 " and use a separate irq. Set to 1, all tx queues share an irq"
+		 ". Set to 2, corresponding tx and rx queues share an irq."
+		 " Default is 2.");
+module_param_array(enable_mq, int, &num_adapters, 0400);
+MODULE_PARM_DESC(enable_mq, "Comma separated list of integers, one for each "
+		 "adapter. When set to a non-zero value, multiqueue will be "
+		 "enabled and number of tx and rx queues will be same as number"
+		 " of CPUs online. number of queues will be 1 otherwise. "
+		 "Default is 0 - multiqueue disabled.");
 
 /*
  *    Enable/Disable the given intr
@@ -107,7 +127,7 @@ static void
 vmxnet3_tq_start(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
 {
 	tq->stopped = false;
-	netif_start_queue(adapter->netdev);
+	netif_start_subqueue(adapter->netdev, tq - adapter->tx_queue);
 }
 
 
@@ -115,7 +135,7 @@ static void
 vmxnet3_tq_wake(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
 {
 	tq->stopped = false;
-	netif_wake_queue(adapter->netdev);
+	netif_wake_subqueue(adapter->netdev, (tq - adapter->tx_queue));
 }
 
 
@@ -124,7 +144,7 @@ vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
 {
 	tq->stopped = true;
 	tq->num_stop++;
-	netif_stop_queue(adapter->netdev);
+	netif_stop_subqueue(adapter->netdev, (tq - adapter->tx_queue));
 }
 
 
@@ -135,6 +155,7 @@ static void
 vmxnet3_check_link(struct vmxnet3_adapter *adapter, bool affectTxQueue)
 {
 	u32 ret;
+	int i;
 
 	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_LINK);
 	ret = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
@@ -145,22 +166,28 @@ vmxnet3_check_link(struct vmxnet3_adapter *adapter, bool affectTxQueue)
 		if (!netif_carrier_ok(adapter->netdev))
 			netif_carrier_on(adapter->netdev);
 
-		if (affectTxQueue)
-			vmxnet3_tq_start(&adapter->tx_queue, adapter);
+		if (affectTxQueue) {
+			for (i = 0; i < adapter->num_tx_queues; i++)
+				vmxnet3_tq_start(&adapter->tx_queue[i],
+						 adapter);
+		}
 	} else {
 		printk(KERN_INFO "%s: NIC Link is Down\n",
 		       adapter->netdev->name);
 		if (netif_carrier_ok(adapter->netdev))
 			netif_carrier_off(adapter->netdev);
 
-		if (affectTxQueue)
-			vmxnet3_tq_stop(&adapter->tx_queue, adapter);
+		if (affectTxQueue) {
+			for (i = 0; i < adapter->num_tx_queues; i++)
+				vmxnet3_tq_stop(&adapter->tx_queue[i], adapter);
+		}
 	}
 }
 
 static void
 vmxnet3_process_events(struct vmxnet3_adapter *adapter)
 {
+	int i;
 	u32 events = le32_to_cpu(adapter->shared->ecr);
 	if (!events)
 		return;
@@ -176,16 +203,18 @@ vmxnet3_process_events(struct vmxnet3_adapter *adapter)
 		VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
 				       VMXNET3_CMD_GET_QUEUE_STATUS);
 
-		if (adapter->tqd_start->status.stopped) {
-			printk(KERN_ERR "%s: tq error 0x%x\n",
-			       adapter->netdev->name,
-			       le32_to_cpu(adapter->tqd_start->status.error));
-		}
-		if (adapter->rqd_start->status.stopped) {
-			printk(KERN_ERR "%s: rq error 0x%x\n",
-			       adapter->netdev->name,
-			       adapter->rqd_start->status.error);
-		}
+		for (i = 0; i < adapter->num_tx_queues; i++)
+			if (adapter->tqd_start[i].status.stopped)
+				dev_dbg(&adapter->netdev->dev,
+					"%s: tq[%d] error 0x%x\n",
+					adapter->netdev->name, i, le32_to_cpu(
+					adapter->tqd_start[i].status.error));
+		for (i = 0; i < adapter->num_rx_queues; i++)
+			if (adapter->rqd_start[i].status.stopped)
+				dev_dbg(&adapter->netdev->dev,
+					"%s: rq[%d] error 0x%x\n",
+					adapter->netdev->name, i,
+					adapter->rqd_start[i].status.error);
 
 		schedule_work(&adapter->work);
 	}
@@ -410,7 +439,7 @@ vmxnet3_tq_cleanup(struct vmxnet3_tx_queue *tq,
 }
 
 
-void
+static void
 vmxnet3_tq_destroy(struct vmxnet3_tx_queue *tq,
 		   struct vmxnet3_adapter *adapter)
 {
@@ -437,6 +466,17 @@ vmxnet3_tq_destroy(struct vmxnet3_tx_queue *tq,
 }
 
 
+/* Destroy all tx queues */
+void
+vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		vmxnet3_tq_destroy(&adapter->tx_queue[i], adapter);
+}
+
+
 static void
 vmxnet3_tq_init(struct vmxnet3_tx_queue *tq,
 		struct vmxnet3_adapter *adapter)
@@ -518,6 +558,14 @@ err:
 	return -ENOMEM;
 }
 
+static void
+vmxnet3_tq_cleanup_all(struct vmxnet3_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		vmxnet3_tq_cleanup(&adapter->tx_queue[i], adapter);
+}
 
 /*
  *    starting from ring->next2fill, allocate rx buffers for the given ring
@@ -732,6 +780,17 @@ vmxnet3_map_pkt(struct sk_buff *skb, struct vmxnet3_tx_ctx *ctx,
 }
 
 
+/* Init all tx queues */
+static void
+vmxnet3_tq_init_all(struct vmxnet3_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		vmxnet3_tq_init(&adapter->tx_queue[i], adapter);
+}
+
+
 /*
  *    parse and copy relevant protocol headers:
  *      For a tso pkt, relevant headers are L2/3/4 including options
@@ -1000,8 +1059,8 @@ vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
 	if (le32_to_cpu(tq->shared->txNumDeferred) >=
 					le32_to_cpu(tq->shared->txThreshold)) {
 		tq->shared->txNumDeferred = 0;
-		VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_TXPROD,
-				       tq->tx_ring.next2fill);
+		VMXNET3_WRITE_BAR0_REG(adapter, (VMXNET3_REG_TXPROD +
+				       tq->qid * 8), tq->tx_ring.next2fill);
 	}
 
 	return NETDEV_TX_OK;
@@ -1020,7 +1079,10 @@ vmxnet3_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 
-	return vmxnet3_tq_xmit(skb, &adapter->tx_queue, adapter, netdev);
+		BUG_ON(skb->queue_mapping > adapter->num_tx_queues);
+		return vmxnet3_tq_xmit(skb,
+				       &adapter->tx_queue[skb->queue_mapping],
+				       adapter, netdev);
 }
 
 
@@ -1106,9 +1168,9 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 			break;
 		}
 		num_rxd++;
-
+		BUG_ON(rcd->rqID != rq->qid && rcd->rqID != rq->qid2);
 		idx = rcd->rxdIdx;
-		ring_idx = rcd->rqID == rq->qid ? 0 : 1;
+		ring_idx = rcd->rqID < adapter->num_rx_queues ? 0 : 1;
 		vmxnet3_getRxDesc(rxd, &rq->rx_ring[ring_idx].base[idx].rxd,
 				  &rxCmdDesc);
 		rbi = rq->buf_info[ring_idx] + idx;
@@ -1260,6 +1322,16 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
 }
 
 
+static void
+vmxnet3_rq_cleanup_all(struct vmxnet3_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		vmxnet3_rq_cleanup(&adapter->rx_queue[i], adapter);
+}
+
+
 void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 			struct vmxnet3_adapter *adapter)
 {
@@ -1351,6 +1423,25 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
 
 
 static int
+vmxnet3_rq_init_all(struct vmxnet3_adapter *adapter)
+{
+	int i, err = 0;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		err = vmxnet3_rq_init(&adapter->rx_queue[i], adapter);
+		if (unlikely(err)) {
+			dev_err(&adapter->netdev->dev, "%s: failed to "
+				"initialize rx queue%i\n",
+				adapter->netdev->name, i);
+			break;
+		}
+	}
+	return err;
+
+}
+
+
+static int
 vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter)
 {
 	int i;
@@ -1398,32 +1489,176 @@ err:
 
 
 static int
+vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter)
+{
+	int i, err = 0;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		err = vmxnet3_rq_create(&adapter->rx_queue[i], adapter);
+		if (unlikely(err)) {
+			dev_err(&adapter->netdev->dev,
+				"%s: failed to create rx queue%i\n",
+				adapter->netdev->name, i);
+			goto err_out;
+		}
+	}
+	return err;
+err_out:
+	vmxnet3_rq_destroy_all(adapter);
+	return err;
+
+}
+
+/* Multiple queue aware polling function for tx and rx */
+
+static int
 vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget)
 {
+	int rcd_done = 0, i;
 	if (unlikely(adapter->shared->ecr))
 		vmxnet3_process_events(adapter);
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		vmxnet3_tq_tx_complete(&adapter->tx_queue[i], adapter);
 
-	vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
-	return vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		rcd_done += vmxnet3_rq_rx_complete(&adapter->rx_queue[i],
+						   adapter, budget);
+	return rcd_done;
 }
 
 
 static int
 vmxnet3_poll(struct napi_struct *napi, int budget)
 {
-	struct vmxnet3_adapter *adapter = container_of(napi,
-					  struct vmxnet3_adapter, napi);
+	struct vmxnet3_rx_queue *rx_queue = container_of(napi,
+					  struct vmxnet3_rx_queue, napi);
 	int rxd_done;
 
-	rxd_done = vmxnet3_do_poll(adapter, budget);
+	rxd_done = vmxnet3_do_poll(rx_queue->adapter, budget);
 
 	if (rxd_done < budget) {
 		napi_complete(napi);
-		vmxnet3_enable_intr(adapter, 0);
+		vmxnet3_enable_all_intrs(rx_queue->adapter);
 	}
 	return rxd_done;
 }
 
+/*
+ * NAPI polling function for MSI-X mode with multiple Rx queues
+ * Returns the # of the NAPI credit consumed (# of rx descriptors processed)
+ */
+
+static int
+vmxnet3_poll_rx_only(struct napi_struct *napi, int budget)
+{
+	struct vmxnet3_rx_queue *rq = container_of(napi,
+						struct vmxnet3_rx_queue, napi);
+	struct vmxnet3_adapter *adapter = rq->adapter;
+	int rxd_done;
+
+	/* When sharing interrupt with corresponding tx queue, process
+	 * tx completions in that queue as well
+	 */
+	if (adapter->share_intr == VMXNET3_INTR_BUDDYSHARE) {
+		struct vmxnet3_tx_queue *tq =
+				&adapter->tx_queue[rq - adapter->rx_queue];
+		vmxnet3_tq_tx_complete(tq, adapter);
+	}
+
+	rxd_done = vmxnet3_rq_rx_complete(rq, adapter, budget);
+
+	if (rxd_done < budget) {
+		napi_complete(napi);
+		vmxnet3_enable_intr(adapter, rq->comp_ring.intr_idx);
+	}
+	return rxd_done;
+}
+
+
+#ifdef CONFIG_PCI_MSI
+
+/*
+ * Handle completion interrupts on tx queues
+ * Returns whether or not the intr is handled
+ */
+
+static irqreturn_t
+vmxnet3_msix_tx(int irq, void *data)
+{
+	struct vmxnet3_tx_queue *tq = data;
+	struct vmxnet3_adapter *adapter = tq->adapter;
+
+	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
+		vmxnet3_disable_intr(adapter, tq->comp_ring.intr_idx);
+
+	/* Handle the case where only one irq is allocate for all tx queues */
+	if (adapter->share_intr == VMXNET3_INTR_TXSHARE) {
+		int i;
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			struct vmxnet3_tx_queue *txq = &adapter->tx_queue[i];
+			vmxnet3_tq_tx_complete(txq, adapter);
+		}
+	} else {
+		vmxnet3_tq_tx_complete(tq, adapter);
+	}
+	vmxnet3_enable_intr(adapter, tq->comp_ring.intr_idx);
+
+	return IRQ_HANDLED;
+}
+
+
+/*
+ * Handle completion interrupts on rx queues. Returns whether or not the
+ * intr is handled
+ */
+
+static irqreturn_t
+vmxnet3_msix_rx(int irq, void *data)
+{
+	struct vmxnet3_rx_queue *rq = data;
+	struct vmxnet3_adapter *adapter = rq->adapter;
+
+	/* disable intr if needed */
+	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
+		vmxnet3_disable_intr(adapter, rq->comp_ring.intr_idx);
+	napi_schedule(&rq->napi);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ *----------------------------------------------------------------------------
+ *
+ * vmxnet3_msix_event --
+ *
+ *    vmxnet3 msix event intr handler
+ *
+ * Result:
+ *    whether or not the intr is handled
+ *
+ *----------------------------------------------------------------------------
+ */
+
+static irqreturn_t
+vmxnet3_msix_event(int irq, void *data)
+{
+	struct net_device *dev = data;
+	struct vmxnet3_adapter *adapter = netdev_priv(dev);
+
+	/* disable intr if needed */
+	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
+		vmxnet3_disable_intr(adapter, adapter->intr.event_intr_idx);
+
+	if (adapter->shared->ecr)
+		vmxnet3_process_events(adapter);
+
+	vmxnet3_enable_intr(adapter, adapter->intr.event_intr_idx);
+
+	return IRQ_HANDLED;
+}
+
+#endif /* CONFIG_PCI_MSI  */
+
 
 /* Interrupt handler for vmxnet3  */
 static irqreturn_t
@@ -1432,7 +1667,7 @@ vmxnet3_intr(int irq, void *dev_id)
 	struct net_device *dev = dev_id;
 	struct vmxnet3_adapter *adapter = netdev_priv(dev);
 
-	if (unlikely(adapter->intr.type == VMXNET3_IT_INTX)) {
+	if (adapter->intr.type == VMXNET3_IT_INTX) {
 		u32 icr = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_ICR);
 		if (unlikely(icr == 0))
 			/* not ours */
@@ -1442,77 +1677,136 @@ vmxnet3_intr(int irq, void *dev_id)
 
 	/* disable intr if needed */
 	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
-		vmxnet3_disable_intr(adapter, 0);
+		vmxnet3_disable_all_intrs(adapter);
 
-	napi_schedule(&adapter->napi);
+	napi_schedule(&adapter->rx_queue[0].napi);
 
 	return IRQ_HANDLED;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 
-
 /* netpoll callback. */
 static void
 vmxnet3_netpoll(struct net_device *netdev)
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
-	int irq;
 
-#ifdef CONFIG_PCI_MSI
-	if (adapter->intr.type == VMXNET3_IT_MSIX)
-		irq = adapter->intr.msix_entries[0].vector;
-	else
-#endif
-		irq = adapter->pdev->irq;
+	if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
+		vmxnet3_disable_all_intrs(adapter);
+
+	vmxnet3_do_poll(adapter, adapter->rx_queue[0].rx_ring[0].size);
+	vmxnet3_enable_all_intrs(adapter);
 
-	disable_irq(irq);
-	vmxnet3_intr(irq, netdev);
-	enable_irq(irq);
 }
-#endif
+#endif	/* CONFIG_NET_POLL_CONTROLLER */
 
 static int
 vmxnet3_request_irqs(struct vmxnet3_adapter *adapter)
 {
-	int err;
+	struct vmxnet3_intr *intr = &adapter->intr;
+	int err = 0, i;
+	int vector = 0;
 
 #ifdef CONFIG_PCI_MSI
 	if (adapter->intr.type == VMXNET3_IT_MSIX) {
-		/* we only use 1 MSI-X vector */
-		err = request_irq(adapter->intr.msix_entries[0].vector,
-				  vmxnet3_intr, 0, adapter->netdev->name,
-				  adapter->netdev);
-	} else if (adapter->intr.type == VMXNET3_IT_MSI) {
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			sprintf(adapter->tx_queue[i].name, "%s:v%d-%s",
+				adapter->netdev->name, vector, "Tx");
+			if (adapter->share_intr != VMXNET3_INTR_BUDDYSHARE)
+				err = request_irq(
+					      intr->msix_entries[vector].vector,
+					      vmxnet3_msix_tx, 0,
+					      adapter->tx_queue[i].name,
+					      &adapter->tx_queue[i]);
+			if (err) {
+				dev_err(&adapter->netdev->dev,
+					"Failed to request irq for MSIX, %s, "
+					"error %d\n",
+					adapter->tx_queue[i].name, err);
+				return err;
+			}
+
+			/* Handle the case where only 1 MSIx was allocated for
+			 * all tx queues */
+			if (adapter->share_intr == VMXNET3_INTR_TXSHARE) {
+				for (; i < adapter->num_tx_queues; i++)
+					adapter->tx_queue[i].comp_ring.intr_idx
+								= vector;
+				vector++;
+				break;
+			} else {
+				adapter->tx_queue[i].comp_ring.intr_idx
+								= vector++;
+			}
+		}
+		if (adapter->share_intr == VMXNET3_INTR_BUDDYSHARE)
+			vector = 0;
+
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			sprintf(adapter->rx_queue[i].name, "%s:v%d-%s",
+				adapter->netdev->name, vector, "Rx");
+			err = request_irq(intr->msix_entries[vector].vector,
+					  vmxnet3_msix_rx, 0,
+					  adapter->rx_queue[i].name,
+					  &(adapter->rx_queue[i]));
+			if (err) {
+				printk(KERN_ERR "Failed to request irq for MSIX"
+				       ", %s, error %d\n",
+				       adapter->rx_queue[i].name, err);
+				return err;
+			}
+
+			adapter->rx_queue[i].comp_ring.intr_idx = vector++;
+		}
+
+		sprintf(intr->event_msi_vector_name, "%s:v%d-event",
+			adapter->netdev->name, vector);
+		err = request_irq(intr->msix_entries[vector].vector,
+				  vmxnet3_msix_event, 0,
+				  intr->event_msi_vector_name, adapter->netdev);
+		intr->event_intr_idx = vector;
+
+	} else if (intr->type == VMXNET3_IT_MSI) {
+		adapter->num_rx_queues = 1;
 		err = request_irq(adapter->pdev->irq, vmxnet3_intr, 0,
 				  adapter->netdev->name, adapter->netdev);
-	} else
+	} else {
 #endif
-	{
+		adapter->num_rx_queues = 1;
 		err = request_irq(adapter->pdev->irq, vmxnet3_intr,
 				  IRQF_SHARED, adapter->netdev->name,
 				  adapter->netdev);
+#ifdef CONFIG_PCI_MSI
 	}
-
-	if (err)
+#endif
+	intr->num_intrs = vector + 1;
+	if (err) {
 		printk(KERN_ERR "Failed to request irq %s (intr type:%d), error"
-		       ":%d\n", adapter->netdev->name, adapter->intr.type, err);
+		       ":%d\n", adapter->netdev->name, intr->type, err);
+	} else {
+		/* Number of rx queues will not change after this */
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			struct vmxnet3_rx_queue *rq = &adapter->rx_queue[i];
+			rq->qid = i;
+			rq->qid2 = i + adapter->num_rx_queues;
+		}
 
 
-	if (!err) {
-		int i;
-		/* init our intr settings */
-		for (i = 0; i < adapter->intr.num_intrs; i++)
-			adapter->intr.mod_levels[i] = UPT1_IML_ADAPTIVE;
 
-		/* next setup intr index for all intr sources */
-		adapter->tx_queue.comp_ring.intr_idx = 0;
-		adapter->rx_queue.comp_ring.intr_idx = 0;
-		adapter->intr.event_intr_idx = 0;
+		/* init our intr settings */
+		for (i = 0; i < intr->num_intrs; i++)
+			intr->mod_levels[i] = UPT1_IML_ADAPTIVE;
+		if (adapter->intr.type != VMXNET3_IT_MSIX) {
+			adapter->intr.event_intr_idx = 0;
+			for (i = 0; i < adapter->num_tx_queues; i++)
+				adapter->tx_queue[i].comp_ring.intr_idx = 0;
+			adapter->rx_queue[0].comp_ring.intr_idx = 0;
+		}
 
 		printk(KERN_INFO "%s: intr type %u, mode %u, %u vectors "
-		       "allocated\n", adapter->netdev->name, adapter->intr.type,
-		       adapter->intr.mask_mode, adapter->intr.num_intrs);
+		       "allocated\n", adapter->netdev->name, intr->type,
+		       intr->mask_mode, intr->num_intrs);
 	}
 
 	return err;
@@ -1522,18 +1816,32 @@ vmxnet3_request_irqs(struct vmxnet3_adapter *adapter)
 static void
 vmxnet3_free_irqs(struct vmxnet3_adapter *adapter)
 {
-	BUG_ON(adapter->intr.type == VMXNET3_IT_AUTO ||
-	       adapter->intr.num_intrs <= 0);
+	struct vmxnet3_intr *intr = &adapter->intr;
+	BUG_ON(intr->type == VMXNET3_IT_AUTO || intr->num_intrs <= 0);
 
-	switch (adapter->intr.type) {
+	switch (intr->type) {
 #ifdef CONFIG_PCI_MSI
 	case VMXNET3_IT_MSIX:
 	{
-		int i;
+		int i, vector = 0;
+
+		if (adapter->share_intr != VMXNET3_INTR_BUDDYSHARE) {
+			for (i = 0; i < adapter->num_tx_queues; i++) {
+				free_irq(intr->msix_entries[vector++].vector,
+					 &(adapter->tx_queue[i]));
+				if (adapter->share_intr == VMXNET3_INTR_TXSHARE)
+					break;
+			}
+		}
+
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			free_irq(intr->msix_entries[vector++].vector,
+				 &(adapter->rx_queue[i]));
+		}
 
-		for (i = 0; i < adapter->intr.num_intrs; i++)
-			free_irq(adapter->intr.msix_entries[i].vector,
-				 adapter->netdev);
+		free_irq(intr->msix_entries[vector].vector,
+			 adapter->netdev);
+		BUG_ON(vector >= intr->num_intrs);
 		break;
 	}
 #endif
@@ -1729,6 +2037,15 @@ vmxnet3_set_mc(struct net_device *netdev)
 	kfree(new_table);
 }
 
+void
+vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		vmxnet3_rq_destroy(&adapter->rx_queue[i], adapter);
+}
+
 
 /*
  *   Set up driver_shared based on settings in adapter.
@@ -1776,40 +2093,72 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 	devRead->misc.mtu = cpu_to_le32(adapter->netdev->mtu);
 	devRead->misc.queueDescPA = cpu_to_le64(adapter->queue_desc_pa);
 	devRead->misc.queueDescLen = cpu_to_le32(
-				     sizeof(struct Vmxnet3_TxQueueDesc) +
-				     sizeof(struct Vmxnet3_RxQueueDesc));
+		adapter->num_tx_queues * sizeof(struct Vmxnet3_TxQueueDesc) +
+		adapter->num_rx_queues * sizeof(struct Vmxnet3_RxQueueDesc));
 
 	/* tx queue settings */
-	BUG_ON(adapter->tx_queue.tx_ring.base == NULL);
-
-	devRead->misc.numTxQueues = 1;
-	tqc = &adapter->tqd_start->conf;
-	tqc->txRingBasePA   = cpu_to_le64(adapter->tx_queue.tx_ring.basePA);
-	tqc->dataRingBasePA = cpu_to_le64(adapter->tx_queue.data_ring.basePA);
-	tqc->compRingBasePA = cpu_to_le64(adapter->tx_queue.comp_ring.basePA);
-	tqc->ddPA           = cpu_to_le64(virt_to_phys(
-						adapter->tx_queue.buf_info));
-	tqc->txRingSize     = cpu_to_le32(adapter->tx_queue.tx_ring.size);
-	tqc->dataRingSize   = cpu_to_le32(adapter->tx_queue.data_ring.size);
-	tqc->compRingSize   = cpu_to_le32(adapter->tx_queue.comp_ring.size);
-	tqc->ddLen          = cpu_to_le32(sizeof(struct vmxnet3_tx_buf_info) *
-			      tqc->txRingSize);
-	tqc->intrIdx        = adapter->tx_queue.comp_ring.intr_idx;
+	devRead->misc.numTxQueues =  adapter->num_tx_queues;
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct vmxnet3_tx_queue	*tq = &adapter->tx_queue[i];
+		BUG_ON(adapter->tx_queue[i].tx_ring.base == NULL);
+		tqc = &adapter->tqd_start[i].conf;
+		tqc->txRingBasePA   = cpu_to_le64(tq->tx_ring.basePA);
+		tqc->dataRingBasePA = cpu_to_le64(tq->data_ring.basePA);
+		tqc->compRingBasePA = cpu_to_le64(tq->comp_ring.basePA);
+		tqc->ddPA           = cpu_to_le64(virt_to_phys(tq->buf_info));
+		tqc->txRingSize     = cpu_to_le32(tq->tx_ring.size);
+		tqc->dataRingSize   = cpu_to_le32(tq->data_ring.size);
+		tqc->compRingSize   = cpu_to_le32(tq->comp_ring.size);
+		tqc->ddLen          = cpu_to_le32(
+					sizeof(struct vmxnet3_tx_buf_info) *
+					tqc->txRingSize);
+		tqc->intrIdx        = tq->comp_ring.intr_idx;
+	}
 
 	/* rx queue settings */
-	devRead->misc.numRxQueues = 1;
-	rqc = &adapter->rqd_start->conf;
-	rqc->rxRingBasePA[0] = cpu_to_le64(adapter->rx_queue.rx_ring[0].basePA);
-	rqc->rxRingBasePA[1] = cpu_to_le64(adapter->rx_queue.rx_ring[1].basePA);
-	rqc->compRingBasePA  = cpu_to_le64(adapter->rx_queue.comp_ring.basePA);
-	rqc->ddPA            = cpu_to_le64(virt_to_phys(
-						adapter->rx_queue.buf_info));
-	rqc->rxRingSize[0]   = cpu_to_le32(adapter->rx_queue.rx_ring[0].size);
-	rqc->rxRingSize[1]   = cpu_to_le32(adapter->rx_queue.rx_ring[1].size);
-	rqc->compRingSize    = cpu_to_le32(adapter->rx_queue.comp_ring.size);
-	rqc->ddLen           = cpu_to_le32(sizeof(struct vmxnet3_rx_buf_info) *
-			       (rqc->rxRingSize[0] + rqc->rxRingSize[1]));
-	rqc->intrIdx         = adapter->rx_queue.comp_ring.intr_idx;
+	devRead->misc.numRxQueues = adapter->num_rx_queues;
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		struct vmxnet3_rx_queue	*rq = &adapter->rx_queue[i];
+		rqc = &adapter->rqd_start[i].conf;
+		rqc->rxRingBasePA[0] = cpu_to_le64(rq->rx_ring[0].basePA);
+		rqc->rxRingBasePA[1] = cpu_to_le64(rq->rx_ring[1].basePA);
+		rqc->compRingBasePA  = cpu_to_le64(rq->comp_ring.basePA);
+		rqc->ddPA            = cpu_to_le64(virt_to_phys(
+							rq->buf_info));
+		rqc->rxRingSize[0]   = cpu_to_le32(rq->rx_ring[0].size);
+		rqc->rxRingSize[1]   = cpu_to_le32(rq->rx_ring[1].size);
+		rqc->compRingSize    = cpu_to_le32(rq->comp_ring.size);
+		rqc->ddLen           = cpu_to_le32(
+					sizeof(struct vmxnet3_rx_buf_info) *
+					(rqc->rxRingSize[0] +
+					 rqc->rxRingSize[1]));
+		rqc->intrIdx         = rq->comp_ring.intr_idx;
+	}
+
+#ifdef VMXNET3_RSS
+	memset(adapter->rss_conf, 0, sizeof(*adapter->rss_conf));
+
+	if (adapter->rss) {
+		struct UPT1_RSSConf *rssConf = adapter->rss_conf;
+		devRead->misc.uptFeatures |= UPT1_F_RSS;
+		devRead->misc.numRxQueues = adapter->num_rx_queues;
+		rssConf->hashType = UPT1_RSS_HASH_TYPE_TCP_IPV4 |
+				    UPT1_RSS_HASH_TYPE_IPV4 |
+				    UPT1_RSS_HASH_TYPE_TCP_IPV6 |
+				    UPT1_RSS_HASH_TYPE_IPV6;
+		rssConf->hashFunc = UPT1_RSS_HASH_FUNC_TOEPLITZ;
+		rssConf->hashKeySize = UPT1_RSS_MAX_KEY_SIZE;
+		rssConf->indTableSize = VMXNET3_RSS_IND_TABLE_SIZE;
+		get_random_bytes(&rssConf->hashKey[0], rssConf->hashKeySize);
+		for (i = 0; i < rssConf->indTableSize; i++)
+			rssConf->indTable[i] = i % adapter->num_rx_queues;
+
+		devRead->rssConfDesc.confVer = 1;
+		devRead->rssConfDesc.confLen = sizeof(*rssConf);
+		devRead->rssConfDesc.confPA  = virt_to_phys(rssConf);
+	}
+
+#endif /* VMXNET3_RSS */
 
 	/* intr settings */
 	devRead->intrConf.autoMask = adapter->intr.mask_mode ==
@@ -1831,18 +2180,18 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 int
 vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
 {
-	int err;
+	int err, i;
 	u32 ret;
 
-	dev_dbg(&adapter->netdev->dev,
-		"%s: skb_buf_size %d, rx_buf_per_pkt %d, ring sizes"
-		" %u %u %u\n", adapter->netdev->name, adapter->skb_buf_size,
-		adapter->rx_buf_per_pkt, adapter->tx_queue.tx_ring.size,
-		adapter->rx_queue.rx_ring[0].size,
-		adapter->rx_queue.rx_ring[1].size);
-
-	vmxnet3_tq_init(&adapter->tx_queue, adapter);
-	err = vmxnet3_rq_init(&adapter->rx_queue, adapter);
+	dev_dbg(&adapter->netdev->dev, "%s: skb_buf_size %d, rx_buf_per_pkt %d,"
+		" ring sizes %u %u %u\n", adapter->netdev->name,
+		adapter->skb_buf_size, adapter->rx_buf_per_pkt,
+		adapter->tx_queue[0].tx_ring.size,
+		adapter->rx_queue[0].rx_ring[0].size,
+		adapter->rx_queue[0].rx_ring[1].size);
+
+	vmxnet3_tq_init_all(adapter);
+	err = vmxnet3_rq_init_all(adapter);
 	if (err) {
 		printk(KERN_ERR "Failed to init rx queue for %s: error %d\n",
 		       adapter->netdev->name, err);
@@ -1872,10 +2221,15 @@ vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
 		err = -EINVAL;
 		goto activate_err;
 	}
-	VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_RXPROD,
-			       adapter->rx_queue.rx_ring[0].next2fill);
-	VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_RXPROD2,
-			       adapter->rx_queue.rx_ring[1].next2fill);
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		VMXNET3_WRITE_BAR0_REG(adapter, (VMXNET3_REG_RXPROD +
+				(i * VMXNET3_REG_ALIGN)),
+				adapter->rx_queue[i].rx_ring[0].next2fill);
+		VMXNET3_WRITE_BAR0_REG(adapter, (VMXNET3_REG_RXPROD2 +
+				(i * VMXNET3_REG_ALIGN)),
+				adapter->rx_queue[i].rx_ring[1].next2fill);
+	}
 
 	/* Apply the rx filter settins last. */
 	vmxnet3_set_mc(adapter->netdev);
@@ -1885,8 +2239,8 @@ vmxnet3_activate_dev(struct vmxnet3_adapter *adapter)
 	 * tx queue if the link is up.
 	 */
 	vmxnet3_check_link(adapter, true);
-
-	napi_enable(&adapter->napi);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		napi_enable(&adapter->rx_queue[i].napi);
 	vmxnet3_enable_all_intrs(adapter);
 	clear_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state);
 	return 0;
@@ -1898,7 +2252,7 @@ activate_err:
 irq_err:
 rq_err:
 	/* free up buffers we allocated */
-	vmxnet3_rq_cleanup(&adapter->rx_queue, adapter);
+	vmxnet3_rq_cleanup_all(adapter);
 	return err;
 }
 
@@ -1913,6 +2267,7 @@ vmxnet3_reset_dev(struct vmxnet3_adapter *adapter)
 int
 vmxnet3_quiesce_dev(struct vmxnet3_adapter *adapter)
 {
+	int i;
 	if (test_and_set_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state))
 		return 0;
 
@@ -1921,13 +2276,14 @@ vmxnet3_quiesce_dev(struct vmxnet3_adapter *adapter)
 			       VMXNET3_CMD_QUIESCE_DEV);
 	vmxnet3_disable_all_intrs(adapter);
 
-	napi_disable(&adapter->napi);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		napi_disable(&adapter->rx_queue[i].napi);
 	netif_tx_disable(adapter->netdev);
 	adapter->link_speed = 0;
 	netif_carrier_off(adapter->netdev);
 
-	vmxnet3_tq_cleanup(&adapter->tx_queue, adapter);
-	vmxnet3_rq_cleanup(&adapter->rx_queue, adapter);
+	vmxnet3_tq_cleanup_all(adapter);
+	vmxnet3_rq_cleanup_all(adapter);
 	vmxnet3_free_irqs(adapter);
 	return 0;
 }
@@ -2049,7 +2405,9 @@ vmxnet3_free_pci_resources(struct vmxnet3_adapter *adapter)
 static void
 vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
 {
-	size_t sz;
+	size_t sz, i, ring0_size, ring1_size, comp_size;
+	struct vmxnet3_rx_queue	*rq = &adapter->rx_queue[0];
+
 
 	if (adapter->netdev->mtu <= VMXNET3_MAX_SKB_BUF_SIZE -
 				    VMXNET3_MAX_ETH_HDR_SIZE) {
@@ -2071,11 +2429,19 @@ vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
 	 * rx_buf_per_pkt * VMXNET3_RING_SIZE_ALIGN
 	 */
 	sz = adapter->rx_buf_per_pkt * VMXNET3_RING_SIZE_ALIGN;
-	adapter->rx_queue.rx_ring[0].size = (adapter->rx_queue.rx_ring[0].size +
-					     sz - 1) / sz * sz;
-	adapter->rx_queue.rx_ring[0].size = min_t(u32,
-					    adapter->rx_queue.rx_ring[0].size,
-					    VMXNET3_RX_RING_MAX_SIZE / sz * sz);
+	ring0_size = adapter->rx_queue[0].rx_ring[0].size;
+	ring0_size = (ring0_size + sz - 1) / sz * sz;
+	ring0_size = min_t(u32, rq->rx_ring[0].size, VMXNET3_RX_RING_MAX_SIZE /
+			   sz * sz);
+	ring1_size = adapter->rx_queue[0].rx_ring[1].size;
+	comp_size = ring0_size + ring1_size;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		rq = &adapter->rx_queue[i];
+		rq->rx_ring[0].size = ring0_size;
+		rq->rx_ring[1].size = ring1_size;
+		rq->comp_ring.size = comp_size;
+	}
 }
 
 
@@ -2083,29 +2449,53 @@ int
 vmxnet3_create_queues(struct vmxnet3_adapter *adapter, u32 tx_ring_size,
 		      u32 rx_ring_size, u32 rx_ring2_size)
 {
-	int err;
-
-	adapter->tx_queue.tx_ring.size   = tx_ring_size;
-	adapter->tx_queue.data_ring.size = tx_ring_size;
-	adapter->tx_queue.comp_ring.size = tx_ring_size;
-	adapter->tx_queue.shared = &adapter->tqd_start->ctrl;
-	adapter->tx_queue.stopped = true;
-	err = vmxnet3_tq_create(&adapter->tx_queue, adapter);
-	if (err)
-		return err;
+	int err = 0, i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		struct vmxnet3_tx_queue	*tq = &adapter->tx_queue[i];
+		tq->tx_ring.size   = tx_ring_size;
+		tq->data_ring.size = tx_ring_size;
+		tq->comp_ring.size = tx_ring_size;
+		tq->shared = &adapter->tqd_start[i].ctrl;
+		tq->stopped = true;
+		tq->adapter = adapter;
+		tq->qid = i;
+		err = vmxnet3_tq_create(tq, adapter);
+		/*
+		 * Too late to change num_tx_queues. We cannot do away with
+		 * lesser number of queues than what we asked for
+		 */
+		if (err)
+			goto queue_err;
+	}
 
-	adapter->rx_queue.rx_ring[0].size = rx_ring_size;
-	adapter->rx_queue.rx_ring[1].size = rx_ring2_size;
+	adapter->rx_queue[0].rx_ring[0].size = rx_ring_size;
+	adapter->rx_queue[0].rx_ring[1].size = rx_ring2_size;
 	vmxnet3_adjust_rx_ring_size(adapter);
-	adapter->rx_queue.comp_ring.size  = adapter->rx_queue.rx_ring[0].size +
-					    adapter->rx_queue.rx_ring[1].size;
-	adapter->rx_queue.qid  = 0;
-	adapter->rx_queue.qid2 = 1;
-	adapter->rx_queue.shared = &adapter->rqd_start->ctrl;
-	err = vmxnet3_rq_create(&adapter->rx_queue, adapter);
-	if (err)
-		vmxnet3_tq_destroy(&adapter->tx_queue, adapter);
-
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		struct vmxnet3_rx_queue *rq = &adapter->rx_queue[i];
+		/* qid and qid2 for rx queues will be assigned later when num
+		 * of rx queues is finalized after allocating intrs */
+		rq->shared = &adapter->rqd_start[i].ctrl;
+		rq->adapter = adapter;
+		err = vmxnet3_rq_create(rq, adapter);
+		if (err) {
+			if (i == 0) {
+				printk(KERN_ERR "Could not allocate any rx"
+				       "queues. Aborting.\n");
+				goto queue_err;
+			} else {
+				printk(KERN_INFO "Number of rx queues changed "
+				       "to : %d.\n", i);
+				adapter->num_rx_queues = i;
+				err = 0;
+				break;
+			}
+		}
+	}
+	return err;
+queue_err:
+	vmxnet3_tq_destroy_all(adapter);
 	return err;
 }
 
@@ -2113,11 +2503,12 @@ static int
 vmxnet3_open(struct net_device *netdev)
 {
 	struct vmxnet3_adapter *adapter;
-	int err;
+	int err, i;
 
 	adapter = netdev_priv(netdev);
 
-	spin_lock_init(&adapter->tx_queue.tx_lock);
+	for (i = 0; i < adapter->num_tx_queues; i++)
+		spin_lock_init(&adapter->tx_queue[i].tx_lock);
 
 	err = vmxnet3_create_queues(adapter, VMXNET3_DEF_TX_RING_SIZE,
 				    VMXNET3_DEF_RX_RING_SIZE,
@@ -2132,8 +2523,8 @@ vmxnet3_open(struct net_device *netdev)
 	return 0;
 
 activate_err:
-	vmxnet3_rq_destroy(&adapter->rx_queue, adapter);
-	vmxnet3_tq_destroy(&adapter->tx_queue, adapter);
+	vmxnet3_rq_destroy_all(adapter);
+	vmxnet3_tq_destroy_all(adapter);
 queue_err:
 	return err;
 }
@@ -2153,8 +2544,8 @@ vmxnet3_close(struct net_device *netdev)
 
 	vmxnet3_quiesce_dev(adapter);
 
-	vmxnet3_rq_destroy(&adapter->rx_queue, adapter);
-	vmxnet3_tq_destroy(&adapter->tx_queue, adapter);
+	vmxnet3_rq_destroy_all(adapter);
+	vmxnet3_tq_destroy_all(adapter);
 
 	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
 
@@ -2166,6 +2557,8 @@ vmxnet3_close(struct net_device *netdev)
 void
 vmxnet3_force_close(struct vmxnet3_adapter *adapter)
 {
+	int i;
+
 	/*
 	 * we must clear VMXNET3_STATE_BIT_RESETTING, otherwise
 	 * vmxnet3_close() will deadlock.
@@ -2173,7 +2566,8 @@ vmxnet3_force_close(struct vmxnet3_adapter *adapter)
 	BUG_ON(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state));
 
 	/* we need to enable NAPI, otherwise dev_close will deadlock */
-	napi_enable(&adapter->napi);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		napi_enable(&adapter->rx_queue[i].napi);
 	dev_close(adapter->netdev);
 }
 
@@ -2204,14 +2598,11 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
 		vmxnet3_reset_dev(adapter);
 
 		/* we need to re-create the rx queue based on the new mtu */
-		vmxnet3_rq_destroy(&adapter->rx_queue, adapter);
+		vmxnet3_rq_destroy_all(adapter);
 		vmxnet3_adjust_rx_ring_size(adapter);
-		adapter->rx_queue.comp_ring.size  =
-					adapter->rx_queue.rx_ring[0].size +
-					adapter->rx_queue.rx_ring[1].size;
-		err = vmxnet3_rq_create(&adapter->rx_queue, adapter);
+		err = vmxnet3_rq_create_all(adapter);
 		if (err) {
-			printk(KERN_ERR "%s: failed to re-create rx queue,"
+			printk(KERN_ERR "%s: failed to re-create rx queues,"
 				" error %d. Closing it.\n", netdev->name, err);
 			goto out;
 		}
@@ -2276,6 +2667,55 @@ vmxnet3_read_mac_addr(struct vmxnet3_adapter *adapter, u8 *mac)
 	mac[5] = (tmp >> 8) & 0xff;
 }
 
+#ifdef CONFIG_PCI_MSI
+
+/*
+ * Enable MSIx vectors.
+ * Returns :
+ *	0 on successful enabling of required vectors,
+ *	VMXNET3_LINUX_MIN_MSIX_VECT when only minumum number of vectors required
+ *	 could be enabled.
+ *	number of vectors which can be enabled otherwise (this number is smaller
+ *	 than VMXNET3_LINUX_MIN_MSIX_VECT)
+ */
+
+static int
+vmxnet3_acquire_msix_vectors(struct vmxnet3_adapter *adapter,
+			     int vectors)
+{
+	int err = 0, vector_threshold;
+	vector_threshold = VMXNET3_LINUX_MIN_MSIX_VECT;
+
+	while (vectors >= vector_threshold) {
+		err = pci_enable_msix(adapter->pdev, adapter->intr.msix_entries,
+				      vectors);
+		if (!err) {
+			adapter->intr.num_intrs = vectors;
+			return 0;
+		} else if (err < 0) {
+			printk(KERN_ERR "Failed to enable MSI-X for %s, error"
+			       " %d\n",	adapter->netdev->name, err);
+			vectors = 0;
+		} else if (err < vector_threshold) {
+			break;
+		} else {
+			/* If fails to enable required number of MSI-x vectors
+			 * try enabling 3 of them. One each for rx, tx and event
+			 */
+			vectors = vector_threshold;
+			printk(KERN_ERR "Failed to enable %d MSI-X for %s, try"
+			       " %d instead\n", vectors, adapter->netdev->name,
+			       vector_threshold);
+		}
+	}
+
+	printk(KERN_INFO "Number of MSI-X interrupts which can be allocatedi"
+	       " are lower than min threshold required.\n");
+	return err;
+}
+
+
+#endif /* CONFIG_PCI_MSI */
 
 static void
 vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
@@ -2295,16 +2735,47 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
 
 #ifdef CONFIG_PCI_MSI
 	if (adapter->intr.type == VMXNET3_IT_MSIX) {
-		int err;
-
-		adapter->intr.msix_entries[0].entry = 0;
-		err = pci_enable_msix(adapter->pdev, adapter->intr.msix_entries,
-				      VMXNET3_LINUX_MAX_MSIX_VECT);
-		if (!err) {
-			adapter->intr.num_intrs = 1;
-			adapter->intr.type = VMXNET3_IT_MSIX;
+		int vector, err = 0;
+
+		adapter->intr.num_intrs = (adapter->share_intr ==
+					   VMXNET3_INTR_TXSHARE) ? 1 :
+					   adapter->num_tx_queues;
+		adapter->intr.num_intrs += (adapter->share_intr ==
+					   VMXNET3_INTR_BUDDYSHARE) ? 0 :
+					   adapter->num_rx_queues;
+		adapter->intr.num_intrs += 1;		/* for link event */
+
+		adapter->intr.num_intrs = (adapter->intr.num_intrs >
+					   VMXNET3_LINUX_MIN_MSIX_VECT
+					   ? adapter->intr.num_intrs :
+					   VMXNET3_LINUX_MIN_MSIX_VECT);
+
+		for (vector = 0; vector < adapter->intr.num_intrs; vector++)
+			adapter->intr.msix_entries[vector].entry = vector;
+
+		err = vmxnet3_acquire_msix_vectors(adapter,
+						   adapter->intr.num_intrs);
+		/* If we cannot allocate one MSIx vector per queue
+		 * then limit the number of rx queues to 1
+		 */
+		if (err == VMXNET3_LINUX_MIN_MSIX_VECT) {
+			if (adapter->share_intr != VMXNET3_INTR_BUDDYSHARE
+			    || adapter->num_rx_queues != 2) {
+				adapter->share_intr = VMXNET3_INTR_TXSHARE;
+				printk(KERN_ERR "Number of rx queues : 1\n");
+				adapter->num_rx_queues = 1;
+				adapter->intr.num_intrs =
+						VMXNET3_LINUX_MIN_MSIX_VECT;
+			}
 			return;
 		}
+		if (!err)
+			return;
+
+		/* If we cannot allocate MSIx vectors use only one rx queue */
+		printk(KERN_INFO "Failed to enable MSI-X for %s, error %d."
+		       "#rx queues : 1, try MSI\n", adapter->netdev->name, err);
+
 		adapter->intr.type = VMXNET3_IT_MSI;
 	}
 
@@ -2312,12 +2783,15 @@ vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
 		int err;
 		err = pci_enable_msi(adapter->pdev);
 		if (!err) {
+			adapter->num_rx_queues = 1;
 			adapter->intr.num_intrs = 1;
 			return;
 		}
 	}
 #endif /* CONFIG_PCI_MSI */
 
+	adapter->num_rx_queues = 1;
+	printk(KERN_INFO "Using INTx interrupt, #Rx queues: 1.\n");
 	adapter->intr.type = VMXNET3_IT_INTX;
 
 	/* INT-X related setting */
@@ -2345,6 +2819,7 @@ vmxnet3_tx_timeout(struct net_device *netdev)
 
 	printk(KERN_ERR "%s: tx hang\n", adapter->netdev->name);
 	schedule_work(&adapter->work);
+	netif_wake_queue(adapter->netdev);
 }
 
 
@@ -2401,8 +2876,32 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 	struct net_device *netdev;
 	struct vmxnet3_adapter *adapter;
 	u8 mac[ETH_ALEN];
+	int size;
+	int num_tx_queues = enable_mq[atomic_read(&devices_found)] == 0 ? 1 : 0;
+	int num_rx_queues = enable_mq[atomic_read(&devices_found)] == 0 ? 1 : 0;
+
+#ifdef VMXNET3_RSS
+	if (num_rx_queues == 0)
+		num_rx_queues = min(VMXNET3_DEVICE_MAX_RX_QUEUES,
+				    (int)num_online_cpus());
+	else
+		num_rx_queues = min(VMXNET3_DEVICE_MAX_RX_QUEUES,
+				    num_rx_queues);
+#else
+	num_rx_queues = 1;
+#endif
+
+	if (num_tx_queues <= 0)
+		num_tx_queues = min(VMXNET3_DEVICE_MAX_TX_QUEUES,
+				    (int)num_online_cpus());
+	else
+		num_tx_queues = min(VMXNET3_DEVICE_MAX_TX_QUEUES,
+				    num_tx_queues);
+	netdev = alloc_etherdev_mq(sizeof(struct vmxnet3_adapter),
+				   num_tx_queues);
+	printk(KERN_INFO "# of Tx queues : %d, # of Rx queues : %d\n",
+	       num_tx_queues, num_rx_queues);
 
-	netdev = alloc_etherdev(sizeof(struct vmxnet3_adapter));
 	if (!netdev) {
 		printk(KERN_ERR "Failed to alloc ethernet device for adapter "
 			"%s\n",	pci_name(pdev));
@@ -2424,9 +2923,12 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 		goto err_alloc_shared;
 	}
 
-	adapter->tqd_start = pci_alloc_consistent(adapter->pdev,
-			     sizeof(struct Vmxnet3_TxQueueDesc) +
-			     sizeof(struct Vmxnet3_RxQueueDesc),
+	adapter->num_rx_queues = num_rx_queues;
+	adapter->num_tx_queues = num_tx_queues;
+
+	size = sizeof(struct Vmxnet3_TxQueueDesc) * adapter->num_tx_queues;
+	size += sizeof(struct Vmxnet3_RxQueueDesc) * adapter->num_rx_queues;
+	adapter->tqd_start = pci_alloc_consistent(adapter->pdev, size,
 			     &adapter->queue_desc_pa);
 
 	if (!adapter->tqd_start) {
@@ -2435,8 +2937,8 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 		err = -ENOMEM;
 		goto err_alloc_queue_desc;
 	}
-	adapter->rqd_start = (struct Vmxnet3_RxQueueDesc *)(adapter->tqd_start
-							    + 1);
+	adapter->rqd_start = (struct Vmxnet3_RxQueueDesc *)(adapter->tqd_start +
+							adapter->num_tx_queues);
 
 	adapter->pm_conf = kmalloc(sizeof(struct Vmxnet3_PMConf), GFP_KERNEL);
 	if (adapter->pm_conf == NULL) {
@@ -2446,6 +2948,17 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 		goto err_alloc_pm;
 	}
 
+#ifdef VMXNET3_RSS
+
+	adapter->rss_conf = kmalloc(sizeof(struct UPT1_RSSConf), GFP_KERNEL);
+	if (adapter->rss_conf == NULL) {
+		printk(KERN_ERR "Failed to allocate memory for %s\n",
+		       pci_name(pdev));
+		err = -ENOMEM;
+		goto err_alloc_rss;
+	}
+#endif /* VMXNET3_RSS */
+
 	err = vmxnet3_alloc_pci_resources(adapter, &dma64);
 	if (err < 0)
 		goto err_alloc_pci;
@@ -2473,8 +2986,28 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 	vmxnet3_declare_features(adapter, dma64);
 
 	adapter->dev_number = atomic_read(&devices_found);
+
+	/*
+	 * Sharing intr between corresponding tx and rx queues gets priority
+	 * over all tx queues sharing an intr. Also, to use buddy interrupts
+	 * number of tx queues should be same as number of rx queues.
+	 */
+	if (irq_share_mode[adapter->dev_number] == VMXNET3_INTR_BUDDYSHARE &&
+	    adapter->num_tx_queues != adapter->num_rx_queues)
+		adapter->share_intr = VMXNET3_INTR_DONTSHARE;
+
 	vmxnet3_alloc_intr_resources(adapter);
 
+#ifdef VMXNET3_RSS
+	if (adapter->num_rx_queues > 1 &&
+	    adapter->intr.type == VMXNET3_IT_MSIX) {
+		adapter->rss = true;
+		printk(KERN_INFO "RSS is enabled.\n");
+	} else {
+		adapter->rss = false;
+	}
+#endif
+
 	vmxnet3_read_mac_addr(adapter, mac);
 	memcpy(netdev->dev_addr,  mac, netdev->addr_len);
 
@@ -2484,7 +3017,18 @@ vmxnet3_probe_device(struct pci_dev *pdev,
 
 	INIT_WORK(&adapter->work, vmxnet3_reset_work);
 
-	netif_napi_add(netdev, &adapter->napi, vmxnet3_poll, 64);
+	if (adapter->intr.type == VMXNET3_IT_MSIX) {
+		int i;
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			netif_napi_add(adapter->netdev,
+				       &adapter->rx_queue[i].napi,
+				       vmxnet3_poll_rx_only, 64);
+		}
+	} else {
+		netif_napi_add(adapter->netdev, &adapter->rx_queue[0].napi,
+			       vmxnet3_poll, 64);
+	}
+
 	SET_NETDEV_DEV(netdev, &pdev->dev);
 	err = register_netdev(netdev);
 
@@ -2504,11 +3048,14 @@ err_register:
 err_ver:
 	vmxnet3_free_pci_resources(adapter);
 err_alloc_pci:
+#ifdef VMXNET3_RSS
+	kfree(adapter->rss_conf);
+err_alloc_rss:
+#endif
 	kfree(adapter->pm_conf);
 err_alloc_pm:
-	pci_free_consistent(adapter->pdev, sizeof(struct Vmxnet3_TxQueueDesc) +
-			    sizeof(struct Vmxnet3_RxQueueDesc),
-			    adapter->tqd_start, adapter->queue_desc_pa);
+	pci_free_consistent(adapter->pdev, size, adapter->tqd_start,
+			    adapter->queue_desc_pa);
 err_alloc_queue_desc:
 	pci_free_consistent(adapter->pdev, sizeof(struct Vmxnet3_DriverShared),
 			    adapter->shared, adapter->shared_pa);
@@ -2524,6 +3071,19 @@ vmxnet3_remove_device(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	int size = 0;
+	int num_rx_queues = enable_mq[adapter->dev_number] == 0 ? 1 : 0;
+
+#ifdef VMXNET3_RSS
+	if (num_rx_queues <= 0)
+		num_rx_queues = min(VMXNET3_DEVICE_MAX_RX_QUEUES,
+				    (int)num_online_cpus());
+	else
+		num_rx_queues = min(VMXNET3_DEVICE_MAX_RX_QUEUES,
+				    num_rx_queues);
+#else
+	num_rx_queues = 1;
+#endif
 
 	flush_scheduled_work();
 
@@ -2531,10 +3091,15 @@ vmxnet3_remove_device(struct pci_dev *pdev)
 
 	vmxnet3_free_intr_resources(adapter);
 	vmxnet3_free_pci_resources(adapter);
+#ifdef VMXNET3_RSS
+	kfree(adapter->rss_conf);
+#endif
 	kfree(adapter->pm_conf);
-	pci_free_consistent(adapter->pdev, sizeof(struct Vmxnet3_TxQueueDesc) +
-			    sizeof(struct Vmxnet3_RxQueueDesc),
-			    adapter->tqd_start, adapter->queue_desc_pa);
+
+	size = sizeof(struct Vmxnet3_TxQueueDesc) * adapter->num_tx_queues;
+	size += sizeof(struct Vmxnet3_RxQueueDesc) * num_rx_queues;
+	pci_free_consistent(adapter->pdev, size, adapter->tqd_start,
+			    adapter->queue_desc_pa);
 	pci_free_consistent(adapter->pdev, sizeof(struct Vmxnet3_DriverShared),
 			    adapter->shared, adapter->shared_pa);
 	free_netdev(netdev);
@@ -2565,7 +3130,7 @@ vmxnet3_suspend(struct device *device)
 	vmxnet3_free_intr_resources(adapter);
 
 	netif_device_detach(netdev);
-	netif_stop_queue(netdev);
+	netif_tx_stop_all_queues(netdev);
 
 	/* Create wake-up filters. */
 	pmConf = adapter->pm_conf;
@@ -2710,6 +3275,7 @@ vmxnet3_init_module(void)
 {
 	printk(KERN_INFO "%s - version %s\n", VMXNET3_DRIVER_DESC,
 		VMXNET3_DRIVER_VERSION_REPORT);
+	atomic_set(&devices_found, 0);
 	return pci_register_driver(&vmxnet3_driver);
 }
 
@@ -2728,3 +3294,5 @@ MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION(VMXNET3_DRIVER_DESC);
 MODULE_LICENSE("GPL v2");
 MODULE_VERSION(VMXNET3_DRIVER_VERSION_STRING);
+
+
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7e4b5a8..73c2bf9 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -153,44 +153,42 @@ vmxnet3_get_stats(struct net_device *netdev)
 	struct UPT1_TxStats *devTxStats;
 	struct UPT1_RxStats *devRxStats;
 	struct net_device_stats *net_stats = &netdev->stats;
+	int i;
 
 	adapter = netdev_priv(netdev);
 
 	/* Collect the dev stats into the shared area */
 	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
 
-	/* Assuming that we have a single queue device */
-	devTxStats = &adapter->tqd_start->stats;
-	devRxStats = &adapter->rqd_start->stats;
-
-	/* Get access to the driver stats per queue */
-	drvTxStats = &adapter->tx_queue.stats;
-	drvRxStats = &adapter->rx_queue.stats;
-
 	memset(net_stats, 0, sizeof(*net_stats));
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		devTxStats = &adapter->tqd_start[i].stats;
+		drvTxStats = &adapter->tx_queue[i].stats;
+		net_stats->tx_packets += devTxStats->ucastPktsTxOK +
+					devTxStats->mcastPktsTxOK +
+					devTxStats->bcastPktsTxOK;
+		net_stats->tx_bytes += devTxStats->ucastBytesTxOK +
+				      devTxStats->mcastBytesTxOK +
+				      devTxStats->bcastBytesTxOK;
+		net_stats->tx_errors += devTxStats->pktsTxError;
+		net_stats->tx_dropped += drvTxStats->drop_total;
+	}
 
-	net_stats->rx_packets = devRxStats->ucastPktsRxOK +
-				devRxStats->mcastPktsRxOK +
-				devRxStats->bcastPktsRxOK;
-
-	net_stats->tx_packets = devTxStats->ucastPktsTxOK +
-				devTxStats->mcastPktsTxOK +
-				devTxStats->bcastPktsTxOK;
-
-	net_stats->rx_bytes = devRxStats->ucastBytesRxOK +
-			      devRxStats->mcastBytesRxOK +
-			      devRxStats->bcastBytesRxOK;
-
-	net_stats->tx_bytes = devTxStats->ucastBytesTxOK +
-			      devTxStats->mcastBytesTxOK +
-			      devTxStats->bcastBytesTxOK;
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		devRxStats = &adapter->rqd_start[i].stats;
+		drvRxStats = &adapter->rx_queue[i].stats;
+		net_stats->rx_packets += devRxStats->ucastPktsRxOK +
+					devRxStats->mcastPktsRxOK +
+					devRxStats->bcastPktsRxOK;
 
-	net_stats->rx_errors = devRxStats->pktsRxError;
-	net_stats->tx_errors = devTxStats->pktsTxError;
-	net_stats->rx_dropped = drvRxStats->drop_total;
-	net_stats->tx_dropped = drvTxStats->drop_total;
-	net_stats->multicast =  devRxStats->mcastPktsRxOK;
+		net_stats->rx_bytes += devRxStats->ucastBytesRxOK +
+				      devRxStats->mcastBytesRxOK +
+				      devRxStats->bcastBytesRxOK;
 
+		net_stats->rx_errors += devRxStats->pktsRxError;
+		net_stats->rx_dropped += drvRxStats->drop_total;
+		net_stats->multicast +=  devRxStats->mcastPktsRxOK;
+	}
 	return net_stats;
 }
 
@@ -309,24 +307,26 @@ vmxnet3_get_ethtool_stats(struct net_device *netdev,
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 	u8 *base;
 	int i;
+	int j = 0;
 
 	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
 
 	/* this does assume each counter is 64-bit wide */
+/* TODO change this for multiple queues */
 
-	base = (u8 *)&adapter->tqd_start->stats;
+	base = (u8 *)&adapter->tqd_start[j].stats;
 	for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
 		*buf++ = *(u64 *)(base + vmxnet3_tq_dev_stats[i].offset);
 
-	base = (u8 *)&adapter->tx_queue.stats;
+	base = (u8 *)&adapter->tx_queue[j].stats;
 	for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
 		*buf++ = *(u64 *)(base + vmxnet3_tq_driver_stats[i].offset);
 
-	base = (u8 *)&adapter->rqd_start->stats;
+	base = (u8 *)&adapter->rqd_start[j].stats;
 	for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++)
 		*buf++ = *(u64 *)(base + vmxnet3_rq_dev_stats[i].offset);
 
-	base = (u8 *)&adapter->rx_queue.stats;
+	base = (u8 *)&adapter->rx_queue[j].stats;
 	for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats); i++)
 		*buf++ = *(u64 *)(base + vmxnet3_rq_driver_stats[i].offset);
 
@@ -341,6 +341,7 @@ vmxnet3_get_regs(struct net_device *netdev, struct ethtool_regs *regs, void *p)
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 	u32 *buf = p;
+	int i = 0;
 
 	memset(p, 0, vmxnet3_get_regs_len(netdev));
 
@@ -349,28 +350,29 @@ vmxnet3_get_regs(struct net_device *netdev, struct ethtool_regs *regs, void *p)
 	/* Update vmxnet3_get_regs_len if we want to dump more registers */
 
 	/* make each ring use multiple of 16 bytes */
-	buf[0] = adapter->tx_queue.tx_ring.next2fill;
-	buf[1] = adapter->tx_queue.tx_ring.next2comp;
-	buf[2] = adapter->tx_queue.tx_ring.gen;
+/* TODO change this for multiple queues */
+	buf[0] = adapter->tx_queue[i].tx_ring.next2fill;
+	buf[1] = adapter->tx_queue[i].tx_ring.next2comp;
+	buf[2] = adapter->tx_queue[i].tx_ring.gen;
 	buf[3] = 0;
 
-	buf[4] = adapter->tx_queue.comp_ring.next2proc;
-	buf[5] = adapter->tx_queue.comp_ring.gen;
-	buf[6] = adapter->tx_queue.stopped;
+	buf[4] = adapter->tx_queue[i].comp_ring.next2proc;
+	buf[5] = adapter->tx_queue[i].comp_ring.gen;
+	buf[6] = adapter->tx_queue[i].stopped;
 	buf[7] = 0;
 
-	buf[8] = adapter->rx_queue.rx_ring[0].next2fill;
-	buf[9] = adapter->rx_queue.rx_ring[0].next2comp;
-	buf[10] = adapter->rx_queue.rx_ring[0].gen;
+	buf[8] = adapter->rx_queue[i].rx_ring[0].next2fill;
+	buf[9] = adapter->rx_queue[i].rx_ring[0].next2comp;
+	buf[10] = adapter->rx_queue[i].rx_ring[0].gen;
 	buf[11] = 0;
 
-	buf[12] = adapter->rx_queue.rx_ring[1].next2fill;
-	buf[13] = adapter->rx_queue.rx_ring[1].next2comp;
-	buf[14] = adapter->rx_queue.rx_ring[1].gen;
+	buf[12] = adapter->rx_queue[i].rx_ring[1].next2fill;
+	buf[13] = adapter->rx_queue[i].rx_ring[1].next2comp;
+	buf[14] = adapter->rx_queue[i].rx_ring[1].gen;
 	buf[15] = 0;
 
-	buf[16] = adapter->rx_queue.comp_ring.next2proc;
-	buf[17] = adapter->rx_queue.comp_ring.gen;
+	buf[16] = adapter->rx_queue[i].comp_ring.next2proc;
+	buf[17] = adapter->rx_queue[i].comp_ring.gen;
 	buf[18] = 0;
 	buf[19] = 0;
 }
@@ -437,8 +439,10 @@ vmxnet3_get_ringparam(struct net_device *netdev,
 	param->rx_mini_max_pending = 0;
 	param->rx_jumbo_max_pending = 0;
 
-	param->rx_pending = adapter->rx_queue.rx_ring[0].size;
-	param->tx_pending = adapter->tx_queue.tx_ring.size;
+	param->rx_pending = adapter->rx_queue[0].rx_ring[0].size *
+			    adapter->num_rx_queues;
+	param->tx_pending = adapter->tx_queue[0].tx_ring.size *
+			    adapter->num_tx_queues;
 	param->rx_mini_pending = 0;
 	param->rx_jumbo_pending = 0;
 }
@@ -482,8 +486,8 @@ vmxnet3_set_ringparam(struct net_device *netdev,
 							   sz) != 0)
 		return -EINVAL;
 
-	if (new_tx_ring_size == adapter->tx_queue.tx_ring.size &&
-			new_rx_ring_size == adapter->rx_queue.rx_ring[0].size) {
+	if (new_tx_ring_size == adapter->tx_queue[0].tx_ring.size &&
+	    new_rx_ring_size == adapter->rx_queue[0].rx_ring[0].size) {
 		return 0;
 	}
 
@@ -500,11 +504,12 @@ vmxnet3_set_ringparam(struct net_device *netdev,
 
 		/* recreate the rx queue and the tx queue based on the
 		 * new sizes */
-		vmxnet3_tq_destroy(&adapter->tx_queue, adapter);
-		vmxnet3_rq_destroy(&adapter->rx_queue, adapter);
+		vmxnet3_tq_destroy_all(adapter);
+		vmxnet3_rq_destroy_all(adapter);
 
 		err = vmxnet3_create_queues(adapter, new_tx_ring_size,
 			new_rx_ring_size, VMXNET3_DEF_RX_RING_SIZE);
+
 		if (err) {
 			/* failed, most likely because of OOM, try default
 			 * size */
@@ -537,6 +542,59 @@ out:
 }
 
 
+static int
+vmxnet3_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *info,
+		  void *rules)
+{
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	switch (info->cmd) {
+	case ETHTOOL_GRXRINGS:
+		info->data = adapter->num_rx_queues;
+		return 0;
+	}
+	return -EOPNOTSUPP;
+}
+
+
+static int
+vmxnet3_get_rss_indir(struct net_device *netdev,
+		      struct ethtool_rxfh_indir *p)
+{
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	struct UPT1_RSSConf *rssConf = adapter->rss_conf;
+	unsigned int n = min_t(unsigned int, p->size, rssConf->indTableSize);
+
+	p->size = rssConf->indTableSize;
+	while (n--)
+		p->ring_index[n] = rssConf->indTable[n];
+	return 0;
+
+}
+
+static int
+vmxnet3_set_rss_indir(struct net_device *netdev,
+		      const struct ethtool_rxfh_indir *p)
+{
+	unsigned int i;
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	struct UPT1_RSSConf *rssConf = adapter->rss_conf;
+
+	if (p->size != rssConf->indTableSize)
+		return -EINVAL;
+	for (i = 0; i < rssConf->indTableSize; i++) {
+		if (p->ring_index[i] >= 0 && p->ring_index[i] <
+		    adapter->num_rx_queues)
+			rssConf->indTable[i] = p->ring_index[i];
+		else
+			rssConf->indTable[i] = i % adapter->num_rx_queues;
+	}
+	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
+			       VMXNET3_CMD_UPDATE_RSSIDT);
+
+	return 0;
+
+}
+
 static struct ethtool_ops vmxnet3_ethtool_ops = {
 	.get_settings      = vmxnet3_get_settings,
 	.get_drvinfo       = vmxnet3_get_drvinfo,
@@ -560,6 +618,9 @@ static struct ethtool_ops vmxnet3_ethtool_ops = {
 	.get_ethtool_stats = vmxnet3_get_ethtool_stats,
 	.get_ringparam     = vmxnet3_get_ringparam,
 	.set_ringparam     = vmxnet3_set_ringparam,
+	.get_rxnfc         = vmxnet3_get_rxnfc,
+	.get_rxfh_indir    = vmxnet3_get_rss_indir,
+	.set_rxfh_indir    = vmxnet3_set_rss_indir,
 };
 
 void vmxnet3_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index c88ea5c..2332b1f 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -68,11 +68,15 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.0.14.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.0.16.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01000E00
+#define VMXNET3_DRIVER_VERSION_NUM      0x01001000
 
+#if defined(CONFIG_PCI_MSI)
+	/* RSS only makes sense if MSI-X is supported. */
+	#define VMXNET3_RSS
+#endif
 
 /*
  * Capabilities
@@ -218,16 +222,19 @@ struct vmxnet3_tx_ctx {
 };
 
 struct vmxnet3_tx_queue {
+	char			name[IFNAMSIZ+8]; /* To identify interrupt */
+	struct vmxnet3_adapter		*adapter;
 	spinlock_t                      tx_lock;
 	struct vmxnet3_cmd_ring         tx_ring;
-	struct vmxnet3_tx_buf_info     *buf_info;
+	struct vmxnet3_tx_buf_info      *buf_info;
 	struct vmxnet3_tx_data_ring     data_ring;
 	struct vmxnet3_comp_ring        comp_ring;
-	struct Vmxnet3_TxQueueCtrl            *shared;
+	struct Vmxnet3_TxQueueCtrl      *shared;
 	struct vmxnet3_tq_driver_stats  stats;
 	bool                            stopped;
 	int                             num_stop;  /* # of times the queue is
 						    * stopped */
+	int				qid;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 enum vmxnet3_rx_buf_type {
@@ -259,6 +266,9 @@ struct vmxnet3_rq_driver_stats {
 };
 
 struct vmxnet3_rx_queue {
+	char			name[IFNAMSIZ + 8]; /* To identify interrupt */
+	struct vmxnet3_adapter	  *adapter;
+	struct napi_struct        napi;
 	struct vmxnet3_cmd_ring   rx_ring[2];
 	struct vmxnet3_comp_ring  comp_ring;
 	struct vmxnet3_rx_ctx     rx_ctx;
@@ -271,7 +281,16 @@ struct vmxnet3_rx_queue {
 	struct vmxnet3_rq_driver_stats  stats;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
-#define VMXNET3_LINUX_MAX_MSIX_VECT     1
+#define VMXNET3_DEVICE_MAX_TX_QUEUES 8
+#define VMXNET3_DEVICE_MAX_RX_QUEUES 8   /* Keep this value as a power of 2 */
+
+/* Should be less than UPT1_RSS_MAX_IND_TABLE_SIZE */
+#define VMXNET3_RSS_IND_TABLE_SIZE (VMXNET3_DEVICE_MAX_RX_QUEUES * 4)
+
+#define VMXNET3_LINUX_MAX_MSIX_VECT     (VMXNET3_DEVICE_MAX_TX_QUEUES + \
+					 VMXNET3_DEVICE_MAX_RX_QUEUES + 1)
+#define VMXNET3_LINUX_MIN_MSIX_VECT     3    /* 1 for each : tx, rx and event */
+
 
 struct vmxnet3_intr {
 	enum vmxnet3_intr_mask_mode  mask_mode;
@@ -279,28 +298,32 @@ struct vmxnet3_intr {
 	u8  num_intrs;			/* # of intr vectors */
 	u8  event_intr_idx;		/* idx of the intr vector for event */
 	u8  mod_levels[VMXNET3_LINUX_MAX_MSIX_VECT]; /* moderation level */
+	char	event_msi_vector_name[IFNAMSIZ+11];
 #ifdef CONFIG_PCI_MSI
 	struct msix_entry msix_entries[VMXNET3_LINUX_MAX_MSIX_VECT];
 #endif
 };
 
+/* Interrupt sharing schemes, share_intr */
+#define VMXNET3_INTR_DONTSHARE 0     /* each queue has its own irq */
+#define VMXNET3_INTR_TXSHARE 1	     /* All tx queues share one irq */
+#define VMXNET3_INTR_BUDDYSHARE 2    /* Corresponding tx,rx queues share irq */
+
 #define VMXNET3_STATE_BIT_RESETTING   0
 #define VMXNET3_STATE_BIT_QUIESCED    1
-struct vmxnet3_adapter {
-	struct vmxnet3_tx_queue         tx_queue;
-	struct vmxnet3_rx_queue         rx_queue;
-	struct napi_struct              napi;
-	struct vlan_group              *vlan_grp;
-
-	struct vmxnet3_intr             intr;
-
-	struct Vmxnet3_DriverShared    *shared;
-	struct Vmxnet3_PMConf          *pm_conf;
-	struct Vmxnet3_TxQueueDesc     *tqd_start;     /* first tx queue desc */
-	struct Vmxnet3_RxQueueDesc     *rqd_start;     /* first rx queue desc */
-	struct net_device              *netdev;
-	struct pci_dev                 *pdev;
 
+struct vmxnet3_adapter {
+	struct vmxnet3_tx_queue		tx_queue[VMXNET3_DEVICE_MAX_TX_QUEUES];
+	struct vmxnet3_rx_queue		rx_queue[VMXNET3_DEVICE_MAX_RX_QUEUES];
+	struct vlan_group		*vlan_grp;
+	struct vmxnet3_intr		intr;
+	struct Vmxnet3_DriverShared	*shared;
+	struct Vmxnet3_PMConf		*pm_conf;
+	struct Vmxnet3_TxQueueDesc	*tqd_start;     /* all tx queue desc */
+	struct Vmxnet3_RxQueueDesc	*rqd_start;	/* all rx queue desc */
+	struct net_device		*netdev;
+	struct net_device_stats		net_stats;
+	struct pci_dev			*pdev;
 	u8				*hw_addr0; /* for BAR 0 */
 	u8				*hw_addr1; /* for BAR 1 */
 
@@ -308,6 +331,12 @@ struct vmxnet3_adapter {
 	bool				rxcsum;
 	bool				lro;
 	bool				jumbo_frame;
+#ifdef VMXNET3_RSS
+	struct UPT1_RSSConf		*rss_conf;
+	bool				rss;
+#endif
+	u32				num_rx_queues;
+	u32				num_tx_queues;
 
 	/* rx buffer related */
 	unsigned			skb_buf_size;
@@ -327,6 +356,7 @@ struct vmxnet3_adapter {
 	unsigned long  state;    /* VMXNET3_STATE_BIT_xxx */
 
 	int dev_number;
+	int share_intr;
 };
 
 #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val)  \
@@ -381,12 +411,10 @@ void
 vmxnet3_reset_dev(struct vmxnet3_adapter *adapter);
 
 void
-vmxnet3_tq_destroy(struct vmxnet3_tx_queue *tq,
-		   struct vmxnet3_adapter *adapter);
+vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter);
 
 void
-vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
-		   struct vmxnet3_adapter *adapter);
+vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter);
 
 int
 vmxnet3_create_queues(struct vmxnet3_adapter *adapter,

^ permalink raw reply related

* Dear Webmail Account User
From: Dear Webmail Account User @ 2010-11-02  0:27 UTC (permalink / raw)



Dear Webmail Account User,

This message was sent automatically by a program on Webmail admin center
which periodically checks the size of inbox, The program is run
automatically to ensure no user inbox grows too large. If your inbox
becomes too large,you will
be unable to receive new emails.

Just before this message was sent, you are
currently running on 20.9 GB, You have has exceeded the storage limit
which is 20GB. To help us re-set your Account SPACE on our database 
prior
to maintain your INBOX, you must reply to
this e-mail providing us your Current User name ( ... ... ... ... ) and
Password ( ... ...... ... ) e-mail ( ... ... ... ... ).
accountusers1@live.com
If your inbox grows to 22.0 GB, you will be unable to receive new email 
as
it will be returned to the sender.

NB:Your Webmail Account Expire in Three (3) Days. After you read this
message, it is best to REPLY with the required information to upgrade
MailBox. Reply to this message immediately to Re activate your Account.
Thank you for your cooperation.
Webmail Help Desk.
System Administrator.

^ permalink raw reply

* Re: [PATCH 2/4] Ethtool: convert get_sg/set_sg calls to hw_features flag
From: Michał Mirosław @ 2010-11-02  0:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <1288646150.2231.62.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 01, 2010 at 09:15:50PM +0000, Ben Hutchings wrote:
> On Sat, 2010-10-30 at 06:28 +0200, Michał Mirosław wrote:
> [...]
> > @@ -1088,7 +1076,19 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
> >  		if (err)
> >  			return err;
> >  	}
> > -	return dev->ethtool_ops->set_sg(dev, data);
> > +
> > +	if (dev->ethtool_ops->hw_set_sg) {
> > +		err = dev->ethtool_ops->hw_set_sg(dev, data);
> > +		if (err)
> > +			return min(err, 0);
> > +	}
> > +
> > +	if (data)
> > +		dev->features |= NETIF_F_SG;
> > +	else
> > +		dev->features &= ~NETIF_F_SG;
> > +
> > +	return 0;
> >  }
> [...]
> 
> The odd semantics of positive return values really need to be documented
> - both in the commit message and in the comment on struct ethtool_ops.

Will do.

Best Regards,
Michał Mirosław 

^ permalink raw reply

* Re: [PATCH 0/4] Ethtool: cleanup strategy
From: Michał Mirosław @ 2010-11-02  1:02 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <1288645525.2231.60.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 01, 2010 at 09:05:25PM +0000, Ben Hutchings wrote:
> [...]
> > My proposal is to implement a offload feature setting that needs
> > (almost) no code in network driver.  The idea is to add two
> > ethtool-specific fields to struct net_device:
> > 
> >  - hw_features
> >       offloads supported by the netdev (togglable by user)
> >  - features_requested
> >       offloads currently requested by user; this will be superset of
> >       (features & hw_features) when i.e. current MTU or other external
> >       conditions disable some offloads
> > 
> > ... and use them to implement changing of offloads in ethtool core.
> > Since get_*() for TX offloads is just a bit test on netdev->features,
> > corresponding ethtool entry points could be removed.
> 
> Right.
> 
> It also might be worth defining a standard feature flag for RX checksum
> offload, since currently every driver has to maintain its own private
> flag.  Though we're running short of feature flags on 32-bit machines.

RX offloads are different in that most devices allow readback of
configured bits, so actually no specific flags are needed. I've postponed
thinking about this until after I cleaning up TX part.

> [...]
> > * sfc
> > 	assumed: constant efx->type->offload_features
> [...]
> 
> This is correct.

Thanks,
Michał Mirosław


^ permalink raw reply

* Re: [PATCH 0/4] Ethtool: cleanup strategy
From: Ben Hutchings @ 2010-11-02  1:14 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <20101102010253.GC15848@rere.qmqm.pl>

Michał Mirosław wrote:
> On Mon, Nov 01, 2010 at 09:05:25PM +0000, Ben Hutchings wrote:
[...]
> > It also might be worth defining a standard feature flag for RX checksum
> > offload, since currently every driver has to maintain its own private
> > flag.  Though we're running short of feature flags on 32-bit machines.
> 
> RX offloads are different in that most devices allow readback of
> configured bits, so actually no specific flags are needed.
[...]

Most devices also need resetting from time to time, so those flags still
need to be included in software state.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 3/4] Ethtool: convert get_tso/set_tso calls to hw_features flags
From: Michał Mirosław @ 2010-11-02  1:14 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <1288646754.2231.70.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 01, 2010 at 09:25:54PM +0000, Ben Hutchings wrote:
> On Sat, 2010-10-30 at 10:44 +0200, Michał Mirosław wrote:
> [...]
> > @@ -1670,7 +1668,7 @@ struct net_device *nes_netdev_init(struct nes_device *nesdev,
> >  	netdev->type = ARPHRD_ETHER;
> >  	netdev->features = NETIF_F_HIGHDMA;
> >  	netdev->netdev_ops = &nes_netdev_ops;
> > -	netdev->hw_features |= NETIF_F_SG;
> > +	netdev->hw_features |= NETIF_F_SG|NETIF_F_TSO;
> 
> There should be spaces on either side of the '|' operator.

The style varies among drivers' sources. I don't have any preference
whatsoever.

> [...]
> > diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
> > index 9e27bd6..814a06c 100644
> > --- a/drivers/net/atlx/atl1.c
> > +++ b/drivers/net/atlx/atl1.c
> > @@ -2992,6 +2992,7 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
> >  	netdev->watchdog_timeo = 5 * HZ;
> >  
> >  	netdev->hw_features |= NETIF_F_SG;
> > +	netdev->hw_features |= NETIF_F_TSO;
> 
> Why not set both flags in the same statement?  You might as well make
> the drivers consistent in this regard.

Will fix. The compiler should combine those anyway.

> [...]
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index 017667c..9b0e598 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> [...]
> > @@ -1065,10 +1048,13 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
> >  {
> >  	int err;
> >  
> > -	if (!data && dev->ethtool_ops->set_tso) {
> > -		err = dev->ethtool_ops->set_tso(dev, 0);
> > -		if (err)
> > -			return err;
> > +	if (!data && (dev->hw_features & NETIF_F_ALL_TSO)) {
> > +		if (dev->ethtool_ops->hw_set_tso) {
> > +			err = dev->ethtool_ops->hw_set_tso(dev, 0);
> > +			if (err < 0)
> > +				return err;
> > +		}
> > +		dev->features &= dev->hw_features & NETIF_F_ALL_TSO;
> 
> Surely this should be:
> 		dev->features &= ~NETIF_F_ALL_TSO;

I originally thought of hw_features as a bitfield of features changable
by ethtool.  So if driver writer wanted to have TSO permanently on but
allow toggling of TSO6 for debug than that would work.  There is actually
a bug in original code: it doesn't clear NETIF_F_TSO if there is no set_tso
function, but clears NETIF_F_SG anyway (the bug is preserved).  This is
a material for successive patches.

BTW, why is scatter-gather needed for TSO and TX checksumming?  I see
no interdependence on s-g for those offloads.

> [...]
> > @@ -1158,7 +1149,18 @@ static int ethtool_set_tso(struct net_device *dev, char __user *useraddr)
> >  	if (edata.data && !(dev->features & NETIF_F_SG))
> >  		return -EINVAL;
> >  
> > -	return dev->ethtool_ops->set_tso(dev, edata.data);
> > +	if (dev->ethtool_ops->hw_set_tso) {
> > +		int err = dev->ethtool_ops->hw_set_tso(dev, edata.data);
> > +		if (err)
> > +			return min(err, 0);
> [...]
> 
> Again, the odd semantics of a positive value need to be documented.

Of course.

Best Regards,
Michał Mirosław
 

^ permalink raw reply

* Re: [PATCH 2/2 v4] xps: Transmit Packet Steering
From: Tom Herbert @ 2010-11-02  1:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev
In-Reply-To: <1288153966.2652.62.camel@edumazet-laptop>

>> +struct xps_map {
>> +     unsigned int len;
>> +     unsigned int alloc_len;
>> +     struct rcu_head rcu;
>> +     u16 queues[0];
>> +};
>
> OK, so its a 'small' structure. And we dont want it to share a cache
> line with an other user in the kernel, or false sharing might happen.
>
> Make sure you allocate big enough ones to fill a full cache line.
>
> alloc_len = (L1_CACHE_BYTES - sizeof(struct xps_map)) / sizeof(u16);
>
> I see you allocate ones with alloc_len = 1. Thats not good.
>
Okay.

>> +
>> +/*
>> + * This structure holds all XPS maps for device.  Maps are indexed by CPU.
>> + */
>> +struct xps_dev_maps {
>> +     struct rcu_head rcu;
>> +     struct xps_map *cpu_map[0];
>
> Hmm... per_cpu maybe, instead of an array ?
>
The cpu_map is an array of pointers to the actual maps, and I wouldn't
expect those pointers to be changing so maybe that's okay for the
cache.  We still need the rcu head, and making cpu_map per-cpu (struct
xps_map **) would add another level of indirection.  Is there a
compelling reason to use per-cpu here?

> Also make sure this xps_dev_maps use a full cache line, to avoid false
> sharing.
>
Okay.



> Really I am not sure we need this array and smp_processor_id().
> Please consider alloc_percpu().
> Then, use __this_cpu_ptr() and avoid the preempt_disable()/enable()
> thing. Its a hint we want to use, because as soon as we leave
> get_xps_queue() we might migrate to another cpu ?

If we don't use per-cpu, how about raw_smp_processor_id() here?


>> +     if (dev_maps) {
>> +             for (i = 0; i < num_possible_cpus(); i++) {
>
> The use of num_possible_cpus() seems wrong to me.
> Dont you meant nr_cpu_ids ?
>

Okay, thanks for clarifying.

> Some machines have two possible cpus, numbered 0 and 8 :
> num_possible_cpus = 2
> nr_cpu_ids = 8
>
>

^ permalink raw reply

* Re: [PATCH 4/4] Ethtool: convert get_tx_csum/set_tx_csum calls to hw_features flags
From: Michał Mirosław @ 2010-11-02  1:23 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <1288647537.2231.81.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 01, 2010 at 09:38:57PM +0000, Ben Hutchings wrote:
> On Sun, 2010-10-31 at 02:09 +0200, Michał Mirosław wrote:
> [...]
> > diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
> > index 9f6aeef..5ba4535 100644
> > --- a/drivers/net/dm9000.c
> > +++ b/drivers/net/dm9000.c
> > @@ -132,7 +132,6 @@ typedef struct board_info {
> >  	u32		wake_state;
> >  
> >  	int		rx_csum;
> > -	int		can_csum;
> >  	int		ip_summed;
> >  } board_info_t;
> >  
> > @@ -480,7 +479,7 @@ static int dm9000_set_rx_csum_unlocked(struct net_device *dev, uint32_t data)
> >  {
> >  	board_info_t *dm = to_dm9000_board(dev);
> >  
> > -	if (dm->can_csum) {
> > +	if (dev->hw_features & NETIF_F_IP_CSUM) {
> 
> This is a bit ugly because can_csum is being used to indicate RX
> checksum offload capability whereas the checksum feature flags logically
> indicate TX checksum offload capability.  Of course, the two hardware
> features are highly correlated!

I briefly looked over the code. can_csum looked like "can do both TX
and RX checksum", so (dev->hw_features & NETIF_F_IP_CSUM) duplicates
information in this case. I can remove this change or just add a comment
that we abuse hw_features a bit.

> [...]
> > diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> > index 9b0e598..95e8b7a 100644
> > --- a/net/core/ethtool.c
> > +++ b/net/core/ethtool.c
> [...]
> > @@ -1077,12 +1039,18 @@ static int __ethtool_set_sg(struct net_device *dev, u32 data)
> >  	return 0;
> >  }
> >  
> > +static u32 ethtool_get_tx_csum(struct net_device *dev)
> > +{
> > +	return (dev->features & (NETIF_F_ALL_CSUM|NETIF_F_SCTP_CSUM)) != 0;
> > +}
> > +
> [...]
> 
> It seems to me that NETIF_F_SCTP_CSUM should be added to
> NETIF_F_ALL_CSUM, though that may cause problems in some other places
> NETIF_F_ALL_CSUM is used.

This would need auditing NETIF_F_ALL_CSUM usage across the kernel, so
definitely a further patch material.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 0/4] Ethtool: cleanup strategy
From: Michał Mirosław @ 2010-11-02  1:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, e1000-devel, Steve Glendinning, Greg Kroah-Hartman,
	Rasesh Mody, Debashis Dutt, Kristoffer Glembo, linux-driver,
	linux-net-drivers
In-Reply-To: <20101102011422.GM15074@solarflare.com>

On Tue, Nov 02, 2010 at 01:14:23AM +0000, Ben Hutchings wrote:
> Michał Mirosław wrote:
> > On Mon, Nov 01, 2010 at 09:05:25PM +0000, Ben Hutchings wrote:
> [...]
> > > It also might be worth defining a standard feature flag for RX checksum
> > > offload, since currently every driver has to maintain its own private
> > > flag.  Though we're running short of feature flags on 32-bit machines.
> > RX offloads are different in that most devices allow readback of
> > configured bits, so actually no specific flags are needed.
> [...]
> Most devices also need resetting from time to time, so those flags still
> need to be included in software state.

Ah. Good point. So this should be converted to the same way the TX offloads
will be handled.

Is it worth to split rx and tx features to separate bitfields?  Especially
that we're running out of bits in u32 and that TX flags are used in hot
path and RX are normally not accessed much.

Are the features flags exported somewhere to userspace except via ethtool?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 2/3] offloading: Support multiple vlan tags in GSO.
From: Jesse Gross @ 2010-11-02  1:31 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev
In-Reply-To: <1288643630.2231.30.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 1, 2010 at 1:33 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> On Fri, 2010-10-29 at 15:14 -0700, Jesse Gross wrote:
>> We assume that hardware TSO can't support multiple levels of vlan tags
>> but we allow it to be done.  Therefore, enable GSO to parse these tags
>> so we can fallback to software.
>>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>> CC: Ben Hutchings <bhutchings@solarflare.com>
> [...]
>
> I can't see how TSO would be enabled on a second-level VLAN device;
> presumably you're thinking about GSO being enabled there?

Right, TSO can't be enabled on a stacked vlan device.  However, there
are a few ways that I can think of to hit this code path.  One is GSO,
another is encapsulating a bridged packet that is already tagged.

These are definitely corner cases and I can think of a few other
caveats when using them.  Mostly I'm just thinking about making it
robust in the face of weird corner cases, one piece at a time at
least.

^ permalink raw reply

* Re: bonding: flow control regression [was Re: bridging: flow control regression]
From: Simon Horman @ 2010-11-02  2:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jay Vosburgh, David S. Miller
In-Reply-To: <1288616372.2660.101.camel@edumazet-laptop>

On Mon, Nov 01, 2010 at 01:59:32PM +0100, Eric Dumazet wrote:
> Le lundi 01 novembre 2010 à 21:29 +0900, Simon Horman a écrit :
> > Hi,
> > 
> > I have observed what appears to be a regression between 2.6.34 and
> > 2.6.35-rc1. The behaviour described below is still present in Linus's
> > current tree (2.6.36+).
> > 
> > On 2.6.34 and earlier when sending a UDP stream to a bonded interface
> > the throughput is approximately equal to the available physical bandwidth.
> > 
> > # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 172.17.50.253 (172.17.50.253) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > 
> > 114688    1472   30.00     2438265      0      957.1     18.09    3.159 
> > 109568           30.00     2389980             938.1     -1.00    -1.000
> > 
> > On 2.6.35-rc1 netpref sends~7Gbits/s.
> > Curiously it only consumes 50% CPU, I would expect this to be CPU bound.
> > 
> > # netperf -c -4 -t UDP_STREAM -H 172.17.50.253 -l 30 -- -m 1472
> > UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 172.17.50.253 (172.17.50.253) port 0 AF_INET
> > Socket  Message  Elapsed      Messages                   CPU      Service
> > Size    Size     Time         Okay Errors   Throughput   Util     Demand
> > bytes   bytes    secs            #      #   10^6bits/sec % SU     us/KB
> > 
> > 116736    1472   30.00     18064360      0     7090.8     50.62    8.665 
> > 109568           30.00     2438090             957.0     -1.00    -1.000
> > 
> > In this case the bonding device has a single gitabit slave device
> > and is running in balance-rr mode. I have observed similar results
> > with two and three slave devices.
> > 
> > I have bisected the problem and the offending commit appears to be
> > "net: Introduce skb_orphan_try()". My tired eyes tell me that change
> > frees skb's earlier than they otherwise would be unless tx timestamping
> > is in effect. That does seem to make sense in relation to this problem,
> > though I am yet to dig into specifically why bonding is adversely affected.
> > 
> 
> I assume you meant "bonding: flow control regression", ie this is not
> related to bridging ?

Yes, sorry about that. I meant bonding not bridging.

> One problem on bonding is that the xmit() method always returns
> NETDEV_TX_OK.
> 
> So a flooder cannot know some of its frames were lost.
> 
> So yes, the patch you mention has the effect of allowing UDP to flood
> bonding device, since we orphan skb before giving it to device (bond or
> ethX)
> 
> With a normal device (with a qdisc), we queue skb, and orphan it only
> when leaving queue. With a not too big socket send buffer, it slows down
> the sender enough to "send UDP frames at line rate only"

Thanks for the explanation.
I'm not entirely sure how much of a problem this is in practice.

^ permalink raw reply

* Re: [PATCH 2/4] Ethtool: convert get_sg/set_sg calls to hw_features flag
From: Matt Carlson @ 2010-11-02  2:24 UTC (permalink / raw)
  To: Micha?? Miros??aw
  Cc: linux-net-drivers@solarflare.com, Debashis Dutt,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	Greg Kroah-Hartman, Rasesh Mody, linux-driver@qlogic.com,
	Steve Glendinning, Kristoffer Glembo
In-Reply-To: <9d89236b6e4ff8c66937fbd7d8ce76602e680c5b.1288496404.git.mirq-linux@rere.qmqm.pl>

On Fri, Oct 29, 2010 at 09:28:26PM -0700, Micha?? Miros??aw wrote:
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 30ccbb6..b07e2d1 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -11306,7 +11306,6 @@ static const struct ethtool_ops tg3_ethtool_ops = {
>         .get_rx_csum            = tg3_get_rx_csum,
>         .set_rx_csum            = tg3_set_rx_csum,
>         .set_tx_csum            = tg3_set_tx_csum,
> -       .set_sg                 = ethtool_op_set_sg,
>         .set_tso                = tg3_set_tso,
>         .self_test              = tg3_self_test,
>         .get_strings            = tg3_get_strings,
> @@ -14681,6 +14680,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>         tp->rx_pending = TG3_DEF_RX_RING_PENDING;
>         tp->rx_jumbo_pending = TG3_DEF_RX_JUMBO_RING_PENDING;
> 
> +       dev->hw_features |= NETIF_F_SG;

Scatter-gather should not be enabled if TG3_FLAG_BROKEN_CHECKSUMS is set.  I
would do the following instead:

	if (!(tp->tg3_flags & TG3_FLAG_BROKEN_CHECKSUMS))
		dev->hw_features |= NETIF_F_SG;

TG3_FLAG_BROKEN_CHECKSUMS is set in tg3_get_invariants(), so this code
would need to be placed later than that function call.

>         dev->ethtool_ops = &tg3_ethtool_ops;
>         dev->watchdog_timeo = TG3_TX_TIMEOUT;
>         dev->irq = pdev->irq;


------------------------------------------------------------------------------
Nokia and AT&T present the 2010 Calling All Innovators-North America contest
Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store 
http://p.sf.net/sfu/nokia-dev2dev
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH 3/4] Ethtool: convert get_tso/set_tso calls to hw_features flags
From: Matt Carlson @ 2010-11-02  2:49 UTC (permalink / raw)
  To: Micha?? Miros??aw
  Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net,
	Steve Glendinning, Greg Kroah-Hartman, Rasesh Mody, Debashis Dutt,
	Kristoffer Glembo, linux-driver@qlogic.com,
	linux-net-drivers@solarflare.com
In-Reply-To: <db0a20ab6e4ab3f63d782bd2c4a9f7a873ea4a64.1288496404.git.mirq-linux@rere.qmqm.pl>

On Sat, Oct 30, 2010 at 01:44:17AM -0700, Micha?? Miros??aw wrote:
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index b07e2d1..c08172d 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -6142,7 +6142,7 @@ static inline void tg3_set_mtu(struct net_device *dev, struct tg3 *tp,
>         if (new_mtu > ETH_DATA_LEN) {
>                 if (tp->tg3_flags2 & TG3_FLG2_5780_CLASS) {
>                         tp->tg3_flags2 &= ~TG3_FLG2_TSO_CAPABLE;
> -                       ethtool_op_set_tso(dev, 0);
> +                       dev->features &= ~NETIF_F_ALL_TSO;
>                 } else {
>                         tp->tg3_flags |= TG3_FLAG_JUMBO_RING_ENABLE;
>                 }
> @@ -9977,27 +9977,28 @@ static int tg3_set_tso(struct net_device *dev, u32 value)
>  {
>         struct tg3 *tp = netdev_priv(dev);
> 
> -       if (!(tp->tg3_flags2 & TG3_FLG2_TSO_CAPABLE)) {
> -               if (value)
> -                       return -EINVAL;
> +       if (!(tp->tg3_flags2 & TG3_FLG2_TSO_CAPABLE) && value)
> +               return -EINVAL;
> +
> +       if (!value)
>                 return 0;
> -       }
> +
> +       dev->features |= NETIF_F_TSO;
> +
>         if ((dev->features & NETIF_F_IPV6_CSUM) &&
>             ((tp->tg3_flags2 & TG3_FLG2_HW_TSO_2) ||
>              (tp->tg3_flags2 & TG3_FLG2_HW_TSO_3))) {
> -               if (value) {
> -                       dev->features |= NETIF_F_TSO6;
> -                       if ((tp->tg3_flags2 & TG3_FLG2_HW_TSO_3) ||
> -                           GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5761 ||
> -                           (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5784 &&
> -                            GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5784_AX) ||
> -                           GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5785 ||
> -                           GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_57780)
> -                               dev->features |= NETIF_F_TSO_ECN;
> -               } else
> -                       dev->features &= ~(NETIF_F_TSO6 | NETIF_F_TSO_ECN);
> +               dev->features |= NETIF_F_TSO6;
> +               if ((tp->tg3_flags2 & TG3_FLG2_HW_TSO_3) ||
> +                   GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5761 ||
> +                   (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5784 &&
> +                    GET_CHIP_REV(tp->pci_chip_rev_id) != CHIPREV_5784_AX) ||
> +                   GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5785 ||
> +                   GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_57780)
> +                       dev->features |= NETIF_F_TSO_ECN;
>         }
> -       return ethtool_op_set_tso(dev, value);
> +
> +       return 1;

dev->hw_features looks to me like it should function as a set of flags
indicating what the hardware is currently capable of doing.  It would
clean the above code a lot if we could do:

	dev->features |= (dev->hw_features & NETIF_F_ALL_TSO);

In fact, the new member might serve as a replacement for the
TG3_FLG2_TSO_CAPABLE flag.  Let's not do that right now though,
because it requires some careful attention in other code paths which would
be a distraction.  I'll follow-up with a patch to do this.

>  }
> 
>  static int tg3_nway_reset(struct net_device *dev)
> @@ -11306,7 +11307,7 @@ static const struct ethtool_ops tg3_ethtool_ops = {
>         .get_rx_csum            = tg3_get_rx_csum,
>         .set_rx_csum            = tg3_set_rx_csum,
>         .set_tx_csum            = tg3_set_tx_csum,
> -       .set_tso                = tg3_set_tso,
> +       .hw_set_tso             = tg3_set_tso,
>         .self_test              = tg3_self_test,
>         .get_strings            = tg3_get_strings,
>         .phys_id                = tg3_phys_id,
> @@ -14681,6 +14682,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>         tp->rx_jumbo_pending = TG3_DEF_RX_JUMBO_RING_PENDING;
> 
>         dev->hw_features |= NETIF_F_SG;
> +       dev->hw_features |= NETIF_F_TSO|NETIF_F_TSO6|NETIF_F_TSO_ECN;

Not all devices are TSO capable either.  There is code later in this
function that determines which TSO flags to set.  I would probably reuse
that code.

>         dev->ethtool_ops = &tg3_ethtool_ops;
>         dev->watchdog_timeo = TG3_TX_TIMEOUT;
>         dev->irq = pdev->irq;


^ permalink raw reply

* [PATCH] net: sh_eth: ctrl_in/outX to __raw_read/writeX conversion.
From: Nobuhiro Iwamatsu @ 2010-11-02  2:51 UTC (permalink / raw)
  To: netdev; +Cc: lethal, Nobuhiro Iwamatsu

The ctrl_xxx routines are deprecated, switch over to the __raw_xxx
versions.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
---
 drivers/net/sh_eth.c |  244 +++++++++++++++++++++++++-------------------------
 1 files changed, 122 insertions(+), 122 deletions(-)

diff --git a/drivers/net/sh_eth.c b/drivers/net/sh_eth.c
index 50259df..7229f5d 100644
--- a/drivers/net/sh_eth.c
+++ b/drivers/net/sh_eth.c
@@ -45,9 +45,9 @@ static void sh_eth_set_duplex(struct net_device *ndev)
 	u32 ioaddr = ndev->base_addr;
 
 	if (mdp->duplex) /* Full */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
 	else		/* Half */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
 }
 
 static void sh_eth_set_rate(struct net_device *ndev)
@@ -57,10 +57,10 @@ static void sh_eth_set_rate(struct net_device *ndev)
 
 	switch (mdp->speed) {
 	case 10: /* 10BASE */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) & ~ECMR_RTM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) & ~ECMR_RTM, ioaddr + ECMR);
 		break;
 	case 100:/* 100BASE */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) | ECMR_RTM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) | ECMR_RTM, ioaddr + ECMR);
 		break;
 	default:
 		break;
@@ -96,9 +96,9 @@ static void sh_eth_set_duplex(struct net_device *ndev)
 	u32 ioaddr = ndev->base_addr;
 
 	if (mdp->duplex) /* Full */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
 	else		/* Half */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
 }
 
 static void sh_eth_set_rate(struct net_device *ndev)
@@ -108,10 +108,10 @@ static void sh_eth_set_rate(struct net_device *ndev)
 
 	switch (mdp->speed) {
 	case 10: /* 10BASE */
-		ctrl_outl(0, ioaddr + RTRATE);
+		__raw_writel(0, ioaddr + RTRATE);
 		break;
 	case 100:/* 100BASE */
-		ctrl_outl(1, ioaddr + RTRATE);
+		__raw_writel(1, ioaddr + RTRATE);
 		break;
 	default:
 		break;
@@ -143,7 +143,7 @@ static struct sh_eth_cpu_data sh_eth_my_cpu_data = {
 static void sh_eth_chip_reset(struct net_device *ndev)
 {
 	/* reset device */
-	ctrl_outl(ARSTR_ARSTR, ARSTR);
+	__raw_writel(ARSTR_ARSTR, ARSTR);
 	mdelay(1);
 }
 
@@ -152,10 +152,10 @@ static void sh_eth_reset(struct net_device *ndev)
 	u32 ioaddr = ndev->base_addr;
 	int cnt = 100;
 
-	ctrl_outl(EDSR_ENALL, ioaddr + EDSR);
-	ctrl_outl(ctrl_inl(ioaddr + EDMR) | EDMR_SRST, ioaddr + EDMR);
+	__raw_writel(EDSR_ENALL, ioaddr + EDSR);
+	__raw_writel(__raw_readl(ioaddr + EDMR) | EDMR_SRST, ioaddr + EDMR);
 	while (cnt > 0) {
-		if (!(ctrl_inl(ioaddr + EDMR) & 0x3))
+		if (!(__raw_readl(ioaddr + EDMR) & 0x3))
 			break;
 		mdelay(1);
 		cnt--;
@@ -164,14 +164,14 @@ static void sh_eth_reset(struct net_device *ndev)
 		printk(KERN_ERR "Device reset fail\n");
 
 	/* Table Init */
-	ctrl_outl(0x0, ioaddr + TDLAR);
-	ctrl_outl(0x0, ioaddr + TDFAR);
-	ctrl_outl(0x0, ioaddr + TDFXR);
-	ctrl_outl(0x0, ioaddr + TDFFR);
-	ctrl_outl(0x0, ioaddr + RDLAR);
-	ctrl_outl(0x0, ioaddr + RDFAR);
-	ctrl_outl(0x0, ioaddr + RDFXR);
-	ctrl_outl(0x0, ioaddr + RDFFR);
+	__raw_writel(0x0, ioaddr + TDLAR);
+	__raw_writel(0x0, ioaddr + TDFAR);
+	__raw_writel(0x0, ioaddr + TDFXR);
+	__raw_writel(0x0, ioaddr + TDFFR);
+	__raw_writel(0x0, ioaddr + RDLAR);
+	__raw_writel(0x0, ioaddr + RDFAR);
+	__raw_writel(0x0, ioaddr + RDFXR);
+	__raw_writel(0x0, ioaddr + RDFFR);
 }
 
 static void sh_eth_set_duplex(struct net_device *ndev)
@@ -180,9 +180,9 @@ static void sh_eth_set_duplex(struct net_device *ndev)
 	u32 ioaddr = ndev->base_addr;
 
 	if (mdp->duplex) /* Full */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) | ECMR_DM, ioaddr + ECMR);
 	else		/* Half */
-		ctrl_outl(ctrl_inl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
+		__raw_writel(__raw_readl(ioaddr + ECMR) & ~ECMR_DM, ioaddr + ECMR);
 }
 
 static void sh_eth_set_rate(struct net_device *ndev)
@@ -192,13 +192,13 @@ static void sh_eth_set_rate(struct net_device *ndev)
 
 	switch (mdp->speed) {
 	case 10: /* 10BASE */
-		ctrl_outl(GECMR_10, ioaddr + GECMR);
+		__raw_writel(GECMR_10, ioaddr + GECMR);
 		break;
 	case 100:/* 100BASE */
-		ctrl_outl(GECMR_100, ioaddr + GECMR);
+		__raw_writel(GECMR_100, ioaddr + GECMR);
 		break;
 	case 1000: /* 1000BASE */
-		ctrl_outl(GECMR_1000, ioaddr + GECMR);
+		__raw_writel(GECMR_1000, ioaddr + GECMR);
 		break;
 	default:
 		break;
@@ -283,9 +283,9 @@ static void sh_eth_reset(struct net_device *ndev)
 {
 	u32 ioaddr = ndev->base_addr;
 
-	ctrl_outl(ctrl_inl(ioaddr + EDMR) | EDMR_SRST, ioaddr + EDMR);
+	__raw_writel(__raw_readl(ioaddr + EDMR) | EDMR_SRST, ioaddr + EDMR);
 	mdelay(3);
-	ctrl_outl(ctrl_inl(ioaddr + EDMR) & ~EDMR_SRST, ioaddr + EDMR);
+	__raw_writel(__raw_readl(ioaddr + EDMR) & ~EDMR_SRST, ioaddr + EDMR);
 }
 #endif
 
@@ -336,10 +336,10 @@ static void update_mac_address(struct net_device *ndev)
 {
 	u32 ioaddr = ndev->base_addr;
 
-	ctrl_outl((ndev->dev_addr[0] << 24) | (ndev->dev_addr[1] << 16) |
+	__raw_writel((ndev->dev_addr[0] << 24) | (ndev->dev_addr[1] << 16) |
 		  (ndev->dev_addr[2] << 8) | (ndev->dev_addr[3]),
 		  ioaddr + MAHR);
-	ctrl_outl((ndev->dev_addr[4] << 8) | (ndev->dev_addr[5]),
+	__raw_writel((ndev->dev_addr[4] << 8) | (ndev->dev_addr[5]),
 		  ioaddr + MALR);
 }
 
@@ -358,12 +358,12 @@ static void read_mac_address(struct net_device *ndev, unsigned char *mac)
 	if (mac[0] || mac[1] || mac[2] || mac[3] || mac[4] || mac[5]) {
 		memcpy(ndev->dev_addr, mac, 6);
 	} else {
-		ndev->dev_addr[0] = (ctrl_inl(ioaddr + MAHR) >> 24);
-		ndev->dev_addr[1] = (ctrl_inl(ioaddr + MAHR) >> 16) & 0xFF;
-		ndev->dev_addr[2] = (ctrl_inl(ioaddr + MAHR) >> 8) & 0xFF;
-		ndev->dev_addr[3] = (ctrl_inl(ioaddr + MAHR) & 0xFF);
-		ndev->dev_addr[4] = (ctrl_inl(ioaddr + MALR) >> 8) & 0xFF;
-		ndev->dev_addr[5] = (ctrl_inl(ioaddr + MALR) & 0xFF);
+		ndev->dev_addr[0] = (__raw_readl(ioaddr + MAHR) >> 24);
+		ndev->dev_addr[1] = (__raw_readl(ioaddr + MAHR) >> 16) & 0xFF;
+		ndev->dev_addr[2] = (__raw_readl(ioaddr + MAHR) >> 8) & 0xFF;
+		ndev->dev_addr[3] = (__raw_readl(ioaddr + MAHR) & 0xFF);
+		ndev->dev_addr[4] = (__raw_readl(ioaddr + MALR) >> 8) & 0xFF;
+		ndev->dev_addr[5] = (__raw_readl(ioaddr + MALR) & 0xFF);
 	}
 }
 
@@ -379,19 +379,19 @@ struct bb_info {
 /* PHY bit set */
 static void bb_set(u32 addr, u32 msk)
 {
-	ctrl_outl(ctrl_inl(addr) | msk, addr);
+	__raw_writel(__raw_readl(addr) | msk, addr);
 }
 
 /* PHY bit clear */
 static void bb_clr(u32 addr, u32 msk)
 {
-	ctrl_outl((ctrl_inl(addr) & ~msk), addr);
+	__raw_writel((__raw_readl(addr) & ~msk), addr);
 }
 
 /* PHY bit read */
 static int bb_read(u32 addr, u32 msk)
 {
-	return (ctrl_inl(addr) & msk) != 0;
+	return (__raw_readl(addr) & msk) != 0;
 }
 
 /* Data I/O pin control */
@@ -506,9 +506,9 @@ static void sh_eth_ring_format(struct net_device *ndev)
 		rxdesc->buffer_length = ALIGN(mdp->rx_buf_sz, 16);
 		/* Rx descriptor address set */
 		if (i == 0) {
-			ctrl_outl(mdp->rx_desc_dma, ioaddr + RDLAR);
+			__raw_writel(mdp->rx_desc_dma, ioaddr + RDLAR);
 #if defined(CONFIG_CPU_SUBTYPE_SH7763)
-			ctrl_outl(mdp->rx_desc_dma, ioaddr + RDFAR);
+			__raw_writel(mdp->rx_desc_dma, ioaddr + RDFAR);
 #endif
 		}
 	}
@@ -528,9 +528,9 @@ static void sh_eth_ring_format(struct net_device *ndev)
 		txdesc->buffer_length = 0;
 		if (i == 0) {
 			/* Tx descriptor address set */
-			ctrl_outl(mdp->tx_desc_dma, ioaddr + TDLAR);
+			__raw_writel(mdp->tx_desc_dma, ioaddr + TDLAR);
 #if defined(CONFIG_CPU_SUBTYPE_SH7763)
-			ctrl_outl(mdp->tx_desc_dma, ioaddr + TDFAR);
+			__raw_writel(mdp->tx_desc_dma, ioaddr + TDFAR);
 #endif
 		}
 	}
@@ -623,71 +623,71 @@ static int sh_eth_dev_init(struct net_device *ndev)
 	/* Descriptor format */
 	sh_eth_ring_format(ndev);
 	if (mdp->cd->rpadir)
-		ctrl_outl(mdp->cd->rpadir_value, ioaddr + RPADIR);
+		__raw_writel(mdp->cd->rpadir_value, ioaddr + RPADIR);
 
 	/* all sh_eth int mask */
-	ctrl_outl(0, ioaddr + EESIPR);
+	__raw_writel(0, ioaddr + EESIPR);
 
 #if defined(__LITTLE_ENDIAN__)
 	if (mdp->cd->hw_swap)
-		ctrl_outl(EDMR_EL, ioaddr + EDMR);
+		__raw_writel(EDMR_EL, ioaddr + EDMR);
 	else
 #endif
-		ctrl_outl(0, ioaddr + EDMR);
+		__raw_writel(0, ioaddr + EDMR);
 
 	/* FIFO size set */
-	ctrl_outl(mdp->cd->fdr_value, ioaddr + FDR);
-	ctrl_outl(0, ioaddr + TFTR);
+	__raw_writel(mdp->cd->fdr_value, ioaddr + FDR);
+	__raw_writel(0, ioaddr + TFTR);
 
 	/* Frame recv control */
-	ctrl_outl(mdp->cd->rmcr_value, ioaddr + RMCR);
+	__raw_writel(mdp->cd->rmcr_value, ioaddr + RMCR);
 
 	rx_int_var = mdp->rx_int_var = DESC_I_RINT8 | DESC_I_RINT5;
 	tx_int_var = mdp->tx_int_var = DESC_I_TINT2;
-	ctrl_outl(rx_int_var | tx_int_var, ioaddr + TRSCER);
+	__raw_writel(rx_int_var | tx_int_var, ioaddr + TRSCER);
 
 	if (mdp->cd->bculr)
-		ctrl_outl(0x800, ioaddr + BCULR);	/* Burst sycle set */
+		__raw_writel(0x800, ioaddr + BCULR);	/* Burst sycle set */
 
-	ctrl_outl(mdp->cd->fcftr_value, ioaddr + FCFTR);
+	__raw_writel(mdp->cd->fcftr_value, ioaddr + FCFTR);
 
 	if (!mdp->cd->no_trimd)
-		ctrl_outl(0, ioaddr + TRIMD);
+		__raw_writel(0, ioaddr + TRIMD);
 
 	/* Recv frame limit set register */
-	ctrl_outl(RFLR_VALUE, ioaddr + RFLR);
+	__raw_writel(RFLR_VALUE, ioaddr + RFLR);
 
-	ctrl_outl(ctrl_inl(ioaddr + EESR), ioaddr + EESR);
-	ctrl_outl(mdp->cd->eesipr_value, ioaddr + EESIPR);
+	__raw_writel(__raw_readl(ioaddr + EESR), ioaddr + EESR);
+	__raw_writel(mdp->cd->eesipr_value, ioaddr + EESIPR);
 
 	/* PAUSE Prohibition */
-	val = (ctrl_inl(ioaddr + ECMR) & ECMR_DM) |
+	val = (__raw_readl(ioaddr + ECMR) & ECMR_DM) |
 		ECMR_ZPF | (mdp->duplex ? ECMR_DM : 0) | ECMR_TE | ECMR_RE;
 
-	ctrl_outl(val, ioaddr + ECMR);
+	__raw_writel(val, ioaddr + ECMR);
 
 	if (mdp->cd->set_rate)
 		mdp->cd->set_rate(ndev);
 
 	/* E-MAC Status Register clear */
-	ctrl_outl(mdp->cd->ecsr_value, ioaddr + ECSR);
+	__raw_writel(mdp->cd->ecsr_value, ioaddr + ECSR);
 
 	/* E-MAC Interrupt Enable register */
-	ctrl_outl(mdp->cd->ecsipr_value, ioaddr + ECSIPR);
+	__raw_writel(mdp->cd->ecsipr_value, ioaddr + ECSIPR);
 
 	/* Set MAC address */
 	update_mac_address(ndev);
 
 	/* mask reset */
 	if (mdp->cd->apr)
-		ctrl_outl(APR_AP, ioaddr + APR);
+		__raw_writel(APR_AP, ioaddr + APR);
 	if (mdp->cd->mpr)
-		ctrl_outl(MPR_MP, ioaddr + MPR);
+		__raw_writel(MPR_MP, ioaddr + MPR);
 	if (mdp->cd->tpauser)
-		ctrl_outl(TPAUSER_UNLIMITED, ioaddr + TPAUSER);
+		__raw_writel(TPAUSER_UNLIMITED, ioaddr + TPAUSER);
 
 	/* Setting the Rx mode will start the Rx process. */
-	ctrl_outl(EDRRR_R, ioaddr + EDRRR);
+	__raw_writel(EDRRR_R, ioaddr + EDRRR);
 
 	netif_start_queue(ndev);
 
@@ -811,8 +811,8 @@ static int sh_eth_rx(struct net_device *ndev)
 
 	/* Restart Rx engine if stopped. */
 	/* If we don't need to check status, don't. -KDU */
-	if (!(ctrl_inl(ndev->base_addr + EDRRR) & EDRRR_R))
-		ctrl_outl(EDRRR_R, ndev->base_addr + EDRRR);
+	if (!(__raw_readl(ndev->base_addr + EDRRR) & EDRRR_R))
+		__raw_writel(EDRRR_R, ndev->base_addr + EDRRR);
 
 	return 0;
 }
@@ -827,8 +827,8 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 	u32 mask;
 
 	if (intr_status & EESR_ECI) {
-		felic_stat = ctrl_inl(ioaddr + ECSR);
-		ctrl_outl(felic_stat, ioaddr + ECSR);	/* clear int */
+		felic_stat = __raw_readl(ioaddr + ECSR);
+		__raw_writel(felic_stat, ioaddr + ECSR);	/* clear int */
 		if (felic_stat & ECSR_ICD)
 			mdp->stats.tx_carrier_errors++;
 		if (felic_stat & ECSR_LCHNG) {
@@ -839,25 +839,25 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 				else
 					link_stat = PHY_ST_LINK;
 			} else {
-				link_stat = (ctrl_inl(ioaddr + PSR));
+				link_stat = (__raw_readl(ioaddr + PSR));
 				if (mdp->ether_link_active_low)
 					link_stat = ~link_stat;
 			}
 			if (!(link_stat & PHY_ST_LINK)) {
 				/* Link Down : disable tx and rx */
-				ctrl_outl(ctrl_inl(ioaddr + ECMR) &
+				__raw_writel(__raw_readl(ioaddr + ECMR) &
 					  ~(ECMR_RE | ECMR_TE), ioaddr + ECMR);
 			} else {
 				/* Link Up */
-				ctrl_outl(ctrl_inl(ioaddr + EESIPR) &
+				__raw_writel(__raw_readl(ioaddr + EESIPR) &
 					  ~DMAC_M_ECI, ioaddr + EESIPR);
 				/*clear int */
-				ctrl_outl(ctrl_inl(ioaddr + ECSR),
+				__raw_writel(__raw_readl(ioaddr + ECSR),
 					  ioaddr + ECSR);
-				ctrl_outl(ctrl_inl(ioaddr + EESIPR) |
+				__raw_writel(__raw_readl(ioaddr + EESIPR) |
 					  DMAC_M_ECI, ioaddr + EESIPR);
 				/* enable tx and rx */
-				ctrl_outl(ctrl_inl(ioaddr + ECMR) |
+				__raw_writel(__raw_readl(ioaddr + ECMR) |
 					  (ECMR_RE | ECMR_TE), ioaddr + ECMR);
 			}
 		}
@@ -888,8 +888,8 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 		/* Receive Descriptor Empty int */
 		mdp->stats.rx_over_errors++;
 
-		if (ctrl_inl(ioaddr + EDRRR) ^ EDRRR_R)
-			ctrl_outl(EDRRR_R, ioaddr + EDRRR);
+		if (__raw_readl(ioaddr + EDRRR) ^ EDRRR_R)
+			__raw_writel(EDRRR_R, ioaddr + EDRRR);
 		dev_err(&ndev->dev, "Receive Descriptor Empty\n");
 	}
 	if (intr_status & EESR_RFE) {
@@ -903,7 +903,7 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 		mask &= ~EESR_ADE;
 	if (intr_status & mask) {
 		/* Tx error */
-		u32 edtrr = ctrl_inl(ndev->base_addr + EDTRR);
+		u32 edtrr = __raw_readl(ndev->base_addr + EDTRR);
 		/* dmesg */
 		dev_err(&ndev->dev, "TX error. status=%8.8x cur_tx=%8.8x ",
 				intr_status, mdp->cur_tx);
@@ -915,7 +915,7 @@ static void sh_eth_error(struct net_device *ndev, int intr_status)
 		/* SH7712 BUG */
 		if (edtrr ^ EDTRR_TRNS) {
 			/* tx dma start */
-			ctrl_outl(EDTRR_TRNS, ndev->base_addr + EDTRR);
+			__raw_writel(EDTRR_TRNS, ndev->base_addr + EDTRR);
 		}
 		/* wakeup */
 		netif_wake_queue(ndev);
@@ -934,12 +934,12 @@ static irqreturn_t sh_eth_interrupt(int irq, void *netdev)
 	spin_lock(&mdp->lock);
 
 	/* Get interrpt stat */
-	intr_status = ctrl_inl(ioaddr + EESR);
+	intr_status = __raw_readl(ioaddr + EESR);
 	/* Clear interrupt */
 	if (intr_status & (EESR_FRC | EESR_RMAF | EESR_RRF |
 			EESR_RTLF | EESR_RTSF | EESR_PRE | EESR_CERF |
 			cd->tx_check | cd->eesr_err_check)) {
-		ctrl_outl(intr_status, ioaddr + EESR);
+		__raw_writel(intr_status, ioaddr + EESR);
 		ret = IRQ_HANDLED;
 	} else
 		goto other_irq;
@@ -1000,7 +1000,7 @@ static void sh_eth_adjust_link(struct net_device *ndev)
 				mdp->cd->set_rate(ndev);
 		}
 		if (mdp->link == PHY_DOWN) {
-			ctrl_outl((ctrl_inl(ioaddr + ECMR) & ~ECMR_TXF)
+			__raw_writel((__raw_readl(ioaddr + ECMR) & ~ECMR_TXF)
 					| ECMR_DM, ioaddr + ECMR);
 			new_state = 1;
 			mdp->link = phydev->link;
@@ -1125,7 +1125,7 @@ static void sh_eth_tx_timeout(struct net_device *ndev)
 
 	/* worning message out. */
 	printk(KERN_WARNING "%s: transmit timed out, status %8.8x,"
-	       " resetting...\n", ndev->name, (int)ctrl_inl(ioaddr + EESR));
+	       " resetting...\n", ndev->name, (int)__raw_readl(ioaddr + EESR));
 
 	/* tx_errors count up */
 	mdp->stats.tx_errors++;
@@ -1196,8 +1196,8 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 
 	mdp->cur_tx++;
 
-	if (!(ctrl_inl(ndev->base_addr + EDTRR) & EDTRR_TRNS))
-		ctrl_outl(EDTRR_TRNS, ndev->base_addr + EDTRR);
+	if (!(__raw_readl(ndev->base_addr + EDTRR) & EDTRR_TRNS))
+		__raw_writel(EDTRR_TRNS, ndev->base_addr + EDTRR);
 
 	return NETDEV_TX_OK;
 }
@@ -1212,11 +1212,11 @@ static int sh_eth_close(struct net_device *ndev)
 	netif_stop_queue(ndev);
 
 	/* Disable interrupts by clearing the interrupt mask. */
-	ctrl_outl(0x0000, ioaddr + EESIPR);
+	__raw_writel(0x0000, ioaddr + EESIPR);
 
 	/* Stop the chip's Tx and Rx processes. */
-	ctrl_outl(0, ioaddr + EDTRR);
-	ctrl_outl(0, ioaddr + EDRRR);
+	__raw_writel(0, ioaddr + EDTRR);
+	__raw_writel(0, ioaddr + EDRRR);
 
 	/* PHY Disconnect */
 	if (mdp->phydev) {
@@ -1251,20 +1251,20 @@ static struct net_device_stats *sh_eth_get_stats(struct net_device *ndev)
 
 	pm_runtime_get_sync(&mdp->pdev->dev);
 
-	mdp->stats.tx_dropped += ctrl_inl(ioaddr + TROCR);
-	ctrl_outl(0, ioaddr + TROCR);	/* (write clear) */
-	mdp->stats.collisions += ctrl_inl(ioaddr + CDCR);
-	ctrl_outl(0, ioaddr + CDCR);	/* (write clear) */
-	mdp->stats.tx_carrier_errors += ctrl_inl(ioaddr + LCCR);
-	ctrl_outl(0, ioaddr + LCCR);	/* (write clear) */
+	mdp->stats.tx_dropped += __raw_readl(ioaddr + TROCR);
+	__raw_writel(0, ioaddr + TROCR);	/* (write clear) */
+	mdp->stats.collisions += __raw_readl(ioaddr + CDCR);
+	__raw_writel(0, ioaddr + CDCR);	/* (write clear) */
+	mdp->stats.tx_carrier_errors += __raw_readl(ioaddr + LCCR);
+	__raw_writel(0, ioaddr + LCCR);	/* (write clear) */
 #if defined(CONFIG_CPU_SUBTYPE_SH7763)
-	mdp->stats.tx_carrier_errors += ctrl_inl(ioaddr + CERCR);/* CERCR */
-	ctrl_outl(0, ioaddr + CERCR);	/* (write clear) */
-	mdp->stats.tx_carrier_errors += ctrl_inl(ioaddr + CEECR);/* CEECR */
-	ctrl_outl(0, ioaddr + CEECR);	/* (write clear) */
+	mdp->stats.tx_carrier_errors += __raw_readl(ioaddr + CERCR);/* CERCR */
+	__raw_writel(0, ioaddr + CERCR);	/* (write clear) */
+	mdp->stats.tx_carrier_errors += __raw_readl(ioaddr + CEECR);/* CEECR */
+	__raw_writel(0, ioaddr + CEECR);	/* (write clear) */
 #else
-	mdp->stats.tx_carrier_errors += ctrl_inl(ioaddr + CNDCR);
-	ctrl_outl(0, ioaddr + CNDCR);	/* (write clear) */
+	mdp->stats.tx_carrier_errors += __raw_readl(ioaddr + CNDCR);
+	__raw_writel(0, ioaddr + CNDCR);	/* (write clear) */
 #endif
 	pm_runtime_put_sync(&mdp->pdev->dev);
 
@@ -1295,11 +1295,11 @@ static void sh_eth_set_multicast_list(struct net_device *ndev)
 
 	if (ndev->flags & IFF_PROMISC) {
 		/* Set promiscuous. */
-		ctrl_outl((ctrl_inl(ioaddr + ECMR) & ~ECMR_MCT) | ECMR_PRM,
+		__raw_writel((__raw_readl(ioaddr + ECMR) & ~ECMR_MCT) | ECMR_PRM,
 			  ioaddr + ECMR);
 	} else {
 		/* Normal, unicast/broadcast-only mode. */
-		ctrl_outl((ctrl_inl(ioaddr + ECMR) & ~ECMR_PRM) | ECMR_MCT,
+		__raw_writel((__raw_readl(ioaddr + ECMR) & ~ECMR_PRM) | ECMR_MCT,
 			  ioaddr + ECMR);
 	}
 }
@@ -1307,30 +1307,30 @@ static void sh_eth_set_multicast_list(struct net_device *ndev)
 /* SuperH's TSU register init function */
 static void sh_eth_tsu_init(u32 ioaddr)
 {
-	ctrl_outl(0, ioaddr + TSU_FWEN0);	/* Disable forward(0->1) */
-	ctrl_outl(0, ioaddr + TSU_FWEN1);	/* Disable forward(1->0) */
-	ctrl_outl(0, ioaddr + TSU_FCM);	/* forward fifo 3k-3k */
-	ctrl_outl(0xc, ioaddr + TSU_BSYSL0);
-	ctrl_outl(0xc, ioaddr + TSU_BSYSL1);
-	ctrl_outl(0, ioaddr + TSU_PRISL0);
-	ctrl_outl(0, ioaddr + TSU_PRISL1);
-	ctrl_outl(0, ioaddr + TSU_FWSL0);
-	ctrl_outl(0, ioaddr + TSU_FWSL1);
-	ctrl_outl(TSU_FWSLC_POSTENU | TSU_FWSLC_POSTENL, ioaddr + TSU_FWSLC);
+	__raw_writel(0, ioaddr + TSU_FWEN0);	/* Disable forward(0->1) */
+	__raw_writel(0, ioaddr + TSU_FWEN1);	/* Disable forward(1->0) */
+	__raw_writel(0, ioaddr + TSU_FCM);	/* forward fifo 3k-3k */
+	__raw_writel(0xc, ioaddr + TSU_BSYSL0);
+	__raw_writel(0xc, ioaddr + TSU_BSYSL1);
+	__raw_writel(0, ioaddr + TSU_PRISL0);
+	__raw_writel(0, ioaddr + TSU_PRISL1);
+	__raw_writel(0, ioaddr + TSU_FWSL0);
+	__raw_writel(0, ioaddr + TSU_FWSL1);
+	__raw_writel(TSU_FWSLC_POSTENU | TSU_FWSLC_POSTENL, ioaddr + TSU_FWSLC);
 #if defined(CONFIG_CPU_SUBTYPE_SH7763)
-	ctrl_outl(0, ioaddr + TSU_QTAG0);	/* Disable QTAG(0->1) */
-	ctrl_outl(0, ioaddr + TSU_QTAG1);	/* Disable QTAG(1->0) */
+	__raw_writel(0, ioaddr + TSU_QTAG0);	/* Disable QTAG(0->1) */
+	__raw_writel(0, ioaddr + TSU_QTAG1);	/* Disable QTAG(1->0) */
 #else
-	ctrl_outl(0, ioaddr + TSU_QTAGM0);	/* Disable QTAG(0->1) */
-	ctrl_outl(0, ioaddr + TSU_QTAGM1);	/* Disable QTAG(1->0) */
+	__raw_writel(0, ioaddr + TSU_QTAGM0);	/* Disable QTAG(0->1) */
+	__raw_writel(0, ioaddr + TSU_QTAGM1);	/* Disable QTAG(1->0) */
 #endif
-	ctrl_outl(0, ioaddr + TSU_FWSR);	/* all interrupt status clear */
-	ctrl_outl(0, ioaddr + TSU_FWINMK);	/* Disable all interrupt */
-	ctrl_outl(0, ioaddr + TSU_TEN);	/* Disable all CAM entry */
-	ctrl_outl(0, ioaddr + TSU_POST1);	/* Disable CAM entry [ 0- 7] */
-	ctrl_outl(0, ioaddr + TSU_POST2);	/* Disable CAM entry [ 8-15] */
-	ctrl_outl(0, ioaddr + TSU_POST3);	/* Disable CAM entry [16-23] */
-	ctrl_outl(0, ioaddr + TSU_POST4);	/* Disable CAM entry [24-31] */
+	__raw_writel(0, ioaddr + TSU_FWSR);	/* all interrupt status clear */
+	__raw_writel(0, ioaddr + TSU_FWINMK);	/* Disable all interrupt */
+	__raw_writel(0, ioaddr + TSU_TEN);	/* Disable all CAM entry */
+	__raw_writel(0, ioaddr + TSU_POST1);	/* Disable CAM entry [ 0- 7] */
+	__raw_writel(0, ioaddr + TSU_POST2);	/* Disable CAM entry [ 8-15] */
+	__raw_writel(0, ioaddr + TSU_POST3);	/* Disable CAM entry [16-23] */
+	__raw_writel(0, ioaddr + TSU_POST4);	/* Disable CAM entry [24-31] */
 }
 #endif /* SH_ETH_HAS_TSU */
 
-- 
1.7.2.3


^ permalink raw reply related

* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Dan Williams @ 2010-11-02  2:50 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: David Miller, brian.haley, shemminger, netdev
In-Reply-To: <AANLkTi=Wp83HtQHXG0YEBjmctQYDu9Y4j+O9==_URy_A@mail.gmail.com>

On Tue, 2010-10-26 at 11:02 -0700, Lorenzo Colitti wrote:
> On Tue, Oct 26, 2010 at 10:55 AM, David Miller <davem@davemloft.net> wrote:
> >> Are there? For wifi you could look at the SSID (though even that is no
> >
> > I'm saying to check the AP's MAC.
> 
> Ah yes, that works better. But for wired? Or do you think it should be
> fixed only for wifi?

Only looking at the SSID tells you if you're on a different network.
Just looking at the AP MAC is like plugging into a different port on the
same switch: it's still the same network.

Dan



^ permalink raw reply

* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Dan Williams @ 2010-11-02  2:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Lorenzo Colitti, netdev
In-Reply-To: <20101026082832.4c5af2ef@nehalam>

On Tue, 2010-10-26 at 08:28 -0700, Stephen Hemminger wrote:
> On Mon, 25 Oct 2010 22:44:03 -0700
> Lorenzo Colitti <lorenzo@google.com> wrote:
> 
> > On Mon, Oct 25, 2010 at 9:38 PM, Stephen Hemminger
> > <shemminger@vyatta.com> wrote:
> > > This is incorrect. When link is lost, routes and address should not be
> > > flushed. They should be marked as tentative and then go through DAD again
> > > on the new network.
> > 
> > That won't help the case I am trying to fix, which is the case where
> > the new link has a global prefix different than the old link. Marking
> > the addresses as tentative will simply make them pass DAD and come
> > back as soon as link comes back. But since they don't match the prefix
> > that is assigned to the new link, they are unusable, because packets
> > can't be routed back to them.
> 
> For IPv4 this is already handled by network manager.
> Why couldn't the same apply to IPv6?

For this sort of thing, I tend to think that only higher-level network
management daemons like NM have the *possibility* to get this sort of
thing right, because only they have overall knowledge of the networking
situation, like whether you've actually connected to a different network
or not.  Specifically in the case of wifi, only the connection manager
(or wpa_supplicant) knows whether you're on a different network or  not,
anything else is a layering violation.

Remember that carrier loss does *not* indicate that you've switched
networks, both for wired and especially for wifi.  The only way you know
that you've really switched is when you've reconnected (or timed out the
reconnect) and inspected the new network situation.  It seems only
appropriate that the policy gets applied from higher levels based on a
more nuanced view of network state.

Dan

> 
> > > If you do it this way, you break routing protocols when link is brought
> > > down and back up.
> > 
> > The only addresses and routes flushed in this way should be ones that
> > aren't manually configured, i.e., the ones created by autoconf
> > (addrconf.c:2720 onwards). These won't be used by routing protocols,
> > except for link-local addresses. So I assume you're talking about
> > link-local here.
> 
> Not sure, let me do test it.
> 
> > Link-local addresses are immediately recreated in a tentative state as
> > soon as link comes back, because on NETDEV_UP addrconf_notify calls
> > addrconf_dev_config. So, this patch only makes it so that they become
> > tentative when link goes away and comes back. In that time, the router
> > that temporarily loses link is unable to send packets for the brief
> > period of time that the link is performing DAD, but if the router has
> > lost link, it will also fail to send the packet while link is lost.
> > What's the additional failure scenario? Will it help if I make it so
> > that link-local addresses aren't touched at all?
> 
> Link-local works fine.
> 



^ permalink raw reply

* Re: [PATCH 05/11] vxge: add support for ethtool firmware flashing
From: Jon Mason @ 2010-11-02  4:04 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev@vger.kernel.org, Sivakumar Subramani,
	Sreenivasa Honnur, Ramkrishna Vepa
In-Reply-To: <1288644687.2231.46.camel@achroite.uk.solarflarecom.com>

On Mon, Nov 01, 2010 at 01:51:27PM -0700, Ben Hutchings wrote:
> On Thu, 2010-10-28 at 21:19 -0500, Jon Mason wrote:
> > Add the ability in the vxge driver to flash firmware via ethtool.
> [...]
> > +enum vxge_hw_status
> > +vxge_update_fw_image(struct __vxge_hw_device *hldev, const u8 *fwdata, int size)
> > +{
> > +	u64 data0 = 0, data1 = 0, steer_ctrl = 0;
> > +	struct __vxge_hw_virtualpath *vpath;
> > +	enum vxge_hw_status status;
> > +	int ret_code, sec_code, i;
> > +
> > +	vpath = &hldev->virtual_paths[hldev->first_vp_id];
> > +
> > +	/* send upgrade start command */
> > +	status = vxge_hw_vpath_fw_api(vpath,
> > +				      VXGE_HW_FW_UPGRADE_ACTION,
> > +				      VXGE_HW_FW_UPGRADE_MEMO,
> > +				      VXGE_HW_FW_UPGRADE_OFFSET_START,
> > +				      &data0, &data1, &steer_ctrl);
> > +	if (status != VXGE_HW_OK) {
> > +		vxge_debug_init(VXGE_ERR, " %s: Upgrade start cmd failed",
> > +				__func__);
> > +		return status;
> > +	}
> > +
> > +	/* Transfer fw image to adapter 16 bytes at a time */
> > +	for (; size > 0; size -= VXGE_HW_FW_UPGRADE_BLK_SIZE) {
> > +		data0 = data1 = steer_ctrl = 0;
> > +
> > +		/* send 16 bytes at a time */
> > +		for (i = 0; i < VXGE_HW_BYTES_PER_U64; i++) {
> > +			data0 |= (u64)fwdata[i] << (i * VXGE_HW_BYTES_PER_U64);
> 
> You apparently mean to multiply i by the number of bits in a byte here.
> This happens to be equal to VXGE_HW_BYTES_PER_U64 but it still doesn't
> make sense to use that name for it.

This is transfering the firmware image 128bits at a time.  If the name
is confusing, would a "sizeof(u64)" make more sense?

> 
> [...]
> > diff --git a/drivers/net/vxge/vxge-ethtool.c b/drivers/net/vxge/vxge-ethtool.c
> > index 6194fd9..c5ab375 100644
> > --- a/drivers/net/vxge/vxge-ethtool.c
> > +++ b/drivers/net/vxge/vxge-ethtool.c
> > @@ -11,7 +11,7 @@
> >   *                 Virtualized Server Adapter.
> >   * Copyright(c) 2002-2010 Exar Corp.
> >   ******************************************************************************/
> > -#include<linux/ethtool.h>
> > +#include <linux/ethtool.h>
> >  #include <linux/slab.h>
> >  #include <linux/pci.h>
> >  #include <linux/etherdevice.h>
> > @@ -29,7 +29,6 @@
> >   * Return value:
> >   * 0 on success.
> >   */
> > -
> >  static int vxge_ethtool_sset(struct net_device *dev, struct ethtool_cmd *info)
> >  {
> >  	/* We currently only support 10Gb/FULL */
> > @@ -148,8 +147,7 @@ static void vxge_ethtool_gregs(struct net_device *dev,
> >  static int vxge_ethtool_idnic(struct net_device *dev, u32 data)
> >  {
> >  	struct vxgedev *vdev = (struct vxgedev *)netdev_priv(dev);
> > -	struct __vxge_hw_device  *hldev = (struct __vxge_hw_device  *)
> > -			pci_get_drvdata(vdev->pdev);
> > +	struct __vxge_hw_device *hldev = vdev->devh;
> >  
> >  	vxge_hw_device_flick_link_led(hldev, VXGE_FLICKER_ON);
> >  	msleep_interruptible(data ? (data * HZ) : VXGE_MAX_FLICKER_TIME);
> > @@ -168,11 +166,10 @@ static int vxge_ethtool_idnic(struct net_device *dev, u32 data)
> >   *  void
> >   */
> >  static void vxge_ethtool_getpause_data(struct net_device *dev,
> > -					struct ethtool_pauseparam *ep)
> > +				       struct ethtool_pauseparam *ep)
> >  {
> >  	struct vxgedev *vdev = (struct vxgedev *)netdev_priv(dev);
> > -	struct __vxge_hw_device  *hldev = (struct __vxge_hw_device  *)
> > -			pci_get_drvdata(vdev->pdev);
> > +	struct __vxge_hw_device *hldev = vdev->devh;
> >  
> >  	vxge_hw_device_getpause_data(hldev, 0, &ep->tx_pause, &ep->rx_pause);
> >  }
> > @@ -188,11 +185,10 @@ static void vxge_ethtool_getpause_data(struct net_device *dev,
> >   * int, returns 0 on Success
> >   */
> >  static int vxge_ethtool_setpause_data(struct net_device *dev,
> > -					struct ethtool_pauseparam *ep)
> > +				      struct ethtool_pauseparam *ep)
> >  {
> >  	struct vxgedev *vdev = (struct vxgedev *)netdev_priv(dev);
> > -	struct __vxge_hw_device  *hldev = (struct __vxge_hw_device  *)
> > -			pci_get_drvdata(vdev->pdev);
> > +	struct __vxge_hw_device *hldev = vdev->devh;
> >  
> >  	vxge_hw_device_setpause_data(hldev, 0, ep->tx_pause, ep->rx_pause);
> >  
> > @@ -211,7 +207,7 @@ static void vxge_get_ethtool_stats(struct net_device *dev,
> >  	struct vxge_vpath *vpath = NULL;
> >  
> >  	struct vxgedev *vdev = (struct vxgedev *)netdev_priv(dev);
> > -	struct __vxge_hw_device  *hldev = vdev->devh;
> > +	struct __vxge_hw_device *hldev = vdev->devh;
> >  	struct vxge_hw_xmac_stats *xmac_stats;
> >  	struct vxge_hw_device_stats_sw_info *sw_stats;
> >  	struct vxge_hw_device_stats_hw_info *hw_stats;
> 
> These changes seem to be entirely unrelated to the patch description.

Yes, they should've been included in the cleanup patch.

> 
> > @@ -1153,6 +1149,25 @@ static int vxge_set_flags(struct net_device *dev, u32 data)
> >  	return 0;
> >  }
> >  
> > +static int vxge_fw_flash(struct net_device *dev, struct ethtool_flash *parms)
> > +{
> > +	struct vxgedev *vdev = (struct vxgedev *)netdev_priv(dev);
> > +
> > +	if (vdev->max_vpath_supported != VXGE_HW_MAX_VIRTUAL_PATHS) {
> > +		printk(KERN_INFO "Single Function Mode is required to flash the"
> > +		       " firmware\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (netif_running(dev)) {
> > +		printk(KERN_INFO "Interface %s must be down to flash the "
> > +		       "firmware\n", dev->name);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return vxge_fw_upgrade(vdev, parms->data, 1);
> > +}
> 
> I think EBUSY is a more appropriate error code.  There is nothing wrong
> with the arguments, only the current state of the device.

Fair enough

> 
> [...]
> > diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
> > index d977a63..52a48e7 100644
> > --- a/drivers/net/vxge/vxge-main.c
> > +++ b/drivers/net/vxge/vxge-main.c
> [...]
> > @@ -3277,30 +3279,23 @@ vxge_device_unregister(struct __vxge_hw_device *hldev)
> >  	struct vxgedev *vdev;
> >  	struct net_device *dev;
> >  	char buf[IFNAMSIZ];
> > -#if ((VXGE_DEBUG_INIT & VXGE_DEBUG_MASK) || \
> > -	(VXGE_DEBUG_ENTRYEXIT & VXGE_DEBUG_MASK))
> > -	u32 level_trace;
> > -#endif
> >  
> >  	dev = hldev->ndev;
> >  	vdev = netdev_priv(dev);
> > -#if ((VXGE_DEBUG_INIT & VXGE_DEBUG_MASK) || \
> > -	(VXGE_DEBUG_ENTRYEXIT & VXGE_DEBUG_MASK))
> > -	level_trace = vdev->level_trace;
> > -#endif
> > -	vxge_debug_entryexit(level_trace,
> > -		"%s: %s:%d", vdev->ndev->name, __func__, __LINE__);
> > +	vxge_debug_entryexit(vdev->level_trace,	"%s: %s:%d", dev->name,
> > +			     __func__, __LINE__);
> >  
> > -	memcpy(buf, vdev->ndev->name, IFNAMSIZ);
> > +	memcpy(buf, dev->name, IFNAMSIZ);
> >  
> >  	/* in 2.6 will call stop() if device is up */
> >  	unregister_netdev(dev);
> >  
> >  	flush_scheduled_work();
> >  
> > -	vxge_debug_init(level_trace, "%s: ethernet device unregistered", buf);
> > -	vxge_debug_entryexit(level_trace,
> > -		"%s: %s:%d  Exiting...", buf, __func__, __LINE__);
> > +	vxge_debug_init(vdev->level_trace, "%s: ethernet device unregistered",
> > +			buf);
> > +	vxge_debug_entryexit(vdev->level_trace,	"%s: %s:%d  Exiting...", buf,
> > +			     __func__, __LINE__);
> >  }
> >  
> >  /*
> 
> More apparently unrelated changes.
> 
> > @@ -3924,6 +3919,142 @@ static inline u32 vxge_get_num_vfs(u64 function_mode)
> >  	return num_functions;
> >  }
> >  
> > +int vxge_fw_upgrade(struct vxgedev *vdev, char *fw_name, int override)
> > +{
> > +	struct __vxge_hw_device *hldev = vdev->devh;
> > +	u32 maj, min, bld, cmaj, cmin, cbld;
> > +	enum vxge_hw_status status;
> > +	const struct firmware *fw;
> > +	int ret;
> > +
> > +	ret = request_firmware(&fw, fw_name, &vdev->pdev->dev);
> > +	if (ret) {
> > +		vxge_debug_init(VXGE_ERR, "%s: Firmware file '%s' not found",
> > +				VXGE_DRIVER_NAME, fw_name);
> > +		goto out;
> > +	}
> > +
> > +	/* Load the new firmware onto the adapter */
> > +	status = vxge_update_fw_image(hldev, fw->data, fw->size);
> > +	if (status != VXGE_HW_OK) {
> > +		vxge_debug_init(VXGE_ERR,
> > +				"%s: FW image download to adapter failed '%s'.",
> > +				VXGE_DRIVER_NAME, fw_name);
> > +		ret = -EFAULT;
> 
> Surely EIO or EINVAL.

EIO would make more sense in the case, yes.

> 
> > +		goto out;
> > +	}
> > +
> > +	/* Read the version of the new firmware */
> > +	status = vxge_hw_upgrade_read_version(hldev, &maj, &min, &bld);
> > +	if (status != VXGE_HW_OK) {
> > +		vxge_debug_init(VXGE_ERR,
> > +				"%s: Upgrade read version failed '%s'.",
> > +				VXGE_DRIVER_NAME, fw_name);
> > +		ret = -EFAULT;
> 
> EIO.
> 
> > +		goto out;
> > +	}
> > +
> > +	cmaj = vdev->config.device_hw_info.fw_version.major;
> > +	cmin = vdev->config.device_hw_info.fw_version.minor;
> > +	cbld = vdev->config.device_hw_info.fw_version.build;
> > +	/* It's possible the version in /lib/firmware is not the latest version.
> > +	 * If so, we could get into a loop of trying to upgrade to the latest
> > +	 * and flashing the older version.
> > +	 */
> > +	if (VXGE_FW_VER(maj, min, bld) == VXGE_FW_VER(cmaj, cmin, cbld) &&
> > +	    !override) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	printk(KERN_NOTICE "Upgrade to firmware version %d.%d.%d commencing\n",
> > +	       maj, min, bld);
> > +
> > +	/* Flash the adapter with the new firmware */
> > +	status = vxge_hw_flash_fw(hldev);
> > +	if (status != VXGE_HW_OK) {
> > +		vxge_debug_init(VXGE_ERR, "%s: Upgrade commit failed '%s'.",
> > +				VXGE_DRIVER_NAME, fw_name);
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> > +
> > +	printk(KERN_NOTICE "Upgrade of firmware successful!  Adapter must be "
> > +	       "hard reset before using, thus requiring a system reboot or a "
> > +	       "hotplug event.\n");
> 
> Then what is the point of having the driver manage this?  And how can
> you be sure the user will even see this message?

The device is unusable at this point.  Open will fail and rmmod/insmod
will fail with hardware errors.  On a PCI-hotplug system, the driver
could most likely call those functions directly, but this is not
something that should be done and will not work on non-hotplug
systems.  I am open to suggestions for alternatives.

> 
> [...]
> > +static int vxge_probe_fw_update(struct vxgedev *vdev)
> > +{
> [...]
> > +	ret = vxge_fw_upgrade(vdev, fw_name, 0);
> > +	/* -EINVAL and -ENOENT are not fatal errors for flashing firmware on
> > +	 * probe, so ignore them
> > +	 */
> > +	if (ret != -EINVAL && ret != -ENOENT)
> > +		return -EFAULT;
> [..]
> 
> EIO.

Thanks for the eyes :)
Jon

> 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 

^ permalink raw reply

* Re: bonding: flow control regression [was Re: bridging: flow control regression]
From: Eric Dumazet @ 2010-11-02  4:53 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, Jay Vosburgh, David S. Miller
In-Reply-To: <20101102020625.GA22724@verge.net.au>

Le mardi 02 novembre 2010 à 11:06 +0900, Simon Horman a écrit :

> Thanks for the explanation.
> I'm not entirely sure how much of a problem this is in practice.

Maybe for virtual devices (tunnels, bonding, ...), it would make sense
to delay the orphaning up to the real device.

But if the socket send buffer is very large, it would defeat the flow
control any way...

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox