Netdev List
 help / color / mirror / Atom feed
* [PATCH 09/12] sfc: Recycle discarded rx buffers back onto the queue
From: Ben Hutchings @ 2010-06-01 21:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

The cut-through design of the receive path means that packets that
fail to match the appropriate MAC filter are not discarded at the MAC
but are flagged in the completion event as 'to be discarded'.  On
networks with heavy multicast traffic, this can account for a
significant proportion of received packets, so it is worthwhile to
recycle the buffer immediately in this case rather than freeing it
and then reallocating it shortly after.

The only complication here is dealing with a page shared
between two receive buffers. In that case, we need to be
careful to free the dma mapping when both buffers have
been free'd by the kernel. This means that we can only
recycle such a page if both receive buffers are discarded.
Unfortunately, in an environment with 1500mtu,
rx_alloc_method=PAGE, and a mixture of discarded and
not-discarded frames hitting the same receive queue,
buffer recycling won't always be possible.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/rx.c |   90 +++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/drivers/net/sfc/rx.c b/drivers/net/sfc/rx.c
index 615a1fc..dfebd73 100644
--- a/drivers/net/sfc/rx.c
+++ b/drivers/net/sfc/rx.c
@@ -82,9 +82,10 @@ static unsigned int rx_refill_limit = 95;
  * RX maximum head room required.
  *
  * This must be at least 1 to prevent overflow and at least 2 to allow
- * pipelined receives.
+ * pipelined receives. Then a further 1 because efx_recycle_rx_buffer()
+ * might insert two buffers.
  */
-#define EFX_RXD_HEAD_ROOM 2
+#define EFX_RXD_HEAD_ROOM 3
 
 static inline unsigned int efx_rx_buf_offset(struct efx_rx_buffer *buf)
 {
@@ -250,6 +251,70 @@ static void efx_fini_rx_buffer(struct efx_rx_queue *rx_queue,
 	efx_free_rx_buffer(rx_queue->efx, rx_buf);
 }
 
+/* Attempt to resurrect the other receive buffer that used to share this page,
+ * which had previously been passed up to the kernel and freed. */
+static void efx_resurrect_rx_buffer(struct efx_rx_queue *rx_queue,
+				    struct efx_rx_buffer *rx_buf)
+{
+	struct efx_rx_buffer *new_buf;
+	unsigned index;
+
+	/* We could have recycled the 1st half, then refilled
+	 * the queue, and now recycle the 2nd half.
+	 * EFX_RXD_HEAD_ROOM ensures that there is always room
+	 * to reinsert two buffers (once). */
+	get_page(rx_buf->page);
+
+	index = rx_queue->added_count & EFX_RXQ_MASK;
+	new_buf = efx_rx_buffer(rx_queue, index);
+	new_buf->dma_addr = rx_buf->dma_addr - (PAGE_SIZE >> 1);
+	new_buf->skb = NULL;
+	new_buf->page = rx_buf->page;
+	new_buf->data = rx_buf->data - (PAGE_SIZE >> 1);
+	new_buf->len = rx_buf->len;
+	++rx_queue->added_count;
+}
+
+/* Recycle the given rx buffer directly back into the rx_queue. There is
+ * always room to add this buffer, because we've just popped a buffer. */
+static void efx_recycle_rx_buffer(struct efx_channel *channel,
+				  struct efx_rx_buffer *rx_buf)
+{
+	struct efx_nic *efx = channel->efx;
+	struct efx_rx_queue *rx_queue = &efx->rx_queue[channel->channel];
+	struct efx_rx_buffer *new_buf;
+	unsigned index;
+
+	if (rx_buf->page != NULL && efx->rx_buffer_len < (PAGE_SIZE >> 1)) {
+		if (efx_rx_buf_offset(rx_buf) & (PAGE_SIZE >> 1)) {
+			/* This is the 2nd half of a page split between two
+			 * buffers, If page_count() is > 1 then the kernel
+			 * is holding onto the previous buffer */
+			if (page_count(rx_buf->page) != 1) {
+				efx_fini_rx_buffer(rx_queue, rx_buf);
+				return;
+			}
+
+			efx_resurrect_rx_buffer(rx_queue, rx_buf);
+		} else {
+			/* Free the 1st buffer's reference on the page. If the
+			 * 2nd buffer is also discarded, this buffer will be
+			 * resurrected above */
+			put_page(rx_buf->page);
+			rx_buf->page = NULL;
+			return;
+		}
+	}
+
+	index = rx_queue->added_count & EFX_RXQ_MASK;
+	new_buf = efx_rx_buffer(rx_queue, index);
+
+	memcpy(new_buf, rx_buf, sizeof(*new_buf));
+	rx_buf->page = NULL;
+	rx_buf->skb = NULL;
+	++rx_queue->added_count;
+}
+
 /**
  * efx_fast_push_rx_descriptors - push new RX descriptors quickly
  * @rx_queue:		RX descriptor queue
@@ -271,7 +336,7 @@ void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 	fill_level = (rx_queue->added_count - rx_queue->removed_count);
 	EFX_BUG_ON_PARANOID(fill_level > EFX_RXQ_SIZE);
 	if (fill_level >= rx_queue->fast_fill_trigger)
-		return;
+		goto out;
 
 	/* Record minimum fill level */
 	if (unlikely(fill_level < rx_queue->min_fill)) {
@@ -281,7 +346,7 @@ void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 
 	space = rx_queue->fast_fill_limit - fill_level;
 	if (space < EFX_RX_BATCH)
-		return;
+		goto out;
 
 	EFX_TRACE(rx_queue->efx, "RX queue %d fast-filling descriptor ring from"
 		  " level %d to level %d using %s allocation\n",
@@ -306,8 +371,8 @@ void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 		  rx_queue->added_count - rx_queue->removed_count);
 
  out:
-	/* Send write pointer to card. */
-	efx_nic_notify_rx_desc(rx_queue);
+	if (rx_queue->notified_count != rx_queue->added_count)
+		efx_nic_notify_rx_desc(rx_queue);
 }
 
 void efx_rx_slow_fill(unsigned long context)
@@ -418,6 +483,7 @@ void efx_rx_packet(struct efx_rx_queue *rx_queue, unsigned int index,
 		   unsigned int len, bool checksummed, bool discard)
 {
 	struct efx_nic *efx = rx_queue->efx;
+	struct efx_channel *channel = rx_queue->channel;
 	struct efx_rx_buffer *rx_buf;
 	bool leak_packet = false;
 
@@ -445,12 +511,13 @@ void efx_rx_packet(struct efx_rx_queue *rx_queue, unsigned int index,
 	/* Discard packet, if instructed to do so */
 	if (unlikely(discard)) {
 		if (unlikely(leak_packet))
-			rx_queue->channel->n_skbuff_leaks++;
+			channel->n_skbuff_leaks++;
 		else
-			/* We haven't called efx_unmap_rx_buffer yet,
-			 * so fini the entire rx_buffer here */
-			efx_fini_rx_buffer(rx_queue, rx_buf);
-		return;
+			efx_recycle_rx_buffer(channel, rx_buf);
+
+		/* Don't hold off the previous receive */
+		rx_buf = NULL;
+		goto out;
 	}
 
 	/* Release card resources - assumes all RX buffers consumed in-order
@@ -467,6 +534,7 @@ void efx_rx_packet(struct efx_rx_queue *rx_queue, unsigned int index,
 	 * prefetched into cache.
 	 */
 	rx_buf->len = len;
+out:
 	if (rx_queue->channel->rx_pkt)
 		__efx_rx_packet(rx_queue->channel,
 				rx_queue->channel->rx_pkt,
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 10/12] sfc: Allow shared pages to be recycled
From: Ben Hutchings @ 2010-06-01 21:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

Insert a structure at the start of the shared page that
tracks the dma mapping refcnt. DMA into the next cache
line of the (shared) page (plus EFX_PAGE_IP_ALIGN).

When recycling a page, check the page refcnt. If the
page is otherwise unused, then resurrect the other
receive buffer that previously referenced the page.
Be careful not to overflow the receive ring, since we
can now resurrect n receive buffers in a row.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c        |    3 +-
 drivers/net/sfc/net_driver.h |   18 +++++++++
 drivers/net/sfc/rx.c         |   84 +++++++++++++++++++++---------------------
 3 files changed, 62 insertions(+), 43 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 5d9ef05..aae3347 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -469,7 +469,8 @@ static void efx_init_channels(struct efx_nic *efx)
 	efx->rx_buffer_len = (max(EFX_PAGE_IP_ALIGN, NET_IP_ALIGN) +
 			      EFX_MAX_FRAME_LEN(efx->net_dev->mtu) +
 			      efx->type->rx_buffer_padding);
-	efx->rx_buffer_order = get_order(efx->rx_buffer_len);
+	efx->rx_buffer_order = get_order(efx->rx_buffer_len +
+					 sizeof(struct efx_rx_page_state));
 
 	/* Initialise the channels */
 	efx_for_each_channel(channel, efx) {
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 59c8ecc..40c0d93 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -232,6 +232,24 @@ struct efx_rx_buffer {
 };
 
 /**
+ * struct efx_rx_page_state - Page-based rx buffer state
+ *
+ * Inserted at the start of every page allocated for receive buffers.
+ * Used to facilitate sharing dma mappings between recycled rx buffers
+ * and those passed up to the kernel.
+ *
+ * @refcnt: Number of struct efx_rx_buffer's referencing this page.
+ *	When refcnt falls to zero, the page is unmapped for dma
+ * @dma_addr: The dma address of this page.
+ */
+struct efx_rx_page_state {
+	unsigned refcnt;
+	dma_addr_t dma_addr;
+
+	unsigned int __pad[0] ____cacheline_aligned;
+};
+
+/**
  * struct efx_rx_queue - An Efx RX queue
  * @efx: The associated Efx NIC
  * @queue: DMA queue number
diff --git a/drivers/net/sfc/rx.c b/drivers/net/sfc/rx.c
index dfebd73..9fb698e 100644
--- a/drivers/net/sfc/rx.c
+++ b/drivers/net/sfc/rx.c
@@ -25,6 +25,9 @@
 /* Number of RX descriptors pushed at once. */
 #define EFX_RX_BATCH  8
 
+/* Maximum size of a buffer sharing a page */
+#define EFX_RX_HALF_PAGE ((PAGE_SIZE >> 1) - sizeof(struct efx_rx_page_state))
+
 /* Size of buffer allocated for skb header area. */
 #define EFX_SKB_HEADERS  64u
 
@@ -82,10 +85,9 @@ static unsigned int rx_refill_limit = 95;
  * RX maximum head room required.
  *
  * This must be at least 1 to prevent overflow and at least 2 to allow
- * pipelined receives. Then a further 1 because efx_recycle_rx_buffer()
- * might insert two buffers.
+ * pipelined receives.
  */
-#define EFX_RXD_HEAD_ROOM 3
+#define EFX_RXD_HEAD_ROOM 2
 
 static inline unsigned int efx_rx_buf_offset(struct efx_rx_buffer *buf)
 {
@@ -164,7 +166,8 @@ static int efx_init_rx_buffers_page(struct efx_rx_queue *rx_queue)
 	struct efx_nic *efx = rx_queue->efx;
 	struct efx_rx_buffer *rx_buf;
 	struct page *page;
-	char *page_addr;
+	void *page_addr;
+	struct efx_rx_page_state *state;
 	dma_addr_t dma_addr;
 	unsigned index, count;
 
@@ -183,22 +186,27 @@ static int efx_init_rx_buffers_page(struct efx_rx_queue *rx_queue)
 			__free_pages(page, efx->rx_buffer_order);
 			return -EIO;
 		}
-		EFX_BUG_ON_PARANOID(dma_addr & (PAGE_SIZE - 1));
-		page_addr = page_address(page) + EFX_PAGE_IP_ALIGN;
-		dma_addr += EFX_PAGE_IP_ALIGN;
+		page_addr = page_address(page);
+		state = page_addr;
+		state->refcnt = 0;
+		state->dma_addr = dma_addr;
+
+		page_addr += sizeof(struct efx_rx_page_state);
+		dma_addr += sizeof(struct efx_rx_page_state);
 
 	split:
 		index = rx_queue->added_count & EFX_RXQ_MASK;
 		rx_buf = efx_rx_buffer(rx_queue, index);
-		rx_buf->dma_addr = dma_addr;
+		rx_buf->dma_addr = dma_addr + EFX_PAGE_IP_ALIGN;
 		rx_buf->skb = NULL;
 		rx_buf->page = page;
-		rx_buf->data = page_addr;
+		rx_buf->data = page_addr + EFX_PAGE_IP_ALIGN;
 		rx_buf->len = efx->rx_buffer_len - EFX_PAGE_IP_ALIGN;
 		++rx_queue->added_count;
 		++rx_queue->alloc_page_count;
+		++state->refcnt;
 
-		if ((~count & 1) && (efx->rx_buffer_len < (PAGE_SIZE >> 1))) {
+		if ((~count & 1) && (efx->rx_buffer_len <= EFX_RX_HALF_PAGE)) {
 			/* Use the second half of the page */
 			get_page(page);
 			dma_addr += (PAGE_SIZE >> 1);
@@ -215,14 +223,14 @@ static void efx_unmap_rx_buffer(struct efx_nic *efx,
 				struct efx_rx_buffer *rx_buf)
 {
 	if (rx_buf->page) {
+		struct efx_rx_page_state *state;
+
 		EFX_BUG_ON_PARANOID(rx_buf->skb);
 
-		/* Unmap the buffer if there's only one buffer per page(s),
-		 * or this is the second half of a two buffer page. */
-		if (efx->rx_buffer_order != 0 ||
-		    (efx_rx_buf_offset(rx_buf) & (PAGE_SIZE >> 1)) != 0) {
+		state = page_address(rx_buf->page);
+		if (--state->refcnt == 0) {
 			pci_unmap_page(efx->pci_dev,
-				       rx_buf->dma_addr & ~(PAGE_SIZE - 1),
+				       state->dma_addr,
 				       efx_rx_buf_size(efx),
 				       PCI_DMA_FROMDEVICE);
 		}
@@ -256,21 +264,30 @@ static void efx_fini_rx_buffer(struct efx_rx_queue *rx_queue,
 static void efx_resurrect_rx_buffer(struct efx_rx_queue *rx_queue,
 				    struct efx_rx_buffer *rx_buf)
 {
+	struct efx_rx_page_state *state = page_address(rx_buf->page);
 	struct efx_rx_buffer *new_buf;
-	unsigned index;
+	unsigned fill_level, index;
+
+	/* +1 because efx_rx_packet() incremented removed_count. +1 because
+	 * we'd like to insert an additional descriptor whilst leaving
+	 * EFX_RXD_HEAD_ROOM for the non-recycle path */
+	fill_level = (rx_queue->added_count - rx_queue->removed_count + 2);
+	if (unlikely(fill_level >= EFX_RXQ_SIZE - EFX_RXD_HEAD_ROOM)) {
+		/* We could place "state" on a list, and drain the list in
+		 * efx_fast_push_rx_descriptors(). For now, this will do. */
+		return;
+	}
 
-	/* We could have recycled the 1st half, then refilled
-	 * the queue, and now recycle the 2nd half.
-	 * EFX_RXD_HEAD_ROOM ensures that there is always room
-	 * to reinsert two buffers (once). */
+	++state->refcnt;
 	get_page(rx_buf->page);
 
 	index = rx_queue->added_count & EFX_RXQ_MASK;
 	new_buf = efx_rx_buffer(rx_queue, index);
-	new_buf->dma_addr = rx_buf->dma_addr - (PAGE_SIZE >> 1);
+	new_buf->dma_addr = rx_buf->dma_addr ^ (PAGE_SIZE >> 1);
 	new_buf->skb = NULL;
 	new_buf->page = rx_buf->page;
-	new_buf->data = rx_buf->data - (PAGE_SIZE >> 1);
+	new_buf->data = (void *)
+		((__force unsigned long)rx_buf->data ^ (PAGE_SIZE >> 1));
 	new_buf->len = rx_buf->len;
 	++rx_queue->added_count;
 }
@@ -285,26 +302,9 @@ static void efx_recycle_rx_buffer(struct efx_channel *channel,
 	struct efx_rx_buffer *new_buf;
 	unsigned index;
 
-	if (rx_buf->page != NULL && efx->rx_buffer_len < (PAGE_SIZE >> 1)) {
-		if (efx_rx_buf_offset(rx_buf) & (PAGE_SIZE >> 1)) {
-			/* This is the 2nd half of a page split between two
-			 * buffers, If page_count() is > 1 then the kernel
-			 * is holding onto the previous buffer */
-			if (page_count(rx_buf->page) != 1) {
-				efx_fini_rx_buffer(rx_queue, rx_buf);
-				return;
-			}
-
-			efx_resurrect_rx_buffer(rx_queue, rx_buf);
-		} else {
-			/* Free the 1st buffer's reference on the page. If the
-			 * 2nd buffer is also discarded, this buffer will be
-			 * resurrected above */
-			put_page(rx_buf->page);
-			rx_buf->page = NULL;
-			return;
-		}
-	}
+	if (rx_buf->page != NULL && efx->rx_buffer_len <= EFX_RX_HALF_PAGE &&
+	    page_count(rx_buf->page) == 1)
+		efx_resurrect_rx_buffer(rx_queue, rx_buf);
 
 	index = rx_queue->added_count & EFX_RXQ_MASK;
 	new_buf = efx_rx_buffer(rx_queue, index);
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 11/12] sfc: Only count bad packets in rx_errors
From: Ben Hutchings @ 2010-06-01 21:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

rx_errors is defined as 'bad packets received', but we are currently
including various overflow errors as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index aae3347..26b0cc2 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1518,11 +1518,8 @@ static struct net_device_stats *efx_net_stats(struct net_device *net_dev)
 	stats->tx_window_errors = mac_stats->tx_late_collision;
 
 	stats->rx_errors = (stats->rx_length_errors +
-			    stats->rx_over_errors +
 			    stats->rx_crc_errors +
 			    stats->rx_frame_errors +
-			    stats->rx_fifo_errors +
-			    stats->rx_missed_errors +
 			    mac_stats->rx_symbol_error);
 	stats->tx_errors = (stats->tx_window_errors +
 			    mac_stats->tx_bad);
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH net-next-2.6 1/2] qlcnic: NIC Partitioning - Add basic infrastructure support
From: Anirban Chakraborty @ 2010-06-01 21:28 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Amit Salecha, Ameen Rahman

I have respun it. Please apply.

Following changes have been added to enable the adapter to work in
NIC partitioning mode where multiple PCI functions of an adapter port can
be configured to work as NIC functions. The first function that is enumerated on
the PCI bus assumes the role of management function which, besides being able
to do all the NIC functionality, can configure other NIC partitions. Other NIC
functions can be configured as privileged or non privileged functions.
Privileged function can not configure other NIC functions but can do all the
NIC functionality including any firmware initialization, chip reset etc. Non
privileged functions can do only basic IO. For chip reset etc, it depends on the
privilege or management function.

1. Added code to determine PCI function number independent of kernel API.
2. Added Driver - FW version 2.0 support.
3. Changed producer and consumer register offset calculation.
4. Added management and privileged operation modes for npar functions. A module
 parameter has been added to control it.
5. Added support for configuring the eswitch in the adapter.

thanks,
Anirban

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h         |  121 ++++++++-
 drivers/net/qlcnic/qlcnic_ctx.c     |  513 +++++++++++++++++++++++++++++++++--
 drivers/net/qlcnic/qlcnic_ethtool.c |   11 +-
 drivers/net/qlcnic/qlcnic_hdr.h     |   70 +++++
 drivers/net/qlcnic/qlcnic_hw.c      |   14 +-
 drivers/net/qlcnic/qlcnic_init.c    |   40 ++-
 drivers/net/qlcnic/qlcnic_main.c    |  195 ++++++++++++--
 7 files changed, 890 insertions(+), 74 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 896d40d..31a0b43 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -197,8 +197,7 @@ struct cmd_desc_type0 {
 
 	__le64 addr_buffer4;
 
-	__le32 reserved2;
-	__le16 reserved;
+	u8 eth_addr[ETH_ALEN];
 	__le16 vlan_TCI;
 
 } __attribute__ ((aligned(64)));
@@ -315,6 +314,8 @@ struct uni_data_desc{
 #define QLCNIC_BRDTYPE_P3_10G_XFP	0x0032
 #define QLCNIC_BRDTYPE_P3_10G_TP	0x0080
 
+#define QLCNIC_MSIX_TABLE_OFFSET	0x44
+
 /* Flash memory map */
 #define QLCNIC_BRDCFG_START	0x4000		/* board config */
 #define QLCNIC_BOOTLD_START	0x10000		/* bootld */
@@ -542,7 +543,17 @@ struct qlcnic_recv_context {
 #define QLCNIC_CDRP_CMD_READ_PEXQ_PARAMETERS	0x0000001c
 #define QLCNIC_CDRP_CMD_GET_LIC_CAPABILITIES	0x0000001d
 #define QLCNIC_CDRP_CMD_READ_MAX_LRO_PER_BOARD	0x0000001e
-#define QLCNIC_CDRP_CMD_MAX			0x0000001f
+#define QLCNIC_CDRP_CMD_MAC_ADDRESS		0x0000001f
+
+#define QLCNIC_CDRP_CMD_GET_PCI_INFO		0x00000020
+#define QLCNIC_CDRP_CMD_GET_NIC_INFO		0x00000021
+#define QLCNIC_CDRP_CMD_SET_NIC_INFO		0x00000022
+#define QLCNIC_CDRP_CMD_RESET_NPAR		0x00000023
+#define QLCNIC_CDRP_CMD_GET_ESWITCH_CAPABILITY	0x00000024
+#define QLCNIC_CDRP_CMD_TOGGLE_ESWITCH		0x00000025
+#define QLCNIC_CDRP_CMD_GET_ESWITCH_STATUS	0x00000026
+#define QLCNIC_CDRP_CMD_SET_PORTMIRRORING	0x00000027
+#define QLCNIC_CDRP_CMD_CONFIGURE_ESWITCH	0x00000028
 
 #define QLCNIC_RCODE_SUCCESS		0
 #define QLCNIC_RCODE_TIMEOUT		17
@@ -560,7 +571,6 @@ struct qlcnic_recv_context {
 /*
  * Context state
  */
-#define QLCHAL_VERSION	1
 
 #define QLCNIC_HOST_CTX_STATE_ACTIVE	2
 
@@ -887,6 +897,7 @@ struct qlcnic_mac_req {
 #define MSIX_ENTRIES_PER_ADAPTER	NUM_STS_DESC_RINGS
 #define QLCNIC_MSIX_TBL_SPACE		8192
 #define QLCNIC_PCI_REG_MSIX_TBL 	0x44
+#define QLCNIC_MSIX_TBL_PGSIZE		4096
 
 #define QLCNIC_NETDEV_WEIGHT	128
 #define QLCNIC_ADAPTER_UP_MAGIC 777
@@ -923,7 +934,6 @@ struct qlcnic_adapter {
 	u8 mc_enabled;
 	u8 max_mc_count;
 	u8 rss_supported;
-	u8 rsrvd1;
 	u8 fw_wait_cnt;
 	u8 fw_fail_cnt;
 	u8 tx_timeo_cnt;
@@ -940,6 +950,15 @@ struct qlcnic_adapter {
 	u16 link_autoneg;
 	u16 module_type;
 
+	u16 op_mode;
+	u16 switch_mode;
+	u16 max_tx_ques;
+	u16 max_rx_ques;
+	u16 min_tx_bw;
+	u16 max_tx_bw;
+	u16 max_mtu;
+
+	u32 fw_hal_version;
 	u32 capabilities;
 	u32 flags;
 	u32 irq;
@@ -948,18 +967,22 @@ struct qlcnic_adapter {
 	u32 int_vec_bit;
 	u32 heartbit;
 
+	u8 max_mac_filters;
 	u8 dev_state;
 	u8 diag_test;
 	u8 diag_cnt;
 	u8 reset_ack_timeo;
 	u8 dev_init_timeo;
-	u8 rsrd1;
 	u16 msg_enable;
 
 	u8 mac_addr[ETH_ALEN];
 
 	u64 dev_rst_time;
 
+	struct qlcnic_pci_info *npars;
+	struct qlcnic_eswitch *eswitch;
+	struct qlcnic_nic_template *nic_ops;
+
 	struct qlcnic_adapter_stats stats;
 
 	struct qlcnic_recv_context recv_ctx;
@@ -984,6 +1007,53 @@ struct qlcnic_adapter {
 	const struct firmware *fw;
 };
 
+struct qlcnic_info {
+	__le16	pci_func;
+	__le16	op_mode; /* 1 = Priv, 2 = NP, 3 = NP passthru */
+	__le16	phys_port;
+	__le16	switch_mode; /* 0 = disabled, 1 = int, 2 = ext */
+
+	__le32	capabilities;
+	u8	max_mac_filters;
+	u8	reserved1;
+	__le16	max_mtu;
+
+	__le16	max_tx_ques;
+	__le16	max_rx_ques;
+	__le16	min_tx_bw;
+	__le16	max_tx_bw;
+	u8	reserved2[104];
+};
+
+struct qlcnic_pci_info {
+	__le16	id; /* pci function id */
+	__le16	active; /* 1 = Enabled */
+	__le16	type; /* 1 = NIC, 2 = FCoE, 3 = iSCSI */
+	__le16	default_port; /* default port number */
+
+	__le16	tx_min_bw; /* Multiple of 100mbpc */
+	__le16	tx_max_bw;
+	__le16	reserved1[2];
+
+	u8	mac[ETH_ALEN];
+	u8	reserved2[106];
+};
+
+struct qlcnic_eswitch {
+	u8	port;
+	u8	active_vports;
+	u8	active_vlans;
+	u8	active_ucast_filters;
+	u8	max_ucast_filters;
+	u8	max_active_vlans;
+
+	u32	flags;
+#define QLCNIC_SWITCH_ENABLE		BIT_1
+#define QLCNIC_SWITCH_VLAN_FILTERING	BIT_2
+#define QLCNIC_SWITCH_PROMISC_MODE	BIT_3
+#define QLCNIC_SWITCH_PORT_MIRRORING	BIT_4
+};
+
 int qlcnic_fw_cmd_query_phy(struct qlcnic_adapter *adapter, u32 reg, u32 *val);
 int qlcnic_fw_cmd_set_phy(struct qlcnic_adapter *adapter, u32 reg, u32 val);
 
@@ -1070,13 +1140,14 @@ void qlcnic_advert_link_change(struct qlcnic_adapter *adapter, int linkup);
 int qlcnic_fw_cmd_set_mtu(struct qlcnic_adapter *adapter, int mtu);
 int qlcnic_change_mtu(struct net_device *netdev, int new_mtu);
 int qlcnic_config_hw_lro(struct qlcnic_adapter *adapter, int enable);
-int qlcnic_config_bridged_mode(struct qlcnic_adapter *adapter, int enable);
+int qlcnic_config_bridged_mode(struct qlcnic_adapter *adapter, u32 enable);
 int qlcnic_send_lro_cleanup(struct qlcnic_adapter *adapter);
 void qlcnic_update_cmd_producer(struct qlcnic_adapter *adapter,
 		struct qlcnic_host_tx_ring *tx_ring);
-int qlcnic_get_mac_addr(struct qlcnic_adapter *adapter, u64 *mac);
+int qlcnic_get_mac_addr(struct qlcnic_adapter *adapter, u8 *mac);
 void qlcnic_clear_ilb_mode(struct qlcnic_adapter *adapter);
 int qlcnic_set_ilb_mode(struct qlcnic_adapter *adapter);
+void qlcnic_fetch_mac(struct qlcnic_adapter *, u32, u32, u8, u8 *);
 
 /* Functions from qlcnic_main.c */
 int qlcnic_reset_context(struct qlcnic_adapter *);
@@ -1088,6 +1159,32 @@ int qlcnic_check_loopback_buff(unsigned char *data);
 netdev_tx_t qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
 void qlcnic_process_rcv_ring_diag(struct qlcnic_host_sds_ring *sds_ring);
 
+/* Functions from qlcnic_vf.c */
+int qlcnicvf_config_bridged_mode(struct qlcnic_adapter *, u32);
+int qlcnicvf_config_led(struct qlcnic_adapter *, u32, u32);
+int qlcnicvf_set_ilb_mode(struct qlcnic_adapter *adapter);
+void qlcnicvf_clear_ilb_mode(struct qlcnic_adapter *adapter);
+void qlcnicvf_set_port_mode(struct qlcnic_adapter *adapter);
+
+/* Management functions */
+int qlcnic_set_mac_address(struct qlcnic_adapter *, u8*);
+int qlcnic_get_mac_address(struct qlcnic_adapter *, u8*);
+int qlcnic_get_nic_info(struct qlcnic_adapter *, u8);
+int qlcnic_set_nic_info(struct qlcnic_adapter *, struct qlcnic_info *);
+int qlcnic_get_pci_info(struct qlcnic_adapter *);
+int qlcnic_reset_partition(struct qlcnic_adapter *, u8);
+
+/*  eSwitch management functions */
+int qlcnic_get_eswitch_capabilities(struct qlcnic_adapter *, u8,
+				struct qlcnic_eswitch *);
+int qlcnic_get_eswitch_status(struct qlcnic_adapter *, u8,
+				struct qlcnic_eswitch *);
+int qlcnic_toggle_eswitch(struct qlcnic_adapter *, u8, u8);
+int qlcnic_config_switch_port(struct qlcnic_adapter *, u8, int, u8, u8,
+			u8, u8, u16);
+int qlcnic_config_port_mirroring(struct qlcnic_adapter *, u8, u8, u8);
+extern int qlcnic_config_tso;
+
 /*
  * QLOGIC Board information
  */
@@ -1131,6 +1228,14 @@ static inline u32 qlcnic_tx_avail(struct qlcnic_host_tx_ring *tx_ring)
 
 extern const struct ethtool_ops qlcnic_ethtool_ops;
 
+struct qlcnic_nic_template {
+	int (*get_mac_addr) (struct qlcnic_adapter *, u8*);
+	int (*config_bridged_mode) (struct qlcnic_adapter *, u32);
+	int (*config_led) (struct qlcnic_adapter *, u32, u32);
+	int (*set_ilb_mode) (struct qlcnic_adapter *);
+	void (*clear_ilb_mode) (struct qlcnic_adapter *);
+};
+
 #define QLCDB(adapter, lvl, _fmt, _args...) do {	\
 	if (NETIF_MSG_##lvl & adapter->msg_enable)	\
 		printk(KERN_INFO "%s: %s: " _fmt,	\
diff --git a/drivers/net/qlcnic/qlcnic_ctx.c b/drivers/net/qlcnic/qlcnic_ctx.c
index c2c1f5c..1e1dc58 100644
--- a/drivers/net/qlcnic/qlcnic_ctx.c
+++ b/drivers/net/qlcnic/qlcnic_ctx.c
@@ -88,12 +88,12 @@ qlcnic_fw_cmd_set_mtu(struct qlcnic_adapter *adapter, int mtu)
 
 	if (recv_ctx->state == QLCNIC_HOST_CTX_STATE_ACTIVE) {
 		if (qlcnic_issue_cmd(adapter,
-				adapter->ahw.pci_func,
-				QLCHAL_VERSION,
-				recv_ctx->context_id,
-				mtu,
-				0,
-				QLCNIC_CDRP_CMD_SET_MTU)) {
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			recv_ctx->context_id,
+			mtu,
+			0,
+			QLCNIC_CDRP_CMD_SET_MTU)) {
 
 			dev_err(&adapter->pdev->dev, "Failed to set mtu\n");
 			return -EIO;
@@ -121,7 +121,7 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 
 	int i, nrds_rings, nsds_rings;
 	size_t rq_size, rsp_size;
-	u32 cap, reg, val;
+	u32 cap, reg, val, reg2;
 	int err;
 
 	struct qlcnic_recv_context *recv_ctx = &adapter->recv_ctx;
@@ -197,7 +197,7 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 	phys_addr = hostrq_phys_addr;
 	err = qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			(u32)(phys_addr >> 32),
 			(u32)(phys_addr & 0xffffffff),
 			rq_size,
@@ -216,8 +216,12 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 		rds_ring = &recv_ctx->rds_rings[i];
 
 		reg = le32_to_cpu(prsp_rds[i].host_producer_crb);
-		rds_ring->crb_rcv_producer = qlcnic_get_ioaddr(adapter,
+		if (adapter->fw_hal_version == QLCNIC_FW_BASE)
+			rds_ring->crb_rcv_producer = qlcnic_get_ioaddr(adapter,
 				QLCNIC_REG(reg - 0x200));
+		else
+			rds_ring->crb_rcv_producer = adapter->ahw.pci_base0 +
+				reg;
 	}
 
 	prsp_sds = ((struct qlcnic_cardrsp_sds_ring *)
@@ -227,12 +231,18 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 		sds_ring = &recv_ctx->sds_rings[i];
 
 		reg = le32_to_cpu(prsp_sds[i].host_consumer_crb);
-		sds_ring->crb_sts_consumer = qlcnic_get_ioaddr(adapter,
-				QLCNIC_REG(reg - 0x200));
+		reg2 = le32_to_cpu(prsp_sds[i].interrupt_crb);
 
-		reg = le32_to_cpu(prsp_sds[i].interrupt_crb);
-		sds_ring->crb_intr_mask = qlcnic_get_ioaddr(adapter,
+		if (adapter->fw_hal_version == QLCNIC_FW_BASE) {
+			sds_ring->crb_sts_consumer = qlcnic_get_ioaddr(adapter,
 				QLCNIC_REG(reg - 0x200));
+			sds_ring->crb_intr_mask = qlcnic_get_ioaddr(adapter,
+				QLCNIC_REG(reg2 - 0x200));
+		} else {
+			sds_ring->crb_sts_consumer = adapter->ahw.pci_base0 +
+				reg;
+			sds_ring->crb_intr_mask = adapter->ahw.pci_base0 + reg2;
+		}
 	}
 
 	recv_ctx->state = le32_to_cpu(prsp->host_ctx_state);
@@ -253,7 +263,7 @@ qlcnic_fw_cmd_destroy_rx_ctx(struct qlcnic_adapter *adapter)
 
 	if (qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			recv_ctx->context_id,
 			QLCNIC_DESTROY_CTX_RESET,
 			0,
@@ -319,7 +329,7 @@ qlcnic_fw_cmd_create_tx_ctx(struct qlcnic_adapter *adapter)
 	phys_addr = rq_phys_addr;
 	err = qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			(u32)(phys_addr >> 32),
 			((u32)phys_addr & 0xffffffff),
 			rq_size,
@@ -327,8 +337,12 @@ qlcnic_fw_cmd_create_tx_ctx(struct qlcnic_adapter *adapter)
 
 	if (err == QLCNIC_RCODE_SUCCESS) {
 		temp = le32_to_cpu(prsp->cds_ring.host_producer_crb);
-		tx_ring->crb_cmd_producer = qlcnic_get_ioaddr(adapter,
+		if (adapter->fw_hal_version == QLCNIC_FW_BASE)
+			tx_ring->crb_cmd_producer = qlcnic_get_ioaddr(adapter,
 				QLCNIC_REG(temp - 0x200));
+		else
+			tx_ring->crb_cmd_producer = adapter->ahw.pci_base0 +
+				temp;
 
 		adapter->tx_context_id =
 			le16_to_cpu(prsp->context_id);
@@ -351,7 +365,7 @@ qlcnic_fw_cmd_destroy_tx_ctx(struct qlcnic_adapter *adapter)
 {
 	if (qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			adapter->tx_context_id,
 			QLCNIC_DESTROY_CTX_RESET,
 			0,
@@ -368,7 +382,7 @@ qlcnic_fw_cmd_query_phy(struct qlcnic_adapter *adapter, u32 reg, u32 *val)
 
 	if (qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			reg,
 			0,
 			0,
@@ -385,7 +399,7 @@ qlcnic_fw_cmd_set_phy(struct qlcnic_adapter *adapter, u32 reg, u32 val)
 {
 	return qlcnic_issue_cmd(adapter,
 			adapter->ahw.pci_func,
-			QLCHAL_VERSION,
+			adapter->fw_hal_version,
 			reg,
 			val,
 			0,
@@ -533,3 +547,464 @@ void qlcnic_free_hw_resources(struct qlcnic_adapter *adapter)
 	}
 }
 
+/* Set MAC address of a NIC partition */
+int qlcnic_set_mac_address(struct qlcnic_adapter *adapter, u8* mac)
+{
+	int err = 0;
+	u32 arg1, arg2, arg3;
+
+	arg1 = adapter->ahw.pci_func | BIT_9;
+	arg2 = mac[0] | (mac[1] << 8) | (mac[2] << 16) | (mac[3] << 24);
+	arg3 = mac[4] | (mac[5] << 16);
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			arg1,
+			arg2,
+			arg3,
+			QLCNIC_CDRP_CMD_MAC_ADDRESS);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to set mac address%d\n", err);
+		err = -EIO;
+	}
+
+	return err;
+}
+
+/* Get MAC address of a NIC partition */
+int qlcnic_get_mac_address(struct qlcnic_adapter *adapter, u8 *mac)
+{
+	int err;
+	u32 arg1;
+
+	arg1 = adapter->ahw.pci_func | BIT_8;
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			arg1,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_MAC_ADDRESS);
+
+	if (err == QLCNIC_RCODE_SUCCESS) {
+		qlcnic_fetch_mac(adapter, QLCNIC_ARG1_CRB_OFFSET,
+				QLCNIC_ARG2_CRB_OFFSET, 0, mac);
+		dev_info(&adapter->pdev->dev, "MAC address: %pM\n", mac);
+	} else {
+		dev_err(&adapter->pdev->dev,
+			"Failed to get mac address%d\n", err);
+		err = -EIO;
+	}
+
+	return err;
+}
+
+/* Get info of a NIC partition */
+int qlcnic_get_nic_info(struct qlcnic_adapter *adapter, u8 func_id)
+{
+	int	err;
+	dma_addr_t nic_dma_t;
+	struct qlcnic_info *nic_info;
+	void *nic_info_addr;
+	size_t	nic_size = sizeof(struct qlcnic_info);
+
+	nic_info_addr = pci_alloc_consistent(adapter->pdev,
+		nic_size, &nic_dma_t);
+	if (!nic_info_addr)
+		return -ENOMEM;
+	memset(nic_info_addr, 0, nic_size);
+
+	nic_info = (struct qlcnic_info *) nic_info_addr;
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			MSD(nic_dma_t),
+			LSD(nic_dma_t),
+			(func_id << 16 | nic_size),
+			QLCNIC_CDRP_CMD_GET_NIC_INFO);
+
+	if (err == QLCNIC_RCODE_SUCCESS) {
+		adapter->physical_port = le16_to_cpu(nic_info->phys_port);
+		adapter->switch_mode = le16_to_cpu(nic_info->switch_mode);
+		adapter->max_tx_ques = le16_to_cpu(nic_info->max_tx_ques);
+		adapter->max_rx_ques = le16_to_cpu(nic_info->max_rx_ques);
+		adapter->min_tx_bw = le16_to_cpu(nic_info->min_tx_bw);
+		adapter->max_tx_bw = le16_to_cpu(nic_info->max_tx_bw);
+		adapter->max_mtu = le16_to_cpu(nic_info->max_mtu);
+		adapter->capabilities = le32_to_cpu(nic_info->capabilities);
+		adapter->max_mac_filters = nic_info->max_mac_filters;
+
+		dev_info(&adapter->pdev->dev,
+			"phy port: %d switch_mode: %d,\n"
+			"\tmax_tx_q: %d max_rx_q: %d min_tx_bw: 0x%x,\n"
+			"\tmax_tx_bw: 0x%x max_mtu:0x%x, capabilities: 0x%x\n",
+			adapter->physical_port, adapter->switch_mode,
+			adapter->max_tx_ques, adapter->max_rx_ques,
+			adapter->min_tx_bw, adapter->max_tx_bw,
+			adapter->max_mtu, adapter->capabilities);
+	} else {
+		dev_err(&adapter->pdev->dev,
+			"Failed to get nic info%d\n", err);
+		err = -EIO;
+	}
+
+	pci_free_consistent(adapter->pdev, nic_size, nic_info_addr, nic_dma_t);
+	return err;
+}
+
+/* Configure a NIC partition */
+int qlcnic_set_nic_info(struct qlcnic_adapter *adapter, struct qlcnic_info *nic)
+{
+	int err = -EIO;
+	u32 func_state;
+	dma_addr_t nic_dma_t;
+	void *nic_info_addr;
+	struct qlcnic_info *nic_info;
+	size_t nic_size = sizeof(struct qlcnic_info);
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC)
+		return err;
+
+	if (qlcnic_api_lock(adapter))
+		return err;
+
+	func_state = QLCRD32(adapter, QLCNIC_CRB_DEV_REF_COUNT);
+	if (QLC_DEV_CHECK_ACTIVE(func_state, nic->pci_func)) {
+		qlcnic_api_unlock(adapter);
+		return err;
+	}
+
+	qlcnic_api_unlock(adapter);
+
+	nic_info_addr = pci_alloc_consistent(adapter->pdev, nic_size,
+			&nic_dma_t);
+	if (!nic_info_addr)
+		return -ENOMEM;
+
+	memset(nic_info_addr, 0, nic_size);
+	nic_info = (struct qlcnic_info *)nic_info_addr;
+
+	nic_info->pci_func = cpu_to_le16(nic->pci_func);
+	nic_info->op_mode = cpu_to_le16(nic->op_mode);
+	nic_info->phys_port = cpu_to_le16(nic->phys_port);
+	nic_info->switch_mode = cpu_to_le16(nic->switch_mode);
+	nic_info->capabilities = cpu_to_le32(nic->capabilities);
+	nic_info->max_mac_filters = nic->max_mac_filters;
+	nic_info->max_tx_ques = cpu_to_le16(nic->max_tx_ques);
+	nic_info->max_rx_ques = cpu_to_le16(nic->max_rx_ques);
+	nic_info->min_tx_bw = cpu_to_le16(nic->min_tx_bw);
+	nic_info->max_tx_bw = cpu_to_le16(nic->max_tx_bw);
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			MSD(nic_dma_t),
+			LSD(nic_dma_t),
+			nic_size,
+			QLCNIC_CDRP_CMD_SET_NIC_INFO);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to set nic info%d\n", err);
+		err = -EIO;
+	}
+
+	pci_free_consistent(adapter->pdev, nic_size, nic_info_addr, nic_dma_t);
+	return err;
+}
+
+/* Get PCI Info of a partition */
+int qlcnic_get_pci_info(struct qlcnic_adapter *adapter)
+{
+	int err = 0, i;
+	dma_addr_t pci_info_dma_t;
+	struct qlcnic_pci_info *npar;
+	void *pci_info_addr;
+	size_t npar_size = sizeof(struct qlcnic_pci_info);
+	size_t pci_size = npar_size * QLCNIC_MAX_PCI_FUNC;
+
+	pci_info_addr = pci_alloc_consistent(adapter->pdev, pci_size,
+			&pci_info_dma_t);
+	if (!pci_info_addr)
+		return -ENOMEM;
+	memset(pci_info_addr, 0, pci_size);
+
+	if (!adapter->npars)
+		adapter->npars = kzalloc(pci_size, GFP_KERNEL);
+	if (!adapter->npars) {
+		err = -ENOMEM;
+		goto err_npar;
+	}
+
+	if (!adapter->eswitch)
+		adapter->eswitch = kzalloc(sizeof(struct qlcnic_eswitch) *
+				QLCNIC_NIU_MAX_XG_PORTS, GFP_KERNEL);
+	if (!adapter->eswitch) {
+		err = -ENOMEM;
+		goto err_eswitch;
+	}
+
+	npar = (struct qlcnic_pci_info *) pci_info_addr;
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			MSD(pci_info_dma_t),
+			LSD(pci_info_dma_t),
+			pci_size,
+			QLCNIC_CDRP_CMD_GET_PCI_INFO);
+
+	if (err == QLCNIC_RCODE_SUCCESS) {
+		for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++, npar++) {
+			adapter->npars[i].id = le32_to_cpu(npar->id);
+			adapter->npars[i].active = le32_to_cpu(npar->active);
+			adapter->npars[i].type = le32_to_cpu(npar->type);
+			adapter->npars[i].default_port =
+				le32_to_cpu(npar->default_port);
+			adapter->npars[i].tx_min_bw =
+				le32_to_cpu(npar->tx_min_bw);
+			adapter->npars[i].tx_max_bw =
+				le32_to_cpu(npar->tx_max_bw);
+			memcpy(adapter->npars[i].mac, npar->mac, ETH_ALEN);
+		}
+	} else {
+		dev_err(&adapter->pdev->dev,
+			"Failed to get PCI Info%d\n", err);
+		kfree(adapter->npars);
+		err = -EIO;
+	}
+	goto err_npar;
+
+err_eswitch:
+	kfree(adapter->npars);
+	adapter->npars = NULL;
+
+err_npar:
+	pci_free_consistent(adapter->pdev, pci_size, pci_info_addr,
+		pci_info_dma_t);
+	return err;
+}
+
+/* Reset a NIC partition */
+
+int qlcnic_reset_partition(struct qlcnic_adapter *adapter, u8 func_no)
+{
+	int err = -EIO;
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC)
+		return err;
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			func_no,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_RESET_NPAR);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to issue reset partition%d\n", err);
+		err = -EIO;
+	}
+
+	return err;
+}
+
+/* Get eSwitch Capabilities */
+int qlcnic_get_eswitch_capabilities(struct qlcnic_adapter *adapter, u8 port,
+					struct qlcnic_eswitch *eswitch)
+{
+	int err = -EIO;
+	u32 arg1, arg2;
+
+	if (adapter->op_mode == QLCNIC_NON_PRIV_FUNC)
+		return err;
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			port,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_GET_ESWITCH_CAPABILITY);
+
+	if (err == QLCNIC_RCODE_SUCCESS) {
+		arg1 = QLCRD32(adapter, QLCNIC_ARG1_CRB_OFFSET);
+		arg2 = QLCRD32(adapter, QLCNIC_ARG2_CRB_OFFSET);
+
+		eswitch->port = arg1 & 0xf;
+		eswitch->active_vports = LSB(arg2);
+		eswitch->max_ucast_filters = MSB(arg2);
+		eswitch->max_active_vlans = LSB(MSW(arg2));
+		if (arg1 & BIT_6)
+			eswitch->flags |= QLCNIC_SWITCH_VLAN_FILTERING;
+		if (arg1 & BIT_7)
+			eswitch->flags |= QLCNIC_SWITCH_PROMISC_MODE;
+		if (arg1 & BIT_8)
+			eswitch->flags |= QLCNIC_SWITCH_PORT_MIRRORING;
+	} else {
+		dev_err(&adapter->pdev->dev,
+			"Failed to get eswitch capabilities%d\n", err);
+	}
+
+	return err;
+}
+
+/* Get current status of eswitch */
+int qlcnic_get_eswitch_status(struct qlcnic_adapter *adapter, u8 port,
+				struct qlcnic_eswitch *eswitch)
+{
+	int err = -EIO;
+	u32 arg1, arg2;
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC)
+		return err;
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			port,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_GET_ESWITCH_STATUS);
+
+	if (err == QLCNIC_RCODE_SUCCESS) {
+		arg1 = QLCRD32(adapter, QLCNIC_ARG1_CRB_OFFSET);
+		arg2 = QLCRD32(adapter, QLCNIC_ARG2_CRB_OFFSET);
+
+		eswitch->port = arg1 & 0xf;
+		eswitch->active_vports = LSB(arg2);
+		eswitch->active_ucast_filters = MSB(arg2);
+		eswitch->active_vlans = LSB(MSW(arg2));
+		if (arg1 & BIT_6)
+			eswitch->flags |= QLCNIC_SWITCH_VLAN_FILTERING;
+		if (arg1 & BIT_8)
+			eswitch->flags |= QLCNIC_SWITCH_PORT_MIRRORING;
+
+	} else {
+		dev_err(&adapter->pdev->dev,
+			"Failed to get eswitch status%d\n", err);
+	}
+
+	return err;
+}
+
+/* Enable/Disable eSwitch */
+int qlcnic_toggle_eswitch(struct qlcnic_adapter *adapter, u8 id, u8 enable)
+{
+	int err = -EIO;
+	u32 arg1, arg2;
+	struct qlcnic_eswitch *eswitch;
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC)
+		return err;
+
+	eswitch = &adapter->eswitch[id];
+	if (!eswitch)
+		return err;
+
+	arg1 = eswitch->port | (enable ? BIT_4 : 0);
+	arg2 = eswitch->active_vports | (eswitch->max_ucast_filters << 8) |
+		(eswitch->max_active_vlans << 16);
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			arg1,
+			arg2,
+			0,
+			QLCNIC_CDRP_CMD_TOGGLE_ESWITCH);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to enable eswitch%d\n", eswitch->port);
+		eswitch->flags &= ~QLCNIC_SWITCH_ENABLE;
+		err = -EIO;
+	} else {
+		eswitch->flags |= QLCNIC_SWITCH_ENABLE;
+		dev_info(&adapter->pdev->dev,
+			"Enabled eSwitch for port %d\n", eswitch->port);
+	}
+
+	return err;
+}
+
+/* Configure eSwitch for port mirroring */
+int qlcnic_config_port_mirroring(struct qlcnic_adapter *adapter, u8 id,
+				u8 enable_mirroring, u8 pci_func)
+{
+	int err = -EIO;
+	u32 arg1;
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC ||
+		!(adapter->eswitch[id].flags & QLCNIC_SWITCH_ENABLE))
+		return err;
+
+	arg1 = id | (enable_mirroring ? BIT_4 : 0);
+	arg1 |= pci_func << 8;
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			arg1,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_SET_PORTMIRRORING);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to configure port mirroring%d on eswitch:%d\n",
+			pci_func, id);
+	} else {
+		dev_info(&adapter->pdev->dev,
+			"Configured eSwitch %d for port mirroring:%d\n",
+			id, pci_func);
+	}
+
+	return err;
+}
+
+/* Configure eSwitch port */
+int qlcnic_config_switch_port(struct qlcnic_adapter *adapter, u8 id,
+		int vlan_tagging, u8 discard_tagged, u8 promsc_mode,
+		u8 mac_learn, u8 pci_func, u16 vlan_id)
+{
+	int err = -EIO;
+	u32 arg1;
+	struct qlcnic_eswitch *eswitch;
+
+	if (adapter->op_mode != QLCNIC_MGMT_FUNC)
+		return err;
+
+	eswitch = &adapter->eswitch[id];
+	if (!(eswitch->flags & QLCNIC_SWITCH_ENABLE))
+		return err;
+
+	arg1 = eswitch->port | (discard_tagged ? BIT_4 : 0);
+	arg1 |= (promsc_mode ? BIT_6 : 0) | (mac_learn ? BIT_7 : 0);
+	arg1 |= pci_func << 8;
+	if (vlan_tagging)
+		arg1 |= BIT_5 | (vlan_id << 16);
+
+	err = qlcnic_issue_cmd(adapter,
+			adapter->ahw.pci_func,
+			adapter->fw_hal_version,
+			arg1,
+			0,
+			0,
+			QLCNIC_CDRP_CMD_CONFIGURE_ESWITCH);
+
+	if (err != QLCNIC_RCODE_SUCCESS) {
+		dev_err(&adapter->pdev->dev,
+			"Failed to configure eswitch port%d\n", eswitch->port);
+		eswitch->flags |= QLCNIC_SWITCH_ENABLE;
+	} else {
+		eswitch->flags &= ~QLCNIC_SWITCH_ENABLE;
+		dev_info(&adapter->pdev->dev,
+			"Configured eSwitch for port %d\n", eswitch->port);
+	}
+
+	return err;
+}
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 3bd514e..3e4822a 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -683,13 +683,13 @@ static int qlcnic_loopback_test(struct net_device *netdev)
 	if (ret)
 		goto clear_it;
 
-	ret = qlcnic_set_ilb_mode(adapter);
+	ret = adapter->nic_ops->set_ilb_mode(adapter);
 	if (ret)
 		goto done;
 
 	ret = qlcnic_do_ilb_test(adapter);
 
-	qlcnic_clear_ilb_mode(adapter);
+	adapter->nic_ops->clear_ilb_mode(adapter);
 
 done:
 	qlcnic_diag_free_res(netdev, max_sds_rings);
@@ -715,7 +715,8 @@ static int qlcnic_irq_test(struct net_device *netdev)
 
 	adapter->diag_cnt = 0;
 	ret = qlcnic_issue_cmd(adapter, adapter->ahw.pci_func,
-			QLCHAL_VERSION, adapter->portnum, 0, 0, 0x00000011);
+			adapter->fw_hal_version, adapter->portnum,
+			0, 0, 0x00000011);
 	if (ret)
 		goto done;
 
@@ -834,7 +835,7 @@ static int qlcnic_blink_led(struct net_device *dev, u32 val)
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
 	int ret;
 
-	ret = qlcnic_config_led(adapter, 1, 0xf);
+	ret = adapter->nic_ops->config_led(adapter, 1, 0xf);
 	if (ret) {
 		dev_err(&adapter->pdev->dev,
 			"Failed to set LED blink state.\n");
@@ -843,7 +844,7 @@ static int qlcnic_blink_led(struct net_device *dev, u32 val)
 
 	msleep_interruptible(val * 1000);
 
-	ret = qlcnic_config_led(adapter, 0, 0xf);
+	ret = adapter->nic_ops->config_led(adapter, 0, 0xf);
 	if (ret) {
 		dev_err(&adapter->pdev->dev,
 			"Failed to reset LED blink state.\n");
diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index ad9d167..1bcfb12 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -208,6 +208,39 @@ enum {
 	QLCNIC_HW_PX_MAP_CRB_PGR0
 };
 
+#define	BIT_0	0x1
+#define	BIT_1	0x2
+#define	BIT_2	0x4
+#define	BIT_3	0x8
+#define	BIT_4	0x10
+#define	BIT_5	0x20
+#define	BIT_6	0x40
+#define	BIT_7	0x80
+#define	BIT_8	0x100
+#define	BIT_9	0x200
+#define	BIT_10	0x400
+#define	BIT_11	0x800
+#define	BIT_12	0x1000
+#define	BIT_13	0x2000
+#define	BIT_14	0x4000
+#define	BIT_15	0x8000
+#define	BIT_16	0x10000
+#define	BIT_17	0x20000
+#define	BIT_18	0x40000
+#define	BIT_19	0x80000
+#define	BIT_20	0x100000
+#define	BIT_21	0x200000
+#define	BIT_22	0x400000
+#define	BIT_23	0x800000
+#define	BIT_24	0x1000000
+#define	BIT_25	0x2000000
+#define	BIT_26	0x4000000
+#define	BIT_27	0x8000000
+#define	BIT_28	0x10000000
+#define	BIT_29	0x20000000
+#define	BIT_30	0x40000000
+#define	BIT_31	0x80000000
+
 /*  This field defines CRB adr [31:20] of the agents */
 
 #define QLCNIC_HW_CRB_HUB_AGT_ADR_MN	\
@@ -684,12 +717,20 @@ enum {
 #define QLCNIC_DEV_FAILED		0x6
 #define QLCNIC_DEV_QUISCENT		0x7
 
+#define QLC_DEV_CHECK_ACTIVE(VAL, FN)		((VAL) &= (1 << (FN * 4)))
 #define QLC_DEV_SET_REF_CNT(VAL, FN)		((VAL) |= (1 << (FN * 4)))
 #define QLC_DEV_CLR_REF_CNT(VAL, FN)		((VAL) &= ~(1 << (FN * 4)))
 #define QLC_DEV_SET_RST_RDY(VAL, FN)		((VAL) |= (1 << (FN * 4)))
 #define QLC_DEV_SET_QSCNT_RDY(VAL, FN)		((VAL) |= (2 << (FN * 4)))
 #define QLC_DEV_CLR_RST_QSCNT(VAL, FN)		((VAL) &= ~(3 << (FN * 4)))
 
+#define QLC_DEV_GET_DRV(VAL, FN)		(0xf & ((VAL) >> (FN * 4)))
+#define QLC_DEV_SET_DRV(VAL, FN)		((VAL) << (FN * 4))
+
+#define QLCNIC_TYPE_NIC		1
+#define QLCNIC_TYPE_FCOE		2
+#define QLCNIC_TYPE_ISCSI		3
+
 #define QLCNIC_RCODE_DRIVER_INFO		0x20000000
 #define QLCNIC_RCODE_DRIVER_CAN_RELOAD		0x40000000
 #define QLCNIC_RCODE_FATAL_ERROR		0x80000000
@@ -721,6 +762,35 @@ struct qlcnic_legacy_intr_set {
 	u32	pci_int_reg;
 };
 
+#define QLCNIC_FW_API		0x1b216c
+#define QLCNIC_DRV_OP_MODE	0x1b2170
+#define QLCNIC_MSIX_BASE	0x132110
+#define QLCNIC_MAX_PCI_FUNC	8
+
+/* PCI function operational mode */
+enum {
+	QLCNIC_MGMT_FUNC	= 0,
+	QLCNIC_PRIV_FUNC	= 1,
+	QLCNIC_NON_PRIV_FUNC	= 2
+};
+
+/* FW HAL api version */
+enum {
+	QLCNIC_FW_BASE	= 1,
+	QLCNIC_FW_NPAR	= 2
+};
+
+#define QLC_DEV_DRV_DEFAULT 0x11111111
+
+#define LSB(x)	((uint8_t)(x))
+#define MSB(x)	((uint8_t)((uint16_t)(x) >> 8))
+
+#define LSW(x)  ((uint16_t)((uint32_t)(x)))
+#define MSW(x)  ((uint16_t)((uint32_t)(x) >> 16))
+
+#define LSD(x)  ((uint32_t)((uint64_t)(x)))
+#define MSD(x)  ((uint32_t)((((uint64_t)(x)) >> 16) >> 16))
+
 #define	QLCNIC_LEGACY_INTR_CONFIG					\
 {									\
 	{								\
diff --git a/drivers/net/qlcnic/qlcnic_hw.c b/drivers/net/qlcnic/qlcnic_hw.c
index 0c2e1f0..f776956 100644
--- a/drivers/net/qlcnic/qlcnic_hw.c
+++ b/drivers/net/qlcnic/qlcnic_hw.c
@@ -538,7 +538,7 @@ int qlcnic_config_hw_lro(struct qlcnic_adapter *adapter, int enable)
 	return rv;
 }
 
-int qlcnic_config_bridged_mode(struct qlcnic_adapter *adapter, int enable)
+int qlcnic_config_bridged_mode(struct qlcnic_adapter *adapter, u32 enable)
 {
 	struct qlcnic_nic_req req;
 	u64 word;
@@ -704,21 +704,15 @@ int qlcnic_change_mtu(struct net_device *netdev, int mtu)
 	return rc;
 }
 
-int qlcnic_get_mac_addr(struct qlcnic_adapter *adapter, u64 *mac)
+int qlcnic_get_mac_addr(struct qlcnic_adapter *adapter, u8 *mac)
 {
-	u32 crbaddr, mac_hi, mac_lo;
+	u32 crbaddr;
 	int pci_func = adapter->ahw.pci_func;
 
 	crbaddr = CRB_MAC_BLOCK_START +
 		(4 * ((pci_func/2) * 3)) + (4 * (pci_func & 1));
 
-	mac_lo = QLCRD32(adapter, crbaddr);
-	mac_hi = QLCRD32(adapter, crbaddr+4);
-
-	if (pci_func & 1)
-		*mac = le64_to_cpu((mac_lo >> 16) | ((u64)mac_hi << 16));
-	else
-		*mac = le64_to_cpu((u64)mac_lo | ((u64)mac_hi << 32));
+	qlcnic_fetch_mac(adapter, crbaddr, crbaddr+4, pci_func & 1, mac);
 
 	return 0;
 }
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 71a4e66..635c990 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -520,17 +520,16 @@ qlcnic_setup_idc_param(struct qlcnic_adapter *adapter) {
 	int timeo;
 	u32 val;
 
-	val = QLCRD32(adapter, QLCNIC_CRB_DEV_PARTITION_INFO);
-	val = (val >> (adapter->portnum * 4)) & 0xf;
-
-	if ((val & 0x3) != 1) {
-		dev_err(&adapter->pdev->dev, "Not an Ethernet NIC func=%u\n",
-									val);
-		return -EIO;
+	if (adapter->fw_hal_version == QLCNIC_FW_BASE) {
+		val = QLCRD32(adapter, QLCNIC_CRB_DEV_PARTITION_INFO);
+		val = QLC_DEV_GET_DRV(val, adapter->portnum);
+		if ((val & 0x3) != QLCNIC_TYPE_NIC) {
+			dev_err(&adapter->pdev->dev,
+				"Not an Ethernet NIC func=%u\n", val);
+			return -EIO;
+		}
+		adapter->physical_port = (val >> 2);
 	}
-
-	adapter->physical_port = (val >> 2);
-
 	if (qlcnic_rom_fast_read(adapter, QLCNIC_ROM_DEV_INIT_TIMEOUT, &timeo))
 		timeo = 30;
 
@@ -1701,3 +1700,24 @@ qlcnic_process_rcv_ring_diag(struct qlcnic_host_sds_ring *sds_ring)
 	sds_ring->consumer = consumer;
 	writel(consumer, sds_ring->crb_sts_consumer);
 }
+
+void
+qlcnic_fetch_mac(struct qlcnic_adapter *adapter, u32 off1, u32 off2,
+			u8 alt_mac, u8 *mac)
+{
+	u32 mac_low, mac_high;
+	int i;
+
+	mac_low = QLCRD32(adapter, off1);
+	mac_high = QLCRD32(adapter, off2);
+
+	if (alt_mac) {
+		mac_low |= (mac_low >> 16) | (mac_high << 16);
+		mac_high >>= 16;
+	}
+
+	for (i = 0; i < 2; i++)
+		mac[i] = (u8)(mac_high >> ((1 - i) * 8));
+	for (i = 2; i < 6; i++)
+		mac[i] = (u8)(mac_low >> ((5 - i) * 8));
+}
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 23ea9ca..1e5e66f 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -65,6 +65,10 @@ static int load_fw_file;
 module_param(load_fw_file, int, 0644);
 MODULE_PARM_DESC(load_fw_file, "Load firmware from (0=flash, 1=file");
 
+static int qlcnic_config_npars;
+module_param(qlcnic_config_npars, int, 0644);
+MODULE_PARM_DESC(qlcnic_config_npars, "Configure NPARs (0=disabled, 1=enabled");
+
 static int __devinit qlcnic_probe(struct pci_dev *pdev,
 		const struct pci_device_id *ent);
 static void __devexit qlcnic_remove(struct pci_dev *pdev);
@@ -307,19 +311,14 @@ static void qlcnic_init_msix_entries(struct qlcnic_adapter *adapter, int count)
 static int
 qlcnic_read_mac_addr(struct qlcnic_adapter *adapter)
 {
-	int i;
-	unsigned char *p;
-	u64 mac_addr;
+	u8 mac_addr[ETH_ALEN];
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
 
-	if (qlcnic_get_mac_addr(adapter, &mac_addr) != 0)
+	if (adapter->nic_ops->get_mac_addr(adapter, mac_addr) != 0)
 		return -EIO;
 
-	p = (unsigned char *)&mac_addr;
-	for (i = 0; i < 6; i++)
-		netdev->dev_addr[i] = *(p + 5 - i);
-
+	memcpy(netdev->dev_addr, mac_addr, ETH_ALEN);
 	memcpy(netdev->perm_addr, netdev->dev_addr, netdev->addr_len);
 	memcpy(adapter->mac_addr, netdev->dev_addr, netdev->addr_len);
 
@@ -371,6 +370,22 @@ static const struct net_device_ops qlcnic_netdev_ops = {
 #endif
 };
 
+static struct qlcnic_nic_template qlcnic_ops = {
+	.get_mac_addr = qlcnic_get_mac_addr,
+	.config_bridged_mode = qlcnic_config_bridged_mode,
+	.config_led = qlcnic_config_led,
+	.set_ilb_mode = qlcnic_set_ilb_mode,
+	.clear_ilb_mode = qlcnic_clear_ilb_mode
+};
+
+static struct qlcnic_nic_template qlcnic_pf_ops = {
+	.get_mac_addr = qlcnic_get_mac_address,
+	.config_bridged_mode = qlcnic_config_bridged_mode,
+	.config_led = qlcnic_config_led,
+	.set_ilb_mode = qlcnic_set_ilb_mode,
+	.clear_ilb_mode = qlcnic_clear_ilb_mode
+};
+
 static void
 qlcnic_setup_intr(struct qlcnic_adapter *adapter)
 {
@@ -452,6 +467,125 @@ qlcnic_cleanup_pci_map(struct qlcnic_adapter *adapter)
 		iounmap(adapter->ahw.pci_base0);
 }
 
+/* Use api lock to access this function */
+static int
+qlcnic_set_function_modes(struct qlcnic_adapter *adapter)
+{
+	u8 id;
+	u32 ref_count;
+	int i, ret = 1;
+	u32 data = QLCNIC_MGMT_FUNC;
+	void __iomem *priv_op = adapter->ahw.pci_base0 + QLCNIC_DRV_OP_MODE;
+
+	/* If other drivers are not in use set their privilege level */
+	ref_count = QLCRD32(adapter, QLCNIC_CRB_DEV_REF_COUNT);
+	ret = qlcnic_api_lock(adapter);
+	if (ret)
+		goto err_lock;
+	if (QLC_DEV_CLR_REF_CNT(ref_count, adapter->ahw.pci_func))
+		goto err_npar;
+
+	for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
+		id = adapter->npars[i].id;
+		if (adapter->npars[i].type != QLCNIC_TYPE_NIC ||
+			id == adapter->ahw.pci_func)
+			continue;
+		data |= (qlcnic_config_npars & QLC_DEV_SET_DRV(0xf, id));
+	}
+	writel(data, priv_op);
+
+err_npar:
+	qlcnic_api_unlock(adapter);
+err_lock:
+	return ret;
+}
+
+static u8
+qlcnic_set_mgmt_driver(struct qlcnic_adapter *adapter)
+{
+	u8 i, ret = 0;
+
+	if (qlcnic_get_pci_info(adapter))
+		return ret;
+	/* Set the eswitch */
+	for (i = 0; i < QLCNIC_NIU_MAX_XG_PORTS; i++) {
+		if (!qlcnic_get_eswitch_capabilities(adapter, i,
+			&adapter->eswitch[i])) {
+			ret++;
+			qlcnic_toggle_eswitch(adapter, i, ret);
+		}
+	}
+	return ret;
+}
+
+static u32
+qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
+{
+	void __iomem *msix_base_addr;
+	void __iomem *priv_op;
+	u32 func;
+	u32 msix_base;
+	u32 op_mode, priv_level;
+
+	/* Determine FW API version */
+	adapter->fw_hal_version = readl(adapter->ahw.pci_base0 + QLCNIC_FW_API);
+	if (adapter->fw_hal_version == ~0) {
+		adapter->nic_ops = &qlcnic_ops;
+		adapter->fw_hal_version = QLCNIC_FW_BASE;
+		adapter->ahw.pci_func = PCI_FUNC(adapter->pdev->devfn);
+		dev_info(&adapter->pdev->dev,
+			"FW does not support nic partion\n");
+		return adapter->fw_hal_version;
+	}
+
+	/* Find PCI function number */
+	pci_read_config_dword(adapter->pdev, QLCNIC_MSIX_TABLE_OFFSET, &func);
+	msix_base_addr = adapter->ahw.pci_base0 + QLCNIC_MSIX_BASE;
+	msix_base = readl(msix_base_addr);
+	func = (func - msix_base)/QLCNIC_MSIX_TBL_PGSIZE;
+	adapter->ahw.pci_func = func;
+
+	/* Determine function privilege level */
+	priv_op = adapter->ahw.pci_base0 + QLCNIC_DRV_OP_MODE;
+	op_mode = readl(priv_op);
+	if (op_mode == QLC_DEV_DRV_DEFAULT) {
+		priv_level = QLCNIC_MGMT_FUNC;
+		if (qlcnic_api_lock(adapter))
+			return 0;
+		op_mode = (op_mode & ~QLC_DEV_SET_DRV(0xf, func)) |
+				(QLC_DEV_SET_DRV(QLCNIC_MGMT_FUNC, func));
+		writel(op_mode, priv_op);
+		qlcnic_api_unlock(adapter);
+
+	} else
+		priv_level = QLC_DEV_GET_DRV(op_mode, adapter->ahw.pci_func);
+
+	switch (priv_level) {
+	case QLCNIC_MGMT_FUNC:
+		adapter->op_mode = QLCNIC_MGMT_FUNC;
+		adapter->nic_ops = &qlcnic_pf_ops;
+		/* Set privilege level for other functions */
+		if (qlcnic_config_npars)
+			qlcnic_set_function_modes(adapter);
+		dev_info(&adapter->pdev->dev,
+			"HAL Version: %d, Management function\n",
+			adapter->fw_hal_version);
+		break;
+	case QLCNIC_PRIV_FUNC:
+		adapter->op_mode = QLCNIC_PRIV_FUNC;
+		dev_info(&adapter->pdev->dev,
+			"HAL Version: %d, Privileged function\n",
+			adapter->fw_hal_version);
+		adapter->nic_ops = &qlcnic_pf_ops;
+		break;
+	default:
+		dev_info(&adapter->pdev->dev, "Unknown function mode: %d\n",
+			priv_level);
+		return 0;
+	}
+	return adapter->fw_hal_version;
+}
+
 static int
 qlcnic_setup_pci_map(struct qlcnic_adapter *adapter)
 {
@@ -460,7 +594,6 @@ qlcnic_setup_pci_map(struct qlcnic_adapter *adapter)
 	unsigned long mem_len, pci_len0 = 0;
 
 	struct pci_dev *pdev = adapter->pdev;
-	int pci_func = adapter->ahw.pci_func;
 
 	/* remap phys address */
 	mem_base = pci_resource_start(pdev, 0);	/* 0 is for BAR 0 */
@@ -483,8 +616,13 @@ qlcnic_setup_pci_map(struct qlcnic_adapter *adapter)
 	adapter->ahw.pci_base0 = mem_ptr0;
 	adapter->ahw.pci_len0 = pci_len0;
 
+	if (!qlcnic_get_driver_mode(adapter)) {
+		iounmap(adapter->ahw.pci_base0);
+		return -EIO;
+	}
+
 	adapter->ahw.ocm_win_crb = qlcnic_get_ioaddr(adapter,
-		QLCNIC_PCIX_PS_REG(PCIX_OCM_WINDOW_REG(pci_func)));
+		QLCNIC_PCIX_PS_REG(PCIX_OCM_WINDOW_REG(adapter->ahw.pci_func)));
 
 	return 0;
 }
@@ -553,7 +691,10 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
 	dev_info(&pdev->dev, "firmware v%d.%d.%d\n",
 			fw_major, fw_minor, fw_build);
 
-	adapter->capabilities = QLCRD32(adapter, CRB_FW_CAPABILITIES_1);
+	if (adapter->fw_hal_version == QLCNIC_FW_NPAR)
+		qlcnic_get_nic_info(adapter, adapter->ahw.pci_func);
+	else
+		adapter->capabilities = QLCRD32(adapter, CRB_FW_CAPABILITIES_1);
 
 	adapter->flags &= ~QLCNIC_LRO_ENABLED;
 
@@ -633,6 +774,10 @@ wait_init:
 
 	qlcnic_check_options(adapter);
 
+	if (adapter->fw_hal_version != QLCNIC_FW_BASE &&
+			adapter->op_mode == QLCNIC_MGMT_FUNC)
+		qlcnic_set_mgmt_driver(adapter);
+
 	adapter->need_fw_reset = 0;
 
 	qlcnic_release_firmware(adapter);
@@ -977,12 +1122,11 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter,
 
 	SET_ETHTOOL_OPS(netdev, &qlcnic_ethtool_ops);
 
-	netdev->features |= (NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO);
-	netdev->features |= (NETIF_F_GRO);
-	netdev->vlan_features |= (NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO);
+	netdev->features |= (NETIF_F_SG | NETIF_F_IP_CSUM |
+		NETIF_F_IPV6_CSUM | NETIF_F_GRO | NETIF_F_TSO | NETIF_F_TSO6);
 
-	netdev->features |= (NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
-	netdev->vlan_features |= (NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
+	netdev->vlan_features |= (NETIF_F_SG | NETIF_F_IP_CSUM |
+		NETIF_F_IPV6_CSUM | NETIF_F_TSO | NETIF_F_TSO6);
 
 	if (pci_using_dac) {
 		netdev->features |= NETIF_F_HIGHDMA;
@@ -1036,7 +1180,6 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev = NULL;
 	struct qlcnic_adapter *adapter = NULL;
 	int err;
-	int pci_func_id = PCI_FUNC(pdev->devfn);
 	uint8_t revision_id;
 	uint8_t pci_using_dac;
 
@@ -1072,7 +1215,6 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->netdev  = netdev;
 	adapter->pdev    = pdev;
 	adapter->dev_rst_time = jiffies;
-	adapter->ahw.pci_func  = pci_func_id;
 
 	revision_id = pdev->revision;
 	adapter->ahw.revision_id = revision_id;
@@ -1088,7 +1230,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_free_netdev;
 
 	/* This will be reset for mezz cards  */
-	adapter->portnum = pci_func_id;
+	adapter->portnum = adapter->ahw.pci_func;
 
 	err = qlcnic_get_board_info(adapter);
 	if (err) {
@@ -1175,6 +1317,11 @@ static void __devexit qlcnic_remove(struct pci_dev *pdev)
 
 	qlcnic_detach(adapter);
 
+	if (adapter->npars != NULL)
+		kfree(adapter->npars);
+	if (adapter->eswitch != NULL)
+		kfree(adapter->eswitch);
+
 	qlcnic_clr_all_drv_state(adapter);
 
 	clear_bit(__QLCNIC_RESETTING, &adapter->state);
@@ -1340,11 +1487,11 @@ qlcnic_tso_check(struct net_device *netdev,
 	u8 opcode = TX_ETHER_PKT;
 	__be16 protocol = skb->protocol;
 	u16 flags = 0, vid = 0;
-	u32 producer;
 	int copied, offset, copy_len, hdr_len = 0, tso = 0, vlan_oob = 0;
 	struct cmd_desc_type0 *hwdesc;
 	struct vlan_ethhdr *vh;
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
+	u32 producer = tx_ring->producer;
 
 	if (protocol == cpu_to_be16(ETH_P_8021Q)) {
 
@@ -1360,6 +1507,11 @@ qlcnic_tso_check(struct net_device *netdev,
 		vlan_oob = 1;
 	}
 
+	if (*(skb->data) & BIT_0) {
+		flags |= BIT_0;
+		memcpy(&first_desc->eth_addr, skb->data, ETH_ALEN);
+	}
+
 	if ((netdev->features & (NETIF_F_TSO | NETIF_F_TSO6)) &&
 			skb_shinfo(skb)->gso_size > 0) {
 
@@ -1409,7 +1561,6 @@ qlcnic_tso_check(struct net_device *netdev,
 	/* For LSO, we need to copy the MAC/IP/TCP headers into
 	 * the descriptor ring
 	 */
-	producer = tx_ring->producer;
 	copied = 0;
 	offset = 2;
 
@@ -2382,7 +2533,7 @@ qlcnic_store_bridged_mode(struct device *dev,
 	if (strict_strtoul(buf, 2, &new))
 		goto err_out;
 
-	if (!qlcnic_config_bridged_mode(adapter, !!new))
+	if (!adapter->nic_ops->config_bridged_mode(adapter, !!new))
 		ret = len;
 
 err_out:
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH 12/12] sfc: Get port number from CS_PORT_NUM, not PCI function number
From: Ben Hutchings @ 2010-06-01 21:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

A single shared memory region used to communicate with firmware is
mapped into both PCI PFs of the SFC9020 and SFL9021.  Drivers must be
able to identify which port they are addressing in order to use the
correct sub-region.  Currently we use the PCI function number, but the
PCI address may be virtualised.  Use the CS_PORT_NUM register field
defined for just this purpose.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
David,

I think this may be a candidate for 2.6.35 and 2.6.{33,34}.y. If a
driver uses the wrong sub-region of shared memory it will horribly
confuse the firmware and any driver bound to the other function.

Ben.

 drivers/net/sfc/net_driver.h |    4 +++-
 drivers/net/sfc/siena.c      |    4 ++++
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 40c0d93..6b2e440 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -649,6 +649,7 @@ union efx_multicast_hash {
  * struct efx_nic - an Efx NIC
  * @name: Device name (net device name or bus id before net device registered)
  * @pci_dev: The PCI device
+ * @port_num: Index of this host port within the controller
  * @type: Controller type attributes
  * @legacy_irq: IRQ number
  * @workqueue: Workqueue for port reconfigures and the HW monitor.
@@ -732,6 +733,7 @@ union efx_multicast_hash {
 struct efx_nic {
 	char name[IFNAMSIZ];
 	struct pci_dev *pci_dev;
+	unsigned port_num;
 	const struct efx_nic_type *type;
 	int legacy_irq;
 	struct workqueue_struct *workqueue;
@@ -834,7 +836,7 @@ static inline const char *efx_dev_name(struct efx_nic *efx)
 
 static inline unsigned int efx_port_num(struct efx_nic *efx)
 {
-	return PCI_FUNC(efx->pci_dev->devfn);
+	return efx->port_num;
 }
 
 /**
diff --git a/drivers/net/sfc/siena.c b/drivers/net/sfc/siena.c
index 727b422..7ecd255 100644
--- a/drivers/net/sfc/siena.c
+++ b/drivers/net/sfc/siena.c
@@ -206,6 +206,7 @@ static int siena_probe_nic(struct efx_nic *efx)
 {
 	struct siena_nic_data *nic_data;
 	bool already_attached = 0;
+	efx_oword_t reg;
 	int rc;
 
 	/* Allocate storage for hardware specific data */
@@ -220,6 +221,9 @@ static int siena_probe_nic(struct efx_nic *efx)
 		goto fail1;
 	}
 
+	efx_reado(efx, &reg, FR_AZ_CS_DEBUG);
+	efx->port_num = EFX_OWORD_FIELD(reg, FRF_CZ_CS_PORT_NUM) - 1;
+
 	efx_mcdi_init(efx);
 
 	/* Recover from a failed assertion before probing */
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH net-next-2.6 2/2] qlcnic: NIC Partitioning - Add non privileged mode support
From: Anirban Chakraborty @ 2010-06-01 21:33 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Amit Salecha, Ameen Rahman


Added support for NIC functions that work in non privileged mode where these 
functions are privileged to do IO only, the control operations are handled via 
privileged functions.
Bumped up version number to 5.0.3.

Please apply.

thanks,
Anirban

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |   13 +---
 drivers/net/qlcnic/qlcnic_hdr.h  |   14 +++--
 drivers/net/qlcnic/qlcnic_main.c |  124 +++++++++++++++++++++++++++++++++++---
 3 files changed, 128 insertions(+), 23 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 31a0b43..02db363 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -51,8 +51,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 0
-#define _QLCNIC_LINUX_SUBVERSION 2
-#define QLCNIC_LINUX_VERSIONID  "5.0.2"
+#define _QLCNIC_LINUX_SUBVERSION 3
+#define QLCNIC_LINUX_VERSIONID  "5.0.3"
 #define QLCNIC_DRV_IDC_VER  0x01
 
 #define QLCNIC_VERSION_CODE(a, b, c)	(((a) << 24) + ((b) << 16) + (c))
@@ -891,6 +891,7 @@ struct qlcnic_mac_req {
 #define QLCNIC_LRO_ENABLED		0x08
 #define QLCNIC_BRIDGE_ENABLED       	0X10
 #define QLCNIC_DIAG_ENABLED		0x20
+#define QLCNIC_NPAR_ENABLED		0x40
 #define QLCNIC_IS_MSI_FAMILY(adapter) \
 	((adapter)->flags & (QLCNIC_MSI_ENABLED | QLCNIC_MSIX_ENABLED))
 
@@ -1159,13 +1160,6 @@ int qlcnic_check_loopback_buff(unsigned char *data);
 netdev_tx_t qlcnic_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
 void qlcnic_process_rcv_ring_diag(struct qlcnic_host_sds_ring *sds_ring);
 
-/* Functions from qlcnic_vf.c */
-int qlcnicvf_config_bridged_mode(struct qlcnic_adapter *, u32);
-int qlcnicvf_config_led(struct qlcnic_adapter *, u32, u32);
-int qlcnicvf_set_ilb_mode(struct qlcnic_adapter *adapter);
-void qlcnicvf_clear_ilb_mode(struct qlcnic_adapter *adapter);
-void qlcnicvf_set_port_mode(struct qlcnic_adapter *adapter);
-
 /* Management functions */
 int qlcnic_set_mac_address(struct qlcnic_adapter *, u8*);
 int qlcnic_get_mac_address(struct qlcnic_adapter *, u8*);
@@ -1234,6 +1228,7 @@ struct qlcnic_nic_template {
 	int (*config_led) (struct qlcnic_adapter *, u32, u32);
 	int (*set_ilb_mode) (struct qlcnic_adapter *);
 	void (*clear_ilb_mode) (struct qlcnic_adapter *);
+	int (*start_firmware) (struct qlcnic_adapter *);
 };
 
 #define QLCDB(adapter, lvl, _fmt, _args...) do {	\
diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index 1bcfb12..7b81cab 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -701,10 +701,11 @@ enum {
 #define QLCNIC_CRB_DEV_REF_COUNT	(QLCNIC_CAM_RAM(0x138))
 #define QLCNIC_CRB_DEV_STATE		(QLCNIC_CAM_RAM(0x140))
 
-#define QLCNIC_CRB_DRV_STATE               (QLCNIC_CAM_RAM(0x144))
-#define QLCNIC_CRB_DRV_SCRATCH             (QLCNIC_CAM_RAM(0x148))
-#define QLCNIC_CRB_DEV_PARTITION_INFO      (QLCNIC_CAM_RAM(0x14c))
+#define QLCNIC_CRB_DRV_STATE		(QLCNIC_CAM_RAM(0x144))
+#define QLCNIC_CRB_DRV_SCRATCH		(QLCNIC_CAM_RAM(0x148))
+#define QLCNIC_CRB_DEV_PARTITION_INFO	(QLCNIC_CAM_RAM(0x14c))
 #define QLCNIC_CRB_DRV_IDC_VER		(QLCNIC_CAM_RAM(0x174))
+#define QLCNIC_CRB_DEV_NPAR_STATE	(QLCNIC_CAM_RAM(0x19c))
 #define QLCNIC_ROM_DEV_INIT_TIMEOUT	(0x3e885c)
 #define QLCNIC_ROM_DRV_RESET_TIMEOUT	(0x3e8860)
 
@@ -717,6 +718,9 @@ enum {
 #define QLCNIC_DEV_FAILED		0x6
 #define QLCNIC_DEV_QUISCENT		0x7
 
+#define QLCNIC_DEV_NPAR_NOT_RDY	0
+#define QLCNIC_DEV_NPAR_RDY		1
+
 #define QLC_DEV_CHECK_ACTIVE(VAL, FN)		((VAL) &= (1 << (FN * 4)))
 #define QLC_DEV_SET_REF_CNT(VAL, FN)		((VAL) |= (1 << (FN * 4)))
 #define QLC_DEV_CLR_REF_CNT(VAL, FN)		((VAL) &= ~(1 << (FN * 4)))
@@ -732,8 +736,8 @@ enum {
 #define QLCNIC_TYPE_ISCSI		3
 
 #define QLCNIC_RCODE_DRIVER_INFO		0x20000000
-#define QLCNIC_RCODE_DRIVER_CAN_RELOAD		0x40000000
-#define QLCNIC_RCODE_FATAL_ERROR		0x80000000
+#define QLCNIC_RCODE_DRIVER_CAN_RELOAD		BIT_30
+#define QLCNIC_RCODE_FATAL_ERROR		BIT_31
 #define QLCNIC_FWERROR_PEGNUM(code)		((code) & 0xff)
 #define QLCNIC_FWERROR_CODE(code)		((code >> 8) & 0xfffff)
 
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 1e5e66f..119bcae 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -103,7 +103,14 @@ static irqreturn_t qlcnic_msix_intr(int irq, void *data);
 
 static struct net_device_stats *qlcnic_get_stats(struct net_device *netdev);
 static void qlcnic_config_indev_addr(struct net_device *dev, unsigned long);
-
+static int qlcnic_start_firmware(struct qlcnic_adapter *);
+
+static void qlcnic_dev_set_npar_ready(struct qlcnic_adapter *);
+static void qlcnicvf_clear_ilb_mode(struct qlcnic_adapter *);
+static int qlcnicvf_set_ilb_mode(struct qlcnic_adapter *);
+static int qlcnicvf_config_led(struct qlcnic_adapter *, u32, u32);
+static int qlcnicvf_config_bridged_mode(struct qlcnic_adapter *, u32);
+static int qlcnicvf_start_firmware(struct qlcnic_adapter *);
 /*  PCI Device ID Table  */
 #define ENTRY(device) \
 	{PCI_DEVICE(PCI_VENDOR_ID_QLOGIC, (device)), \
@@ -375,7 +382,8 @@ static struct qlcnic_nic_template qlcnic_ops = {
 	.config_bridged_mode = qlcnic_config_bridged_mode,
 	.config_led = qlcnic_config_led,
 	.set_ilb_mode = qlcnic_set_ilb_mode,
-	.clear_ilb_mode = qlcnic_clear_ilb_mode
+	.clear_ilb_mode = qlcnic_clear_ilb_mode,
+	.start_firmware = qlcnic_start_firmware
 };
 
 static struct qlcnic_nic_template qlcnic_pf_ops = {
@@ -383,7 +391,17 @@ static struct qlcnic_nic_template qlcnic_pf_ops = {
 	.config_bridged_mode = qlcnic_config_bridged_mode,
 	.config_led = qlcnic_config_led,
 	.set_ilb_mode = qlcnic_set_ilb_mode,
-	.clear_ilb_mode = qlcnic_clear_ilb_mode
+	.clear_ilb_mode = qlcnic_clear_ilb_mode,
+	.start_firmware = qlcnic_start_firmware
+};
+
+static struct qlcnic_nic_template qlcnic_vf_ops = {
+	.get_mac_addr = qlcnic_get_mac_address,
+	.config_bridged_mode = qlcnicvf_config_bridged_mode,
+	.config_led = qlcnicvf_config_led,
+	.set_ilb_mode = qlcnicvf_set_ilb_mode,
+	.clear_ilb_mode = qlcnicvf_clear_ilb_mode,
+	.start_firmware = qlcnicvf_start_firmware
 };
 
 static void
@@ -467,7 +485,6 @@ qlcnic_cleanup_pci_map(struct qlcnic_adapter *adapter)
 		iounmap(adapter->ahw.pci_base0);
 }
 
-/* Use api lock to access this function */
 static int
 qlcnic_set_function_modes(struct qlcnic_adapter *adapter)
 {
@@ -567,6 +584,7 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 		/* Set privilege level for other functions */
 		if (qlcnic_config_npars)
 			qlcnic_set_function_modes(adapter);
+		qlcnic_dev_set_npar_ready(adapter);
 		dev_info(&adapter->pdev->dev,
 			"HAL Version: %d, Management function\n",
 			adapter->fw_hal_version);
@@ -578,6 +596,13 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 			adapter->fw_hal_version);
 		adapter->nic_ops = &qlcnic_pf_ops;
 		break;
+	case QLCNIC_NON_PRIV_FUNC:
+		adapter->op_mode = QLCNIC_NON_PRIV_FUNC;
+		dev_info(&adapter->pdev->dev,
+			"HAL Version: %d Non Privileged function\n",
+			adapter->fw_hal_version);
+		adapter->nic_ops = &qlcnic_vf_ops;
+		break;
 	default:
 		dev_info(&adapter->pdev->dev, "Unknown function mode: %d\n",
 			priv_level);
@@ -772,6 +797,8 @@ wait_init:
 	QLCWR32(adapter, QLCNIC_CRB_DEV_STATE, QLCNIC_DEV_READY);
 	qlcnic_idc_debug_info(adapter, 1);
 
+	qlcnic_dev_set_npar_ready(adapter);
+
 	qlcnic_check_options(adapter);
 
 	if (adapter->fw_hal_version != QLCNIC_FW_BASE &&
@@ -1244,7 +1271,7 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (qlcnic_setup_idc_param(adapter))
 		goto err_out_iounmap;
 
-	err = qlcnic_start_firmware(adapter);
+	err = adapter->nic_ops->start_firmware(adapter);
 	if (err) {
 		dev_err(&pdev->dev, "Loading fw failed.Please Reboot\n");
 		goto err_out_decr_ref;
@@ -1410,7 +1437,7 @@ qlcnic_resume(struct pci_dev *pdev)
 	pci_set_master(pdev);
 	pci_restore_state(pdev);
 
-	err = qlcnic_start_firmware(adapter);
+	err = adapter->nic_ops->start_firmware(adapter);
 	if (err) {
 		dev_err(&pdev->dev, "failed to start firmware\n");
 		return err;
@@ -2260,7 +2287,7 @@ qlcnic_fwinit_work(struct work_struct *work)
 {
 	struct qlcnic_adapter *adapter = container_of(work,
 			struct qlcnic_adapter, fw_work.work);
-	u32 dev_state = 0xf;
+	u32 dev_state = 0xf, npar_state;
 
 	if (qlcnic_api_lock(adapter))
 		goto err_ret;
@@ -2273,6 +2300,19 @@ qlcnic_fwinit_work(struct work_struct *work)
 		return;
 	}
 
+	if (adapter->op_mode == QLCNIC_NON_PRIV_FUNC) {
+		npar_state = QLCRD32(adapter, QLCNIC_CRB_DEV_NPAR_STATE);
+		if (npar_state == QLCNIC_DEV_NPAR_RDY) {
+			qlcnic_api_unlock(adapter);
+			goto wait_npar;
+		} else {
+			qlcnic_schedule_work(adapter, qlcnic_fwinit_work,
+				FW_POLL_DELAY);
+			qlcnic_api_unlock(adapter);
+			return;
+		}
+	}
+
 	if (adapter->fw_wait_cnt++ > adapter->reset_ack_timeo) {
 		dev_err(&adapter->pdev->dev, "Reset:Failed to get ack %d sec\n",
 					adapter->reset_ack_timeo);
@@ -2305,7 +2345,7 @@ skip_ack_check:
 
 		qlcnic_api_unlock(adapter);
 
-		if (!qlcnic_start_firmware(adapter)) {
+		if (!adapter->nic_ops->start_firmware(adapter)) {
 			qlcnic_schedule_work(adapter, qlcnic_attach_work, 0);
 			return;
 		}
@@ -2314,6 +2354,7 @@ skip_ack_check:
 
 	qlcnic_api_unlock(adapter);
 
+wait_npar:
 	dev_state = QLCRD32(adapter, QLCNIC_CRB_DEV_STATE);
 	QLCDB(adapter, HW, "Func waiting: Device state=%u\n", dev_state);
 
@@ -2328,7 +2369,7 @@ skip_ack_check:
 		break;
 
 	default:
-		if (!qlcnic_start_firmware(adapter)) {
+		if (!adapter->nic_ops->start_firmware(adapter)) {
 			qlcnic_schedule_work(adapter, qlcnic_attach_work, 0);
 			return;
 		}
@@ -2402,6 +2443,30 @@ qlcnic_dev_request_reset(struct qlcnic_adapter *adapter)
 	qlcnic_api_unlock(adapter);
 }
 
+/* Transit to NPAR READY state from NPAR NOT READY state */
+static void
+qlcnic_dev_set_npar_ready(struct qlcnic_adapter *adapter)
+{
+	u32 state;
+
+	if (adapter->op_mode == QLCNIC_NON_PRIV_FUNC ||
+		adapter->fw_hal_version == QLCNIC_FW_BASE)
+		return;
+
+	if (qlcnic_api_lock(adapter))
+		return;
+
+	state = QLCRD32(adapter, QLCNIC_CRB_DEV_NPAR_STATE);
+
+	if (state != QLCNIC_DEV_NPAR_RDY) {
+		QLCWR32(adapter, QLCNIC_CRB_DEV_NPAR_STATE,
+			QLCNIC_DEV_NPAR_RDY);
+		QLCDB(adapter, DRV, "NPAR READY state set\n");
+	}
+
+	qlcnic_api_unlock(adapter);
+}
+
 static void
 qlcnic_schedule_work(struct qlcnic_adapter *adapter,
 		work_func_t func, int delay)
@@ -2889,6 +2954,47 @@ done:
 	return NOTIFY_DONE;
 }
 
+static int
+qlcnicvf_start_firmware(struct qlcnic_adapter *adapter)
+{
+	int err;
+
+	err = qlcnic_can_start_firmware(adapter);
+	if (err)
+		return err;
+
+	qlcnic_check_options(adapter);
+
+	adapter->need_fw_reset = 0;
+
+	return err;
+}
+
+static int
+qlcnicvf_config_bridged_mode(struct qlcnic_adapter *adapter, u32 enable)
+{
+	return -EOPNOTSUPP;
+}
+
+static int
+qlcnicvf_config_led(struct qlcnic_adapter *adapter, u32 state, u32 rate)
+{
+	return -EOPNOTSUPP;
+}
+
+static int
+qlcnicvf_set_ilb_mode(struct qlcnic_adapter *adapter)
+{
+	return -EOPNOTSUPP;
+}
+
+static void
+qlcnicvf_clear_ilb_mode(struct qlcnic_adapter *adapter)
+{
+	return;
+}
+
+
 static struct notifier_block	qlcnic_netdev_cb = {
 	.notifier_call = qlcnic_netdev_event,
 };
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH 08/12] sfc: Support only two rx buffers per page
From: Ben Hutchings @ 2010-06-01 21:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

- Pull the loop handling into efx_init_rx_buffers_(skb|page)
- Remove rx_queue->buf_page, and associated clean up code
- Remove unmap_addr, since unmap_addr is trivially calculable

This will allow us to recycle discarded buffers directly
from efx_rx_packet(), since will never be in the middle of
splitting a page.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/net_driver.h |   10 --
 drivers/net/sfc/rx.c         |  228 ++++++++++++++++++------------------------
 2 files changed, 96 insertions(+), 142 deletions(-)

diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 4539803..59c8ecc 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -222,7 +222,6 @@ struct efx_tx_queue {
  *	If both this and skb are %NULL, the buffer slot is currently free.
  * @data: Pointer to ethernet header
  * @len: Buffer length, in bytes.
- * @unmap_addr: DMA address to unmap
  */
 struct efx_rx_buffer {
 	dma_addr_t dma_addr;
@@ -230,7 +229,6 @@ struct efx_rx_buffer {
 	struct page *page;
 	char *data;
 	unsigned int len;
-	dma_addr_t unmap_addr;
 };
 
 /**
@@ -257,11 +255,6 @@ struct efx_rx_buffer {
  * @alloc_page_count: RX allocation strategy counter.
  * @alloc_skb_count: RX allocation strategy counter.
  * @slow_fill: Timer used to defer efx_nic_generate_fill_event().
- * @buf_page: Page for next RX buffer.
- *	We can use a single page for multiple RX buffers. This tracks
- *	the remaining space in the allocation.
- * @buf_dma_addr: Page's DMA address.
- * @buf_data: Page's host address.
  * @flushed: Use when handling queue flushing
  */
 struct efx_rx_queue {
@@ -284,9 +277,6 @@ struct efx_rx_queue {
 	struct timer_list slow_fill;
 	unsigned int slow_fill_count;
 
-	struct page *buf_page;
-	dma_addr_t buf_dma_addr;
-	char *buf_data;
 	enum efx_flush_state flushed;
 };
 
diff --git a/drivers/net/sfc/rx.c b/drivers/net/sfc/rx.c
index bf1e55e..615a1fc 100644
--- a/drivers/net/sfc/rx.c
+++ b/drivers/net/sfc/rx.c
@@ -98,155 +98,132 @@ static inline unsigned int efx_rx_buf_size(struct efx_nic *efx)
 	return PAGE_SIZE << efx->rx_buffer_order;
 }
 
-
 /**
- * efx_init_rx_buffer_skb - create new RX buffer using skb-based allocation
+ * efx_init_rx_buffers_skb - create EFX_RX_BATCH skb-based RX buffers
  *
  * @rx_queue:		Efx RX queue
- * @rx_buf:		RX buffer structure to populate
  *
- * This allocates memory for a new receive buffer, maps it for DMA,
- * and populates a struct efx_rx_buffer with the relevant
- * information.  Return a negative error code or 0 on success.
+ * This allocates EFX_RX_BATCH skbs, maps them for DMA, and populates a
+ * struct efx_rx_buffer for each one. Return a negative error code or 0
+ * on success. May fail having only inserted fewer than EFX_RX_BATCH
+ * buffers.
  */
-static int efx_init_rx_buffer_skb(struct efx_rx_queue *rx_queue,
-				  struct efx_rx_buffer *rx_buf)
+static int efx_init_rx_buffers_skb(struct efx_rx_queue *rx_queue)
 {
 	struct efx_nic *efx = rx_queue->efx;
 	struct net_device *net_dev = efx->net_dev;
+	struct efx_rx_buffer *rx_buf;
 	int skb_len = efx->rx_buffer_len;
+	unsigned index, count;
 
-	rx_buf->skb = netdev_alloc_skb(net_dev, skb_len);
-	if (unlikely(!rx_buf->skb))
-		return -ENOMEM;
+	for (count = 0; count < EFX_RX_BATCH; ++count) {
+		index = rx_queue->added_count & EFX_RXQ_MASK;
+		rx_buf = efx_rx_buffer(rx_queue, index);
 
-	/* Adjust the SKB for padding and checksum */
-	skb_reserve(rx_buf->skb, NET_IP_ALIGN);
-	rx_buf->len = skb_len - NET_IP_ALIGN;
-	rx_buf->data = (char *)rx_buf->skb->data;
-	rx_buf->skb->ip_summed = CHECKSUM_UNNECESSARY;
+		rx_buf->skb = netdev_alloc_skb(net_dev, skb_len);
+		if (unlikely(!rx_buf->skb))
+			return -ENOMEM;
+		rx_buf->page = NULL;
 
-	rx_buf->dma_addr = pci_map_single(efx->pci_dev,
-					  rx_buf->data, rx_buf->len,
-					  PCI_DMA_FROMDEVICE);
+		/* Adjust the SKB for padding and checksum */
+		skb_reserve(rx_buf->skb, NET_IP_ALIGN);
+		rx_buf->len = skb_len - NET_IP_ALIGN;
+		rx_buf->data = (char *)rx_buf->skb->data;
+		rx_buf->skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		rx_buf->dma_addr = pci_map_single(efx->pci_dev,
+						  rx_buf->data, rx_buf->len,
+						  PCI_DMA_FROMDEVICE);
+		if (unlikely(pci_dma_mapping_error(efx->pci_dev,
+						   rx_buf->dma_addr))) {
+			dev_kfree_skb_any(rx_buf->skb);
+			rx_buf->skb = NULL;
+			return -EIO;
+		}
 
-	if (unlikely(pci_dma_mapping_error(efx->pci_dev, rx_buf->dma_addr))) {
-		dev_kfree_skb_any(rx_buf->skb);
-		rx_buf->skb = NULL;
-		return -EIO;
+		++rx_queue->added_count;
+		++rx_queue->alloc_skb_count;
 	}
 
 	return 0;
 }
 
 /**
- * efx_init_rx_buffer_page - create new RX buffer using page-based allocation
+ * efx_init_rx_buffers_page - create EFX_RX_BATCH page-based RX buffers
  *
  * @rx_queue:		Efx RX queue
- * @rx_buf:		RX buffer structure to populate
  *
- * This allocates memory for a new receive buffer, maps it for DMA,
- * and populates a struct efx_rx_buffer with the relevant
- * information.  Return a negative error code or 0 on success.
+ * This allocates memory for EFX_RX_BATCH receive buffers, maps them for DMA,
+ * and populates struct efx_rx_buffers for each one. Return a negative error
+ * code or 0 on success. If a single page can be split between two buffers,
+ * then the page will either be inserted fully, or not at at all.
  */
-static int efx_init_rx_buffer_page(struct efx_rx_queue *rx_queue,
-				   struct efx_rx_buffer *rx_buf)
+static int efx_init_rx_buffers_page(struct efx_rx_queue *rx_queue)
 {
 	struct efx_nic *efx = rx_queue->efx;
-	int bytes, space, offset;
-
-	bytes = efx->rx_buffer_len - EFX_PAGE_IP_ALIGN;
-
-	/* If there is space left in the previously allocated page,
-	 * then use it. Otherwise allocate a new one */
-	rx_buf->page = rx_queue->buf_page;
-	if (rx_buf->page == NULL) {
-		dma_addr_t dma_addr;
-
-		rx_buf->page = alloc_pages(__GFP_COLD | __GFP_COMP | GFP_ATOMIC,
-					   efx->rx_buffer_order);
-		if (unlikely(rx_buf->page == NULL))
+	struct efx_rx_buffer *rx_buf;
+	struct page *page;
+	char *page_addr;
+	dma_addr_t dma_addr;
+	unsigned index, count;
+
+	/* We can split a page between two buffers */
+	BUILD_BUG_ON(EFX_RX_BATCH & 1);
+
+	for (count = 0; count < EFX_RX_BATCH; ++count) {
+		page = alloc_pages(__GFP_COLD | __GFP_COMP | GFP_ATOMIC,
+				   efx->rx_buffer_order);
+		if (unlikely(page == NULL))
 			return -ENOMEM;
-
-		dma_addr = pci_map_page(efx->pci_dev, rx_buf->page,
-					0, efx_rx_buf_size(efx),
+		dma_addr = pci_map_page(efx->pci_dev, page, 0,
+					efx_rx_buf_size(efx),
 					PCI_DMA_FROMDEVICE);
-
 		if (unlikely(pci_dma_mapping_error(efx->pci_dev, dma_addr))) {
-			__free_pages(rx_buf->page, efx->rx_buffer_order);
-			rx_buf->page = NULL;
+			__free_pages(page, efx->rx_buffer_order);
 			return -EIO;
 		}
-
-		rx_queue->buf_page = rx_buf->page;
-		rx_queue->buf_dma_addr = dma_addr;
-		rx_queue->buf_data = (page_address(rx_buf->page) +
-				      EFX_PAGE_IP_ALIGN);
-	}
-
-	rx_buf->len = bytes;
-	rx_buf->data = rx_queue->buf_data;
-	offset = efx_rx_buf_offset(rx_buf);
-	rx_buf->dma_addr = rx_queue->buf_dma_addr + offset;
-
-	/* Try to pack multiple buffers per page */
-	if (efx->rx_buffer_order == 0) {
-		/* The next buffer starts on the next 512 byte boundary */
-		rx_queue->buf_data += ((bytes + 0x1ff) & ~0x1ff);
-		offset += ((bytes + 0x1ff) & ~0x1ff);
-
-		space = efx_rx_buf_size(efx) - offset;
-		if (space >= bytes) {
-			/* Refs dropped on kernel releasing each skb */
-			get_page(rx_queue->buf_page);
-			goto out;
+		EFX_BUG_ON_PARANOID(dma_addr & (PAGE_SIZE - 1));
+		page_addr = page_address(page) + EFX_PAGE_IP_ALIGN;
+		dma_addr += EFX_PAGE_IP_ALIGN;
+
+	split:
+		index = rx_queue->added_count & EFX_RXQ_MASK;
+		rx_buf = efx_rx_buffer(rx_queue, index);
+		rx_buf->dma_addr = dma_addr;
+		rx_buf->skb = NULL;
+		rx_buf->page = page;
+		rx_buf->data = page_addr;
+		rx_buf->len = efx->rx_buffer_len - EFX_PAGE_IP_ALIGN;
+		++rx_queue->added_count;
+		++rx_queue->alloc_page_count;
+
+		if ((~count & 1) && (efx->rx_buffer_len < (PAGE_SIZE >> 1))) {
+			/* Use the second half of the page */
+			get_page(page);
+			dma_addr += (PAGE_SIZE >> 1);
+			page_addr += (PAGE_SIZE >> 1);
+			++count;
+			goto split;
 		}
 	}
 
-	/* This is the final RX buffer for this page, so mark it for
-	 * unmapping */
-	rx_queue->buf_page = NULL;
-	rx_buf->unmap_addr = rx_queue->buf_dma_addr;
-
- out:
 	return 0;
 }
 
-/* This allocates memory for a new receive buffer, maps it for DMA,
- * and populates a struct efx_rx_buffer with the relevant
- * information.
- */
-static int efx_init_rx_buffer(struct efx_rx_queue *rx_queue,
-			      struct efx_rx_buffer *new_rx_buf)
-{
-	int rc = 0;
-
-	if (rx_queue->channel->rx_alloc_push_pages) {
-		new_rx_buf->skb = NULL;
-		rc = efx_init_rx_buffer_page(rx_queue, new_rx_buf);
-		rx_queue->alloc_page_count++;
-	} else {
-		new_rx_buf->page = NULL;
-		rc = efx_init_rx_buffer_skb(rx_queue, new_rx_buf);
-		rx_queue->alloc_skb_count++;
-	}
-
-	if (unlikely(rc < 0))
-		EFX_LOG_RL(rx_queue->efx, "%s RXQ[%d] =%d\n", __func__,
-			   rx_queue->queue, rc);
-	return rc;
-}
-
 static void efx_unmap_rx_buffer(struct efx_nic *efx,
 				struct efx_rx_buffer *rx_buf)
 {
 	if (rx_buf->page) {
 		EFX_BUG_ON_PARANOID(rx_buf->skb);
-		if (rx_buf->unmap_addr) {
-			pci_unmap_page(efx->pci_dev, rx_buf->unmap_addr,
+
+		/* Unmap the buffer if there's only one buffer per page(s),
+		 * or this is the second half of a two buffer page. */
+		if (efx->rx_buffer_order != 0 ||
+		    (efx_rx_buf_offset(rx_buf) & (PAGE_SIZE >> 1)) != 0) {
+			pci_unmap_page(efx->pci_dev,
+				       rx_buf->dma_addr & ~(PAGE_SIZE - 1),
 				       efx_rx_buf_size(efx),
 				       PCI_DMA_FROMDEVICE);
-			rx_buf->unmap_addr = 0;
 		}
 	} else if (likely(rx_buf->skb)) {
 		pci_unmap_single(efx->pci_dev, rx_buf->dma_addr,
@@ -286,9 +263,9 @@ static void efx_fini_rx_buffer(struct efx_rx_queue *rx_queue,
  */
 void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 {
-	struct efx_rx_buffer *rx_buf;
-	unsigned fill_level, index;
-	int i, space, rc = 0;
+	struct efx_channel *channel = rx_queue->channel;
+	unsigned fill_level;
+	int space, rc = 0;
 
 	/* Calculate current fill level, and exit if we don't need to fill */
 	fill_level = (rx_queue->added_count - rx_queue->removed_count);
@@ -309,21 +286,18 @@ void efx_fast_push_rx_descriptors(struct efx_rx_queue *rx_queue)
 	EFX_TRACE(rx_queue->efx, "RX queue %d fast-filling descriptor ring from"
 		  " level %d to level %d using %s allocation\n",
 		  rx_queue->queue, fill_level, rx_queue->fast_fill_limit,
-		  rx_queue->channel->rx_alloc_push_pages ? "page" : "skb");
+		  channel->rx_alloc_push_pages ? "page" : "skb");
 
 	do {
-		for (i = 0; i < EFX_RX_BATCH; ++i) {
-			index = rx_queue->added_count & EFX_RXQ_MASK;
-			rx_buf = efx_rx_buffer(rx_queue, index);
-			rc = efx_init_rx_buffer(rx_queue, rx_buf);
-			if (unlikely(rc)) {
-				/* Ensure that we don't leave the rx queue
-				 * empty */
-				if (rx_queue->added_count == rx_queue->removed_count)
-					efx_schedule_slow_fill(rx_queue);
-				goto out;
-			}
-			++rx_queue->added_count;
+		if (channel->rx_alloc_push_pages)
+			rc = efx_init_rx_buffers_page(rx_queue);
+		else
+			rc = efx_init_rx_buffers_skb(rx_queue);
+		if (unlikely(rc)) {
+			/* Ensure that we don't leave the rx queue empty */
+			if (rx_queue->added_count == rx_queue->removed_count)
+				efx_schedule_slow_fill(rx_queue);
+			goto out;
 		}
 	} while ((space -= EFX_RX_BATCH) >= EFX_RX_BATCH);
 
@@ -638,16 +612,6 @@ void efx_fini_rx_queue(struct efx_rx_queue *rx_queue)
 			efx_fini_rx_buffer(rx_queue, rx_buf);
 		}
 	}
-
-	/* For a page that is part-way through splitting into RX buffers */
-	if (rx_queue->buf_page != NULL) {
-		pci_unmap_page(rx_queue->efx->pci_dev, rx_queue->buf_dma_addr,
-			       efx_rx_buf_size(rx_queue->efx),
-			       PCI_DMA_FROMDEVICE);
-		__free_pages(rx_queue->buf_page,
-			     rx_queue->efx->rx_buffer_order);
-		rx_queue->buf_page = NULL;
-	}
 }
 
 void efx_remove_rx_queue(struct efx_rx_queue *rx_queue)
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* Re: [PATCH] use unfair spinlock when running on hypervisor.
From: Eric Dumazet @ 2010-06-01 21:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Andi Kleen, Gleb Natapov, linux-kernel, kvm, hpa, mingo, npiggin,
	tglx, mtosatti, netdev
In-Reply-To: <4C053ACC.5020708@redhat.com>

Le mardi 01 juin 2010 à 19:52 +0300, Avi Kivity a écrit :

> What I'd like to see eventually is a short-term-unfair, long-term-fair 
> spinlock.  Might make sense for bare metal as well.  But it won't be 
> easy to write.
> 

This thread rings a bell here :)

Yes, ticket spinlocks are sometime slower, especially in workloads where
a spinlock needs to be taken several times to handle one unit of work,
and many cpus competing.

We currently have kind of a similar problem in network stack, and we
have a patch to speedup xmit path by an order of magnitude, letting one
cpu (the consumer cpu) to get unfair access to the (ticket) spinlock.
(It can compete with no more than one other cpu)

Boost from ~50.000 to ~600.000 pps on a dual quad core machine (E5450
@3.00GHz) on a particular workload (many cpus want to xmit their
packets)

( patch : http://patchwork.ozlabs.org/patch/53163/ )


It could be possible to write such a generic beast, with a cascade or
regular ticket spinlocks ?

One ticket spinlock at first stage (only if some conditions are met, aka
slow path), then an 'primary' spinlock at second stage.


// generic implementation
// (x86 could use 16bit fields for users_in & user_out)
struct cascade_lock {
	atomic_t 	users_in;
	int		users_out;
	spinlock_t	primlock;
	spinlock_t	slowpathlock; // could be outside of this structure, shared by many 'cascade_locks'
};

/*
 * In kvm case, you might call hypervisor when slowpathlock is about to be taken ?
 * When a cascade lock is unlocked, and relocked right after, this cpu has unfair
 * priority and could get the lock before cpus blocked in slowpathlock (especially if
 * an hypervisor call was done)
 *
 * In network xmit path, the dequeue thread would use highprio_user=true mode
 * In network xmit path, the 'contended' enqueueing thread would set a negative threshold,
 *  to force a 'lowprio_user' mode.
 */
void cascade_lock(struct cascade_lock *l, bool highprio_user, int threshold)
{
	bool slowpath = false;

	atomic_inc(&l->users_in); // no real need for atomic_inc_return()
	if (atomic_read(&l->users_in) - l->users_out > threshold && !highprio_user)) {
		spin_lock(&l->slowpathlock);
		slowpath = true;
	}
	spin_lock(&l->primlock);
	if (slowpath)
		spin_unlock(&l->slowpathlock);
}

void cascade_unlock(struct cascade_lock *l)
{
	l->users_out++;
	spin_unlock(&l->primlock);
}

^ permalink raw reply

* 2.6.32 and Multicast group membership
From: Mr. Berkley Shands @ 2010-06-01 21:39 UTC (permalink / raw)
  To: Net Dev

starting in 2.6.32, my multicast connections stop getting data after 
60-100 seconds.
The identical user code works fine under 2.6.22 through 2.6.31.

The NIC (an intel 82586) has two ports on the same subnet
(eth0 at 172.16.21.55/24 and eth1 at 172.16.21.56/24)

      if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, (char*)&req, 
sizeof(req)))
      {
         perror("setsockopt IP_ADD_MEMBERSHIP failed");
         ::exit(-1);
      }

If I do ADD_MEMBERSHIP on just one of these interfaces, the sockets 
still get data.
But if I join on both interfaces, one or both will stop getting packets
after 60-100 seconds. Sniffing with tcpdump shows the Cisco layer 3 switch
is not getting its responses back to keep the multicast group open.
The HP layer 2 switch does not seem to care, it keeps the data flowing 
regardless
of which physical port the join is executed on.

Changing IP_MULTICAST_ALL has no effect :-(
Did I miss something? New code that I have to specify to keep the Cisco 
happy?

tia

Berkley


-- 

// E. F. Berkley Shands, MSc//

** Exegy Inc.**

349 Marshall Road, Suite 100

St. Louis , MO  63119

Direct:  (314) 218-3600 X450

Cell:  (314) 303-2546

Office:  (314) 218-3600

Fax:  (314) 218-3601

 

The Usual Disclaimer follows...

 



^ permalink raw reply

* Re: [PATCHv2] Refactor update of IPv6 flow destination address for srcrt (RH) option
From: Arnaud Ebalard @ 2010-06-01 21:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: David S. Miller,
	YOSHIFUJI Hideaki / 吉藤英明, netdev
In-Reply-To: <1275422925.19372.114.camel@Joe-Laptop.home>

Hi,

Joe Perches <joe@perches.com> writes:

>> +static inline struct in6_addr *srcrt_dst_flow_update(struct in6_addr *final,
>> +						     struct in6_addr *fl6dst,
>> +						     const struct ipv6_txoptions *opt)
>> +{
>> +	if (opt && opt->srcrt) {
>> +		const struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt;
>> +		ipv6_addr_copy(final, fl6dst);
>> +		ipv6_addr_copy(fl6dst, rt0->addr);
>> +		return final;
>> +	}
>> +	return NULL;
>> +}
>> +
>
> Should this be inline?  Maybe it'd be better moved to exthdrs.c

I thought it was small enough to keep it inline in ipv6.h but removing
the inline and moving it to exthdrs.c makes sense. 

> Perhaps this is clearer as something like:
>
> /**
>  * fl6_update_dst - update flow destination address
>  *
>  * @fl: flowlabel fl_dst to update
>  * @opt: struct ipv6_txoptions
>  * @orig: original fl_dst if modified
>  *
>  * Returns NULL if no txoptions or no srcrt, otherwise
>  * returns orig and initial value of fl->fl6_dst set in orig
>  */
> struct in6_addr *fl6_update_dst(struct ip6_flowlabel *fl,
> 				const struct ipv6_txoptions *opt,
> 				struct in6_addr *orig)
> {
> 	if (!opt || !opt->srcrt)
> 		return NULL;
>
> 	ipv6_addr_copy(orig, &fl->fl6_dst);
> 	ipv6_addr_copy(&fl->fl6_dst, ((struct rt0_hdr *)opt->srcrt)->addr);
> 	return orig;
> }

I'll send an updated new version tomorrow, i.e. in a few hours (with
s/ip6_flowlabel/flowi/).

Thanks for your feedback.

Cheers,

a+

^ permalink raw reply

* Re: [PATCH] IP: Increment INADDRERRORS if routing for a packet is not successful
From: Eric Dumazet @ 2010-06-01 22:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: netdev, Stephen Hemminger, David Miller
In-Reply-To: <alpine.DEB.2.00.1006011609280.9438@router.home>

Le mardi 01 juin 2010 à 16:13 -0500, Christoph Lameter a écrit :
> Something like this would have been very helpful during recent debugging
> of multicast issues. Silent discards are bad.
> 

> 
> If the kernel perceives that something is wrong with an incoming packet then the
> IP stack currently silently discards packets. This makes it difficult to diagnose
> problems with the network configurations (such as a misbehaving kernel
> subsystem discarding multicast packets because the reverse path filter
> does not like multicast subscriptions on the second NIC with rp_filter=1).
> 
> It is also necessary to know how many inbound packets are discarded to
> assess networking issues in general with a NIC.
> 
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 

I disagree with this patch.

IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS.

In this path, we simulate the routing of a virtual packet, not its
delivery.

This should not affect IPSTATS SNMP entries.

You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ?

Dont inet_rtm_getroute() caller gets an error status anyway ?

> ---
>  net/ipv4/route.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/net/ipv4/route.c
> ===================================================================
> --- linux-2.6.orig/net/ipv4/route.c	2010-06-01 11:46:10.000000000 -0500
> +++ linux-2.6/net/ipv4/route.c	2010-06-01 11:52:55.000000000 -0500
> @@ -2981,6 +2981,9 @@ static int inet_rtm_getroute(struct sk_b
>  		rt = skb_rtable(skb);
>  		if (err == 0 && rt->u.dst.error)
>  			err = -rt->u.dst.error;
> +		if (err)
> +			IP_INC_STATS_BH(dev_net(skb->dev),
> +					IPSTATS_MIB_INADDRERRORS);
>  	} else {
>  		struct flowi fl = {
>  			.nl_u = {
> 
> 



^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: David Miller @ 2010-06-01 22:15 UTC (permalink / raw)
  To: richih.mailinglist
  Cc: ben, paulus, netdev, linux-ppp, alan, patrakov, linux-kernel
In-Reply-To: <AANLkTinpsIs7qMSpQ78fWCTHh03iLBIwRlsU7sqJau08@mail.gmail.com>

From: Richard Hartmann <richih.mailinglist@gmail.com>
Date: Tue, 1 Jun 2010 13:28:59 +0200

> Maybe not a bug in the Linux kernel itself, but certainly in the real world
> that exists around Linux. Similar to how a change to a device driver that
> is needed to work around broken hardware is a bug fix, imo.

It's not the same situation at all.

It is easier to fix misconfigured products that exist because of
software and configurations than it is to fix a physical piece of
hardware.

So you could work around it if you wanted to.

I definitely don't see this as -stable material, as a result.  We will
push it to net-next-2.6 and it will thus hit 2.6.36 as previously
mentioned.

^ permalink raw reply

* Re: [PATCH] IP: Increment INADDRERRORS if routing for a packet is not successful
From: David Miller @ 2010-06-01 22:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: cl, netdev, shemminger
In-Reply-To: <1275430054.2638.115.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Jun 2010 00:07:34 +0200

> IPSTATS_MIB_INADDRERRORS has a strong meaning, part of RFCS.
> 
> In this path, we simulate the routing of a virtual packet, not its
> delivery.
> 
> This should not affect IPSTATS SNMP entries.
> 
> You should use another MIB entry, say LINUX_MIB_INROUTEERRORS ?

Agreed.

^ permalink raw reply

* Re: [2.6.35-rc1 regression] tg3 vpd r/w failed
From: David Miller @ 2010-06-01 22:32 UTC (permalink / raw)
  To: mikpe; +Cc: netdev, mcarlson, linux-kernel, sparclinux
In-Reply-To: <19461.26529.591960.912204@pilspetsen.it.uu.se>


Mikael I'm getting tired of continuing to add the proper
mailing list CC: for your sparc bug reports.  Please make
sure to add sparclinux@vger.kernel.org to your reports
in the future.

Thanks.

> Booting 2.6.35-rc1 on a Sun Blade 2500 Red with builtin tg3 I get:
> 
> tg3.c:v3.110 (April 9, 2010)
> PCI: Enabling device: (0000:00:03.0), cmd 2
> ...
> tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> tg3 0000:00:03.0: eth0: Tigon3 [partno(none) rev 1002] (PCI:66MHz:64-bit) MAC address ...
> tg3 0000:00:03.0: eth0: attached PHY is 5703 (10/100/1000Base-T Ethernet) (WireSpeed[1])
> tg3 0000:00:03.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> tg3 0000:00:03.0: eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
> ...
> tg3 0000:00:03.0: eth0: No firmware running
> tg3 0000:00:03.0: eth0: Link is up at 100 Mbps, full duplex
> tg3 0000:00:03.0: eth0: Flow control is on for TX and on for RX
> 
> The three 'vpd r/w failed' messages are new and did not occur with 2.6.34
> or earlier kernels.
> 
> /Mikael

^ permalink raw reply

* Re: [PATCH] phylib: Add support for the LXT973 phy.
From: Andy Fleming @ 2010-06-01 22:39 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100531130932.GA15845@riccoc20.at.omicron.at>

On Mon, May 31, 2010 at 8:09 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> This patch implements a work around for Erratum 5, "3.3 V Fiber Speed
> Selection." If the hardware wiring does not respect this erratum, then
> fiber optic mode will not work properly.
>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>


>
> +static int lxt973_probe(struct phy_device *phydev)
> +{
> +       int val = phy_read(phydev, MII_LXT973_PCR);
> +
> +       if (val & PCR_FIBER_SELECT) {
> +               /*
> +                * If fiber is selected, then the only correct setting
> +                * is 100Mbps, full duplex, and auto negotiation off.
> +                */
> +               val = phy_read(phydev, MII_BMCR);
> +               val |= (BMCR_SPEED100 | BMCR_FULLDPLX);
> +               val &= ~BMCR_ANENABLE;
> +               phy_write(phydev, MII_BMCR, val);
> +               /* Remember that the port is in fiber mode. */
> +               phydev->priv = lxt973_probe;


That's a bit hacky.  There is a dev_flags field, which could be used
for this.  Probably, we should add a more general way of saying what
sort of port this is.  But don't use the presence and absence of
"priv", as it could one day get used for a different purpose, and this
seems like it would leave us open to strange bugs.

Also, is this erratum true for all lxt973 models, or is it fixed in
some revisions?


Andy

^ permalink raw reply

* Re: [PATCH] [ath5k][leds] Ability to disable leds support. If leds support enabled do not force mac802.11 leds layer selection.
From: Dmytro Milinevskyy @ 2010-06-01 23:01 UTC (permalink / raw)
  To: Bob Copeland
  Cc: ath5k-devel, Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez,
	John W. Linville, GeunSik Lim, Greg Kroah-Hartman, Lukas Turek,
	Mark Hindley, Johannes Berg, Jiri Kosina, Kalle Valo, Keng-Yu Lin,
	Luca Verdesca, Shahar Or, linux-wireless, netdev, linux-kernel
In-Reply-To: <AANLkTinFCuqgvnEGFmssL5R1J9-Yx7JB8mO9yKdXuYUj@mail.gmail.com>

Bob, thanks for the response.

I will rework the patch.

>> -/* GPIO-controlled software LED */
>> -#define AR5K_SOFTLED_PIN       0
>> -#define AR5K_SOFTLED_ON                0
>> -#define AR5K_SOFTLED_OFF       1
>> -
>
> Please drop this hunk, no problem keeping it around.

I suppose this should go away with another patch to keep current
clean. These dfinitions are not related to the subject.

Regards,

-- Dima

On Tue, Jun 1, 2010 at 11:34 PM, Bob Copeland <me@bobcopeland.com> wrote:
> On Wed, Apr 7, 2010 at 2:58 PM, Dmytro Milinevskyy
> <milinevskyy@gmail.com> wrote:
>> Hello!
>
> Thanks, comments inline.
>
>
>> +config ATH5K_LEDS
>> +       tristate "Atheros 5xxx wireless cards LEDs support"
>> +       depends on ATH5K
>> +       ---help---
>> +         Atheros 5xxx LED support.
>> +
>
> This can select the proper LED classes?  Then you can get rid of another
> ifdef check later.
>
>> -/* GPIO-controlled software LED */
>> -#define AR5K_SOFTLED_PIN       0
>> -#define AR5K_SOFTLED_ON                0
>> -#define AR5K_SOFTLED_OFF       1
>> -
>
> Please drop this hunk, no problem keeping it around.
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /* LED functions */
>>  int ath5k_init_leds(struct ath5k_softc *sc);
>>  void ath5k_led_enable(struct ath5k_softc *sc);
>>  void ath5k_led_off(struct ath5k_softc *sc);
>>  void ath5k_unregister_leds(struct ath5k_softc *sc);
>> +#else
>> +#define ath5k_init_leds(sc) do {} while (0)
>> +#define ath5k_led_enable(sc) do {} while (0)
>> +#define ath5k_led_off(sc) do {} while (0)
>> +#define ath5k_unregister_leds(sc) do {} while (0)
>> +#endif
>
> I prefer:
>
> #ifdef
> ...
> #else
> static inline int ath5k_init_leds(struct ath5k_softc *sc)
> {
>    return 0;
> }
> ...
> #endif
>
> so you get type-checking.  Also this doesn't quite work in your version:
>
>    int foo = ath5k_init_leds(sc);
>
>> +#ifdef CONFIG_ATH5K_LEDS
>>  /*
>>  * State for LED triggers
>>  */
>>  struct ath5k_led
>>  {
>> -       char name[ATH5K_LED_MAX_NAME_LEN + 1];  /* name of the LED in sysfs */
>>        struct ath5k_softc *sc;                 /* driver state */
>> +#ifdef CONFIG_LEDS_CLASS
>> +       char name[ATH5K_LED_MAX_NAME_LEN + 1];  /* name of the LED in sysfs */
>>        struct led_classdev led_dev;            /* led classdev */
>> +#endif
>>  };
>> +#endif
>
> Why move name?
>
>>  /* Rfkill */
>>  struct ath5k_rfkill {
>> @@ -186,9 +190,6 @@ struct ath5k_softc {
>>
>>        u8                      bssidmask[ETH_ALEN];
>>
>> -       unsigned int            led_pin,        /* GPIO pin for driving LED */
>> -                               led_on;         /* pin setting for LED on */
>> -
>>        struct tasklet_struct   restq;          /* reset tasklet */
>>
>>        unsigned int            rxbufsize;      /* rx size based on mtu */
>> @@ -196,7 +197,6 @@ struct ath5k_softc {
>>        spinlock_t              rxbuflock;
>>        u32                     *rxlink;        /* link ptr in last RX desc */
>>        struct tasklet_struct   rxtq;           /* rx intr tasklet */
>> -       struct ath5k_led        rx_led;         /* rx led */
>>
>>        struct list_head        txbuf;          /* transmit buffer */
>>        spinlock_t              txbuflock;
>> @@ -204,7 +204,14 @@ struct ath5k_softc {
>>        struct ath5k_txq        txqs[AR5K_NUM_TX_QUEUES];       /* tx queues */
>>        struct ath5k_txq        *txq;           /* main tx queue */
>>        struct tasklet_struct   txtq;           /* tx intr tasklet */
>> +
>> +
>> +#ifdef CONFIG_ATH5K_LEDS
>> +       unsigned int            led_pin,        /* GPIO pin for driving LED */
>> +                               led_on;         /* pin setting for LED on */
>> +       struct ath5k_led        rx_led;         /* rx led */
>>        struct ath5k_led        tx_led;         /* tx led */
>> +#endif
>
> You might want to do this in two stages: move the led-dependent things
> together in the structure (or into a separate structure) and then only
> have one #ifdef section.
>
> Still too many ifdefs in general.
>
> --
> Bob Copeland %% www.bobcopeland.com
>

^ permalink raw reply

* Re: [2.6.35-rc1 regression] tg3 vpd r/w failed
From: Matt Carlson @ 2010-06-01 23:28 UTC (permalink / raw)
  To: David Miller
  Cc: mikpe@it.uu.se, netdev@vger.kernel.org, Matthew Carlson,
	linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org
In-Reply-To: <20100601.153209.135521213.davem@davemloft.net>

On Tue, Jun 01, 2010 at 03:32:09PM -0700, David Miller wrote:
> 
> Mikael I'm getting tired of continuing to add the proper
> mailing list CC: for your sparc bug reports.  Please make
> sure to add sparclinux@vger.kernel.org to your reports
> in the future.
> 
> Thanks.
> 
> > Booting 2.6.35-rc1 on a Sun Blade 2500 Red with builtin tg3 I get:
> > 
> > tg3.c:v3.110 (April 9, 2010)
> > PCI: Enabling device: (0000:00:03.0), cmd 2
> > ...
> > tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> > tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> > tg3 0000:00:03.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update.
> > tg3 0000:00:03.0: eth0: Tigon3 [partno(none) rev 1002] (PCI:66MHz:64-bit) MAC address ...
> > tg3 0000:00:03.0: eth0: attached PHY is 5703 (10/100/1000Base-T Ethernet) (WireSpeed[1])
> > tg3 0000:00:03.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > tg3 0000:00:03.0: eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
> > ...
> > tg3 0000:00:03.0: eth0: No firmware running
> > tg3 0000:00:03.0: eth0: Link is up at 100 Mbps, full duplex
> > tg3 0000:00:03.0: eth0: Flow control is on for TX and on for RX
> > 
> > The three 'vpd r/w failed' messages are new and did not occur with 2.6.34
> > or earlier kernels.

These messages are originating from the PCI layer.  It looks like your
device does not have NVRAM, which is a known issue.  I have an idea on
how to reduce the noise, but let me investigate it a little more before
I commit to it.

^ permalink raw reply

* RE: IAMT broken by commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab
From: Allan, Bruce W @ 2010-06-01 23:33 UTC (permalink / raw)
  To: Aurelien Jarno, Kirsher, Jeffrey T; +Cc: netdev@vger.kernel.org
In-Reply-To: <20100530010219.GA20368@volta.aurel32.net>

I will look into this in the next couple days.

-----Original Message-----
From: Aurelien Jarno [mailto:aurelien@aurel32.net] 
Sent: Saturday, May 29, 2010 6:02 PM
To: Allan, Bruce W; Kirsher, Jeffrey T
Cc: netdev@vger.kernel.org
Subject: IAMT broken by commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab

Hi,

I have recently upgrade my kernel, and found that Intel AMT support is
not working anymore as expected. I have configured IAMT so that is 
always available, even when the machine is off ("Desktop: ON in S0, S3,
S4-5").

On recent kernels, IAMT support does not work after the machine has 
been powered-off. Even worse, it also goes into this state when I try
to reboot it.

I have done a bisect and got this commit:

| commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab
| Author: Bruce Allan <bruce.w.allan@intel.com>
| Date:   Fri Aug 14 14:35:33 2009 +0000
| 
|     e1000e: WoL does not work on 82577/82578 with manageability enabled
|     
|     With manageability (Intel AMT) enabled via BIOS, PHY wakeup does not get
|     configured on newer parts which use PHY wakeup vs. MAC wakeup which causes
|     WoL to not work.  The driver should configure PHY wakeup whether or not
|     manageability is enabled.
|     
|     Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
|     Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>

I have tried to revert it on recent kernels (2.6.34), and IAMT is then
working as expected. My machine is using a Gigabyte EQ45M-S2 motherboard
with an 82567LM-3 ethernet chip (8086:10de), that is a different model
than the one of the original problem.

I do wonder if the changes in the patch should not only be done on some 
chip models, and I will appreciate any help in fixing this issue.

Thanks,
Aurelien


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply

* Re: IAMT broken by commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab
From: Kirsher, Jeffrey T @ 2010-06-01 23:35 UTC (permalink / raw)
  To: Allan, Bruce W, 'aurelien@aurel32.net'
  Cc: 'netdev@vger.kernel.org'

Thanks.

--
Cheers,
Jeff


-----Original Message-----
From: Allan, Bruce W <bruce.w.allan@intel.com>
To: Aurelien Jarno <aurelien@aurel32.net>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
CC: netdev@vger.kernel.org <netdev@vger.kernel.org>
Sent: Tue Jun 01 16:33:27 2010
Subject: RE: IAMT broken by commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab

I will look into this in the next couple days.

-----Original Message-----
From: Aurelien Jarno [mailto:aurelien@aurel32.net] 
Sent: Saturday, May 29, 2010 6:02 PM
To: Allan, Bruce W; Kirsher, Jeffrey T
Cc: netdev@vger.kernel.org
Subject: IAMT broken by commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab

Hi,

I have recently upgrade my kernel, and found that Intel AMT support is
not working anymore as expected. I have configured IAMT so that is 
always available, even when the machine is off ("Desktop: ON in S0, S3,
S4-5").

On recent kernels, IAMT support does not work after the machine has 
been powered-off. Even worse, it also goes into this state when I try
to reboot it.

I have done a bisect and got this commit:

| commit 82776a4bcd7aa5fbcd2e6339b3ce88b727dd40ab
| Author: Bruce Allan <bruce.w.allan@intel.com>
| Date:   Fri Aug 14 14:35:33 2009 +0000
| 
|     e1000e: WoL does not work on 82577/82578 with manageability enabled
|     
|     With manageability (Intel AMT) enabled via BIOS, PHY wakeup does not get
|     configured on newer parts which use PHY wakeup vs. MAC wakeup which causes
|     WoL to not work.  The driver should configure PHY wakeup whether or not
|     manageability is enabled.
|     
|     Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
|     Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>

I have tried to revert it on recent kernels (2.6.34), and IAMT is then
working as expected. My machine is using a Gigabyte EQ45M-S2 motherboard
with an 82567LM-3 ethernet chip (8086:10de), that is a different model
than the one of the original problem.

I do wonder if the changes in the patch should not only be done on some 
chip models, and I will appreciate any help in fixing this issue.

Thanks,
Aurelien


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply

* Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost workers
From: Tejun Heo @ 2010-06-01 23:59 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Michael S. Tsirkin, Oleg Nesterov, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <1275412754.989.10.camel@w-sridhar.beaverton.ibm.com>

On 06/01/2010 07:19 PM, Sridhar Samudrala wrote:
>> -	int i;
>> +	cpumask_var_t mask;
>> +	int i, ret = -ENOMEM;
>> +
>> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
>> +		goto out_free_mask;
> 
> I think this is another bug in the error path. You should simply
> do a return instead of a goto here when aloc_cpu_mask fails.

Oh... it's always safe to call free_cpumask_var() after failed
alloc_cpumask_var(), so that part isn't broken.

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH net-2.6] bnx2: Fix hang during rmmod bnx2.
From: Michael Chan @ 2010-06-02  1:05 UTC (permalink / raw)
  To: davem; +Cc: netdev

The regression is caused by:

commit 4327ba435a56ada13eedf3eb332e583c7a0586a9
    bnx2: Fix netpoll crash.

If ->open() and ->close() are called multiple times, the same napi structs
will be added to dev->napi_list multiple times, corrupting the dev->napi_list.
This causes free_netdev() to hang during rmmod.

We fix this by calling netif_napi_del() during ->close().

Also, bnx2_init_napi() must not be in the __devinit section since it is
called by ->open().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/bnx2.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 188e356..949d7a9 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -247,6 +247,7 @@ static const struct flash_spec flash_5709 = {
 MODULE_DEVICE_TABLE(pci, bnx2_pci_tbl);
 
 static void bnx2_init_napi(struct bnx2 *bp);
+static void bnx2_del_napi(struct bnx2 *bp);
 
 static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct bnx2_tx_ring_info *txr)
 {
@@ -6270,6 +6271,7 @@ open_err:
 	bnx2_free_skbs(bp);
 	bnx2_free_irq(bp);
 	bnx2_free_mem(bp);
+	bnx2_del_napi(bp);
 	return rc;
 }
 
@@ -6537,6 +6539,7 @@ bnx2_close(struct net_device *dev)
 	bnx2_free_irq(bp);
 	bnx2_free_skbs(bp);
 	bnx2_free_mem(bp);
+	bnx2_del_napi(bp);
 	bp->link_up = 0;
 	netif_carrier_off(bp->dev);
 	bnx2_set_power_state(bp, PCI_D3hot);
@@ -8227,7 +8230,16 @@ bnx2_bus_string(struct bnx2 *bp, char *str)
 	return str;
 }
 
-static void __devinit
+static void
+bnx2_del_napi(struct bnx2 *bp)
+{
+	int i;
+
+	for (i = 0; i < bp->irq_nvecs; i++)
+		netif_napi_del(&bp->bnx2_napi[i].napi);
+}
+
+static void
 bnx2_init_napi(struct bnx2 *bp)
 {
 	int i;
-- 
1.6.4.GIT



^ permalink raw reply related

* Re: [PATCH]: vxge: Fix checkstack warning in vxge_probe()
From: Jon Mason @ 2010-06-02  3:03 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: netdev, mschmidt
In-Reply-To: <20100528175735.30903.21134.sendpatchset@prarit.bos.redhat.com>

On Fri, 2010-05-28 at 14:01 -0400, Prarit Bhargava wrote:
> Linux 2.6.33 reports this checkstack warning:
> 
> drivers/net/vxge/vxge-main.c: In function 'vxge_probe':
> drivers/net/vxge/vxge-main.c:4409: warning: the frame size of 1028 bytes is larger than 1024 bytes
> 
> This warning does not occur in the latest linux-2.6 or linux-next, however,
> when I do a 'make -j32 CONFIG_FRAME_WARN=512' instead of 1024 I see
> 
> drivers/net/vxge/vxge-main.c: In function ‘vxge_probe’:
> drivers/net/vxge/vxge-main.c:4423: warning: the frame size of 1024 bytes is larger than 512 bytes
> 
> This patch moves the large vxge_config struct off the stack.

The patch is sufficient, but I think it doesn't go far enough.  The
struct vxge_config in struct vxgedev should be malloc'd as well.  struct
vxgedev is entirely too big and needs to be smaller, for cache perf and
other reasons.

Thanks,
Jon

> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> 
> diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
> index 2bab364..b955802 100644
> --- a/drivers/net/vxge/vxge-main.c
> +++ b/drivers/net/vxge/vxge-main.c
> @@ -4016,7 +4016,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	int high_dma = 0;
>  	u64 vpath_mask = 0;
>  	struct vxgedev *vdev;
> -	struct vxge_config ll_config;
> +	struct vxge_config *ll_config = NULL;
>  	struct vxge_hw_device_config *device_config = NULL;
>  	struct vxge_hw_device_attr attr;
>  	int i, j, no_of_vpath = 0, max_vpath_supported = 0;
> @@ -4075,17 +4075,24 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		goto _exit0;
>  	}
>  
> -	memset(&ll_config, 0, sizeof(struct vxge_config));
> -	ll_config.tx_steering_type = TX_MULTIQ_STEERING;
> -	ll_config.intr_type = MSI_X;
> -	ll_config.napi_weight = NEW_NAPI_WEIGHT;
> -	ll_config.rth_steering = RTH_STEERING;
> +	ll_config = kzalloc(sizeof(*ll_config), GFP_KERNEL);
> +	if (!ll_config) {
> +		ret = -ENOMEM;
> +		vxge_debug_init(VXGE_ERR,
> +			"ll_config : malloc failed %s %d",
> +			__FILE__, __LINE__);
> +		goto _exit0;
> +	}
> +	ll_config->tx_steering_type = TX_MULTIQ_STEERING;
> +	ll_config->intr_type = MSI_X;
> +	ll_config->napi_weight = NEW_NAPI_WEIGHT;
> +	ll_config->rth_steering = RTH_STEERING;
>  
>  	/* get the default configuration parameters */
>  	vxge_hw_device_config_default_get(device_config);
>  
>  	/* initialize configuration parameters */
> -	vxge_device_config_init(device_config, &ll_config.intr_type);
> +	vxge_device_config_init(device_config, &ll_config->intr_type);
>  
>  	ret = pci_enable_device(pdev);
>  	if (ret) {
> @@ -4138,7 +4145,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		(unsigned long long)pci_resource_start(pdev, 0));
>  
>  	status = vxge_hw_device_hw_info_get(attr.bar0,
> -			&ll_config.device_hw_info);
> +			&ll_config->device_hw_info);
>  	if (status != VXGE_HW_OK) {
>  		vxge_debug_init(VXGE_ERR,
>  			"%s: Reading of hardware info failed."
> @@ -4147,7 +4154,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		goto _exit3;
>  	}
>  
> -	if (ll_config.device_hw_info.fw_version.major !=
> +	if (ll_config->device_hw_info.fw_version.major !=
>  		VXGE_DRIVER_FW_VERSION_MAJOR) {
>  		vxge_debug_init(VXGE_ERR,
>  			"%s: Incorrect firmware version."
> @@ -4157,7 +4164,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		goto _exit3;
>  	}
>  
> -	vpath_mask = ll_config.device_hw_info.vpath_mask;
> +	vpath_mask = ll_config->device_hw_info.vpath_mask;
>  	if (vpath_mask == 0) {
>  		vxge_debug_ll_config(VXGE_TRACE,
>  			"%s: No vpaths available in device", VXGE_DRIVER_NAME);
> @@ -4169,10 +4176,10 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		"%s:%d  Vpath mask = %llx", __func__, __LINE__,
>  		(unsigned long long)vpath_mask);
>  
> -	function_mode = ll_config.device_hw_info.function_mode;
> -	host_type = ll_config.device_hw_info.host_type;
> +	function_mode = ll_config->device_hw_info.function_mode;
> +	host_type = ll_config->device_hw_info.host_type;
>  	is_privileged = __vxge_hw_device_is_privilaged(host_type,
> -		ll_config.device_hw_info.func_id);
> +		ll_config->device_hw_info.func_id);
>  
>  	/* Check how many vpaths are available */
>  	for (i = 0; i < VXGE_HW_MAX_VIRTUAL_PATHS; i++) {
> @@ -4186,7 +4193,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  
>  	/* Enable SRIOV mode, if firmware has SRIOV support and if it is a PF */
>  	if (is_sriov(function_mode) && (max_config_dev > 1) &&
> -		(ll_config.intr_type != INTA) &&
> +		(ll_config->intr_type != INTA) &&
>  		(is_privileged == VXGE_HW_OK)) {
>  		ret = pci_enable_sriov(pdev, ((max_config_dev - 1) < num_vfs)
>  			? (max_config_dev - 1) : num_vfs);
> @@ -4199,7 +4206,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	 * Configure vpaths and get driver configured number of vpaths
>  	 * which is less than or equal to the maximum vpaths per function.
>  	 */
> -	no_of_vpath = vxge_config_vpaths(device_config, vpath_mask, &ll_config);
> +	no_of_vpath = vxge_config_vpaths(device_config, vpath_mask, ll_config);
>  	if (!no_of_vpath) {
>  		vxge_debug_ll_config(VXGE_ERR,
>  			"%s: No more vpaths to configure", VXGE_DRIVER_NAME);
> @@ -4234,21 +4241,21 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	/* set private device info */
>  	pci_set_drvdata(pdev, hldev);
>  
> -	ll_config.gro_enable = VXGE_GRO_ALWAYS_AGGREGATE;
> -	ll_config.fifo_indicate_max_pkts = VXGE_FIFO_INDICATE_MAX_PKTS;
> -	ll_config.addr_learn_en = addr_learn_en;
> -	ll_config.rth_algorithm = RTH_ALG_JENKINS;
> -	ll_config.rth_hash_type_tcpipv4 = VXGE_HW_RING_HASH_TYPE_TCP_IPV4;
> -	ll_config.rth_hash_type_ipv4 = VXGE_HW_RING_HASH_TYPE_NONE;
> -	ll_config.rth_hash_type_tcpipv6 = VXGE_HW_RING_HASH_TYPE_NONE;
> -	ll_config.rth_hash_type_ipv6 = VXGE_HW_RING_HASH_TYPE_NONE;
> -	ll_config.rth_hash_type_tcpipv6ex = VXGE_HW_RING_HASH_TYPE_NONE;
> -	ll_config.rth_hash_type_ipv6ex = VXGE_HW_RING_HASH_TYPE_NONE;
> -	ll_config.rth_bkt_sz = RTH_BUCKET_SIZE;
> -	ll_config.tx_pause_enable = VXGE_PAUSE_CTRL_ENABLE;
> -	ll_config.rx_pause_enable = VXGE_PAUSE_CTRL_ENABLE;
> -
> -	if (vxge_device_register(hldev, &ll_config, high_dma, no_of_vpath,
> +	ll_config->gro_enable = VXGE_GRO_ALWAYS_AGGREGATE;
> +	ll_config->fifo_indicate_max_pkts = VXGE_FIFO_INDICATE_MAX_PKTS;
> +	ll_config->addr_learn_en = addr_learn_en;
> +	ll_config->rth_algorithm = RTH_ALG_JENKINS;
> +	ll_config->rth_hash_type_tcpipv4 = VXGE_HW_RING_HASH_TYPE_TCP_IPV4;
> +	ll_config->rth_hash_type_ipv4 = VXGE_HW_RING_HASH_TYPE_NONE;
> +	ll_config->rth_hash_type_tcpipv6 = VXGE_HW_RING_HASH_TYPE_NONE;
> +	ll_config->rth_hash_type_ipv6 = VXGE_HW_RING_HASH_TYPE_NONE;
> +	ll_config->rth_hash_type_tcpipv6ex = VXGE_HW_RING_HASH_TYPE_NONE;
> +	ll_config->rth_hash_type_ipv6ex = VXGE_HW_RING_HASH_TYPE_NONE;
> +	ll_config->rth_bkt_sz = RTH_BUCKET_SIZE;
> +	ll_config->tx_pause_enable = VXGE_PAUSE_CTRL_ENABLE;
> +	ll_config->rx_pause_enable = VXGE_PAUSE_CTRL_ENABLE;
> +
> +	if (vxge_device_register(hldev, ll_config, high_dma, no_of_vpath,
>  		&vdev)) {
>  		ret = -EINVAL;
>  		goto _exit4;
> @@ -4279,7 +4286,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		vdev->vpaths[j].vdev = vdev;
>  		vdev->vpaths[j].max_mac_addr_cnt = max_mac_vpath;
>  		memcpy((u8 *)vdev->vpaths[j].macaddr,
> -				(u8 *)ll_config.device_hw_info.mac_addrs[i],
> +				ll_config->device_hw_info.mac_addrs[i],
>  				ETH_ALEN);
>  
>  		/* Initialize the mac address list header */
> @@ -4300,18 +4307,18 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  
>  	macaddr = (u8 *)vdev->vpaths[0].macaddr;
>  
> -	ll_config.device_hw_info.serial_number[VXGE_HW_INFO_LEN - 1] = '\0';
> -	ll_config.device_hw_info.product_desc[VXGE_HW_INFO_LEN - 1] = '\0';
> -	ll_config.device_hw_info.part_number[VXGE_HW_INFO_LEN - 1] = '\0';
> +	ll_config->device_hw_info.serial_number[VXGE_HW_INFO_LEN - 1] = '\0';
> +	ll_config->device_hw_info.product_desc[VXGE_HW_INFO_LEN - 1] = '\0';
> +	ll_config->device_hw_info.part_number[VXGE_HW_INFO_LEN - 1] = '\0';
>  
>  	vxge_debug_init(VXGE_TRACE, "%s: SERIAL NUMBER: %s",
> -		vdev->ndev->name, ll_config.device_hw_info.serial_number);
> +		vdev->ndev->name, ll_config->device_hw_info.serial_number);
>  
>  	vxge_debug_init(VXGE_TRACE, "%s: PART NUMBER: %s",
> -		vdev->ndev->name, ll_config.device_hw_info.part_number);
> +		vdev->ndev->name, ll_config->device_hw_info.part_number);
>  
>  	vxge_debug_init(VXGE_TRACE, "%s: Neterion %s Server Adapter",
> -		vdev->ndev->name, ll_config.device_hw_info.product_desc);
> +		vdev->ndev->name, ll_config->device_hw_info.product_desc);
>  
>  	vxge_debug_init(VXGE_TRACE, "%s: MAC ADDR: %pM",
>  		vdev->ndev->name, macaddr);
> @@ -4321,11 +4328,11 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  
>  	vxge_debug_init(VXGE_TRACE,
>  		"%s: Firmware version : %s Date : %s", vdev->ndev->name,
> -		ll_config.device_hw_info.fw_version.version,
> -		ll_config.device_hw_info.fw_date.date);
> +		ll_config->device_hw_info.fw_version.version,
> +		ll_config->device_hw_info.fw_date.date);
>  
>  	if (new_device) {
> -		switch (ll_config.device_hw_info.function_mode) {
> +		switch (ll_config->device_hw_info.function_mode) {
>  		case VXGE_HW_FUNCTION_MODE_SINGLE_FUNCTION:
>  			vxge_debug_init(VXGE_TRACE,
>  			"%s: Single Function Mode Enabled", vdev->ndev->name);
> @@ -4348,7 +4355,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	vxge_print_parm(vdev, vpath_mask);
>  
>  	/* Store the fw version for ethttool option */
> -	strcpy(vdev->fw_version, ll_config.device_hw_info.fw_version.version);
> +	strcpy(vdev->fw_version, ll_config->device_hw_info.fw_version.version);
>  	memcpy(vdev->ndev->dev_addr, (u8 *)vdev->vpaths[0].macaddr, ETH_ALEN);
>  	memcpy(vdev->ndev->perm_addr, vdev->ndev->dev_addr, ETH_ALEN);
>  
> @@ -4387,7 +4394,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	 * present to prevent such a failure.
>  	 */
>  
> -	if (ll_config.device_hw_info.function_mode ==
> +	if (ll_config->device_hw_info.function_mode ==
>  		VXGE_HW_FUNCTION_MODE_MULTI_FUNCTION)
>  		if (vdev->config.intr_type == INTA)
>  			vxge_hw_device_unmask_all(hldev);
> @@ -4399,6 +4406,7 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  	VXGE_COPY_DEBUG_INFO_TO_LL(vdev, vxge_hw_device_error_level_get(hldev),
>  		vxge_hw_device_trace_level_get(hldev));
>  
> +	kfree(ll_config);
>  	return 0;
>  
>  _exit5:
> @@ -4416,6 +4424,7 @@ _exit2:
>  _exit1:
>  	pci_disable_device(pdev);
>  _exit0:
> +	kfree(ll_config);
>  	kfree(device_config);
>  	driver_config->config_dev_cnt--;
>  	pci_set_drvdata(pdev, NULL);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* [PATCH nf-next-2.6] netfilter: vmalloc_node cleanup
From: Eric Dumazet @ 2010-06-02  5:02 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Netfilter Development Mailinglist

Using vmalloc_node(size, numa_node_id()) for temporary storage is not
needed. vmalloc(size) is more respectful of user NUMA policy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/netfilter/arp_tables.c |    7 +++----
 net/ipv4/netfilter/ip_tables.c  |    4 ++--
 net/ipv6/netfilter/ip6_tables.c |    7 +++----
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 1ac01b1..16c0ba0 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -758,7 +758,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	 * about).
 	 */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vmalloc_node(countersize, numa_node_id());
+	counters = vmalloc(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -1005,8 +1005,7 @@ static int __do_replace(struct net *net, const char *name,
 	struct arpt_entry *iter;
 
 	ret = 0;
-	counters = vmalloc_node(num_counters * sizeof(struct xt_counters),
-				numa_node_id());
+	counters = vmalloc(num_counters * sizeof(struct xt_counters));
 	if (!counters) {
 		ret = -ENOMEM;
 		goto out;
@@ -1159,7 +1158,7 @@ static int do_add_counters(struct net *net, const void __user *user,
 	if (len != size + num_counters * sizeof(struct xt_counters))
 		return -EINVAL;
 
-	paddc = vmalloc_node(len - size, numa_node_id());
+	paddc = vmalloc(len - size);
 	if (!paddc)
 		return -ENOMEM;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 63958f3..7c0b8ad 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -928,7 +928,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vmalloc_node(countersize, numa_node_id());
+	counters = vmalloc(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -1352,7 +1352,7 @@ do_add_counters(struct net *net, const void __user *user,
 	if (len != size + num_counters * sizeof(struct xt_counters))
 		return -EINVAL;
 
-	paddc = vmalloc_node(len - size, numa_node_id());
+	paddc = vmalloc(len - size);
 	if (!paddc)
 		return -ENOMEM;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 6f517bd..82945ef 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -943,7 +943,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vmalloc_node(countersize, numa_node_id());
+	counters = vmalloc(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -1213,8 +1213,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	struct ip6t_entry *iter;
 
 	ret = 0;
-	counters = vmalloc_node(num_counters * sizeof(struct xt_counters),
-				numa_node_id());
+	counters = vmalloc(num_counters * sizeof(struct xt_counters));
 	if (!counters) {
 		ret = -ENOMEM;
 		goto out;
@@ -1368,7 +1367,7 @@ do_add_counters(struct net *net, const void __user *user, unsigned int len,
 	if (len != size + num_counters * sizeof(struct xt_counters))
 		return -EINVAL;
 
-	paddc = vmalloc_node(len - size, numa_node_id());
+	paddc = vmalloc(len - size);
 	if (!paddc)
 		return -ENOMEM;
 



^ permalink raw reply related

* Re: replace hooks in __netif_receive_skb (v4)
From: Jiri Pirko @ 2010-06-02  6:50 UTC (permalink / raw)
  To: Fischer, Anna
  Cc: Stephen Hemminger, davem@davemloft.net, netdev@vger.kernel.org,
	kaber@trash.net, eric.dumazet@gmail.com
In-Reply-To: <0199E0D51A61344794750DC57738F58E70B17D377A@GVW1118EXC.americas.hpqcorp.net>

Tue, Jun 01, 2010 at 06:13:51PM CEST, anna.fischer@hp.com wrote:
>> Subject: net: replace hooks in __netif_receive_skb (v4)
>> 
>> 
>> From: Jiri Pirko <jpirko@redhat.com>
>> 
>> What this patch does is it removes two receive frame hooks (for bridge
>> and for
>> macvlan) from __netif_receive_skb. These are replaced them with a
>> single
>> hook for both. It only supports one hook per device because it makes no
>> sense to do bridging and macvlan on the same device.
>> 
>> Then a network driver (of virtual netdev like macvlan or bridge) can
>> register
>> an rx_handler for needed net device.
>
>
>I think the idea of this is really good, and it has been long required to get rid of the bridging hook and the "hack" for macvlan to get into the network stack. 
>
>However, I wonder, if this is to be used as a generic interface, then why the restriction of only having a single hook per device? Yes, it makes sense to do this for the bridge/macvlan combination, but in general there could be other cases where you would want to allow multiple receivers per device. 

Right, I agree.

>
>
>> @@ -2792,6 +2778,7 @@ EXPORT_SYMBOL(__skb_bond_should_drop);
>>  static int __netif_receive_skb(struct sk_buff *skb)
>>  {
>>  	struct packet_type *ptype, *pt_prev;
>> +	rx_callback_func_t *rh;
>>  	struct net_device *orig_dev;
>>  	struct net_device *master;
>>  	struct net_device *null_or_orig;
>> @@ -2855,12 +2842,16 @@ static int __netif_receive_skb(struct sk
>>  ncls:
>>  #endif
>> 
>> -	skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
>> -	if (!skb)
>> -		goto out;
>> -	skb = handle_macvlan(skb, &pt_prev, &ret, orig_dev);
>> -	if (!skb)
>> -		goto out;
>> +	/* Handle special case of bridge or macvlan */
>> +	if ((rh = rcu_dereference(skb->dev->rx_handler)) != NULL) {
>> +		if (pt_prev) {
>> +			ret = deliver_skb(skb, pt_prev, orig_dev);
>
>What happens with 'ret' here? It is completely ignored, isn't it? Because you assume that there is only a single receiver per device? I think it would be much better to have return codes that indicate whether a packet has been consumed by a receive handler, or whether it is supposed to be processed further. The same actually applies for the packet handlers that are processed a bit further down in netif_receive_skb() - the return code is ignored and so a handler has no control over further processing of the packet.

It's not ignored. it's returned at the end of the function __netif_receive_skb.

^ permalink raw reply

* Re: net: replace hooks in __netif_receive_skb (v4)
From: Jiri Pirko @ 2010-06-02  7:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, kaber, eric.dumazet
In-Reply-To: <20100601091004.7885cd8c@nehalam>

Tue, Jun 01, 2010 at 06:10:04PM CEST, shemminger@vyatta.com wrote:
>On Tue, 1 Jun 2010 17:41:07 +0200
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>> Actually, I'm not happy about this (reducing to only one hook) and for two
>> reasons:
>> 
>> 1) I think it's a good behaviour to "mask" one handler by another in case device
>> is for example used for macvlan and then added to bridge. Because when it's
>> again removed from the bridge, the original functionality is restored. And also,
>> this would be consistent with the current behaviour.
>> 
>> 2) I would imagine a situation, when multiple handers are needed in cascade.
>> Actually I'm working on a virtual device draft which uses two handlers, although
>> in corner situation.
>> 
>> Regards,
>> 	Jirka
>
>I don't like it because:
>
>1) Adding macvlan to bridge makes no sense. The bridge is already in promicious mode.

Right, I'm not saying it has sense to have it together in the same moment. But
you can to this:

# ip link add link eth0 type macvlan

You can use eth0 + macvlan0. Now you do:

# brctl addif br0 eth0

Direct use of eth0 and macvlan0 is not not possible now.

# brctl delif br0 eth0

Now the original functionality ot eth0 and macvlan0 is restored.

>
>2) I don't like to see functionality added when it is not needed.
>
>3) The extra overhead of traversing list causes more cache activity in extreme
>   hot path.
>

But ok, I hear your arguments, I thought about them myself before I did the
patch and I thought that added overhead (which is not too big, I see one more
dereference, rx_handler pointer) would be acceptible.

Maybe someone other then us has opinion too :)

>Wait and add the multiple handlers when your code is included.
>
>-- 
>http://www.extremeprogramming.org/rules/early.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox