Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/4] Implement persistent grant in xen-netfront/netback
From: annie li @ 2012-11-17  4:39 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xensource.com, netdev@vger.kernel.org,
	konrad.wilk@oracle.com
In-Reply-To: <1353066382.3499.227.camel@zakaz.uk.xensource.com>



On 2012-11-16 19:46, Ian Campbell wrote:
>
> It seems like having netfront simply allocate itself a pool of grant
> references which it reuses would give equivalent benefits whilst being a
> smaller patch, with no protocol change and avoiding double copying. In
> fact by avoiding the double copy I'd expect it to be even better.
>
>   

OK, I can try this implementation.
And I'd better to compare the performance between double copy and this 
implementation to see how much grant lock affects grant copy too.

> I think you need to measure both dom0->domU and domU->dom0 to get the
> full picture since AIUI netperf sends the bulk data in only one
> direction with just ACKs coming back the other way.
>   
Yes.
My environment does not meet requirement of more VMs, I would do more 
thorough test after Konrad setups the environment.

Thanks
Annie

^ permalink raw reply

* Re: [PATCH net-next 09/17] net: Allow userns root control of the core of the network stack.
From: Eric W. Biederman @ 2012-11-17  2:46 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers, David Miller
In-Reply-To: <1353112116.2743.79.camel-/LGg1Z1CJKReKY3V0RtoKmatzQS1i7+A3tAM5lWOD0I@public.gmane.org>

Ben Hutchings <bhutchings-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org> writes:

> On Fri, 2012-11-16 at 06:32 -0800, Eric W. Biederman wrote:
>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> 
>> > On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
>> >> +	if (!capable(CAP_NET_ADMIN))
>> >> +		return -EPERM;
>> >> +
>> >>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
>> >
>> > You mean ns_capable here?
>> 
>> No.  There I meant capable.
>> 
>> I deliberately call capable here because I don't understand what
>> the tx_queue_len well enough to be certain it is safe to relax
>> that check to be just ns_capable.
>> 
>> My get feel is that allowing an unprivileged user to be able to
>> arbitrarily change the tx_queue_len on a networking device would be a
>> nice way to allow queuing as many network packets as you would like with
>> kernel memory and DOSing the machine.
>> 
>> So since with a quick read of the code I could not convince myself it
>> was safe to allow unprivilged users to change tx_queue_len I left it
>> protected by capable.  While at the same time I relaxed the check in
>> netdev_store to be ns_capable.
>
> Tor the same reason you had better be very selective about which ethtool
> commands are allowed based on per-user_ns CAP_NET_ADMIN.  Consider for a
> start:
>
> ETHTOOL_SEEPROM => brick the NIC
> ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU

These are prevented by not having access to real hardware by default. A
physical network interface must be moved into a network namespace for
you to have access to it.

There are a handful of software network devices that are generally safe
macvlan, veth, tun, ipip tunnels, etc.  Using those network devices is
very interesting and about as performant as you can get while still
being safe.

A buffer overflow in an ethtool command looks as likely to me as being
able to own the system by reflashing the NIC.

Access to a real physical NIC is an act of trust.  Given the general
linux policy that drivers are merged when they mostly work I don't
currently know of any trust models between "I trust you with full access
to this device" and "I don't trust you with direct access to this
device" that I would feel confident giving to an untrusted user.

Which is a convoluted way of saying "ip link set eth0 netns bob" is the
moral equivalent of "chown bob.bob /dev/eth0; chmod u+rwx /dev/eth0"

> ETHTOOL_SMSGLVL => fill up the system log

That one might be worth doing something about, as there is non-local
effect.  Still I don't believe for any of the software based "safe"
networking devices ETHTOOL_SMSGLVL has any effect, and being able to
tweak the debug level could be important for debugging if you do have
direct access to the NIC.

Eric

^ permalink raw reply

* [PATCH v6 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-11-17  2:27 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-pci, netdev, Dave Jiang, Nicholas Bellinger
In-Reply-To: <1353119233-32752-1-git-send-email-jon.mason@intel.com>

A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
connecting 2 systems, providing electrical isolation between the two subsystems.
A non-transparent bridge is functionally similar to a transparent bridge except
that both sides of the bridge have their own independent address domains.  The
host on one side of the bridge will not have the visibility of the complete
memory or I/O space on the other side of the bridge.  To communicate across the
non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
the local system.  Writes to these apertures are mirrored to memory on the
remote system.  Communications can also occur through the use of doorbell
registers that initiate interrupts to the alternate domain, and scratch-pad
registers accessible from both sides.

The NTB device driver is needed to configure these memory windows, doorbell, and
scratch-pad registers as well as use them in such a way as they can be turned
into a viable communication channel to the remote system.  ntb_hw.[ch]
determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
the underlying hardware to provide access and a common interface to the doorbell
registers, scratch pads, and memory windows.  These hardware interfaces are
exported so that other, non-mainlined kernel drivers can access these.
ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
communication channel(s) and provide a reliable way of transferring data from
one side to the other, which it then exports so that "client" drivers can access
them.  These client drivers are used to provide a standard kernel interface
(i.e., Ethernet device) to NTB, such that Linux can transfer data from one
system to the other in a standard way.

Signed-off-by: Jon Mason <jon.mason@intel.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 MAINTAINERS                 |    6 +
 drivers/Kconfig             |    2 +
 drivers/Makefile            |    1 +
 drivers/ntb/Kconfig         |   13 +
 drivers/ntb/Makefile        |    3 +
 drivers/ntb/ntb_hw.c        | 1157 +++++++++++++++++++++++++++++++++++
 drivers/ntb/ntb_hw.h        |  181 ++++++
 drivers/ntb/ntb_regs.h      |  139 +++++
 drivers/ntb/ntb_transport.c | 1427 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/ntb.h         |   83 +++
 10 files changed, 3012 insertions(+)
 create mode 100644 drivers/ntb/Kconfig
 create mode 100644 drivers/ntb/Makefile
 create mode 100644 drivers/ntb/ntb_hw.c
 create mode 100644 drivers/ntb/ntb_hw.h
 create mode 100644 drivers/ntb/ntb_regs.h
 create mode 100644 drivers/ntb/ntb_transport.c
 create mode 100644 include/linux/ntb.h

diff --git a/MAINTAINERS b/MAINTAINERS
index bb0b27d..b6139ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5193,6 +5193,12 @@ S:	Maintained
 F:	Documentation/scsi/NinjaSCSI.txt
 F:	drivers/scsi/nsp32*
 
+NTB DRIVER
+M:	Jon Mason <jon.mason@intel.com>
+S:	Supported
+F:	drivers/ntb/
+F:	include/linux/ntb.h
+
 NTFS FILESYSTEM
 M:	Anton Altaparmakov <anton@tuxera.com>
 L:	linux-ntfs-dev@lists.sourceforge.net
diff --git a/drivers/Kconfig b/drivers/Kconfig
index dbdefa3..7992e69 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -150,6 +150,8 @@ source "drivers/memory/Kconfig"
 
 source "drivers/iio/Kconfig"
 
+source "drivers/ntb/Kconfig"
+
 source "drivers/vme/Kconfig"
 
 source "drivers/pwm/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index a16a8d0..fdafa34 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -145,3 +145,4 @@ obj-$(CONFIG_EXTCON)		+= extcon/
 obj-$(CONFIG_MEMORY)		+= memory/
 obj-$(CONFIG_IIO)		+= iio/
 obj-$(CONFIG_VME_BUS)		+= vme/
+obj-$(CONFIG_NTB)		+= ntb/
diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
new file mode 100644
index 0000000..f69df793
--- /dev/null
+++ b/drivers/ntb/Kconfig
@@ -0,0 +1,13 @@
+config NTB
+       tristate "Intel Non-Transparent Bridge support"
+       depends on PCI
+       depends on X86
+       help
+        The PCI-E Non-transparent bridge hardware is a point-to-point PCI-E bus
+        connecting 2 systems.  When configured, writes to the device's PCI
+        mapped memory will be mirrored to a buffer on the remote system.  The
+        ntb Linux driver uses this point-to-point communication as a method to
+        transfer data from one system to the other.
+
+        If unsure, say N.
+
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
new file mode 100644
index 0000000..15cb59f
--- /dev/null
+++ b/drivers/ntb/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_NTB) += ntb.o
+
+ntb-objs := ntb_hw.o ntb_transport.o
diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
new file mode 100644
index 0000000..facad51
--- /dev/null
+++ b/drivers/ntb/ntb_hw.c
@@ -0,0 +1,1157 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "ntb_hw.h"
+#include "ntb_regs.h"
+
+#define NTB_NAME	"Intel(R) PCI-E Non-Transparent Bridge Driver"
+#define NTB_VER		"0.24"
+
+MODULE_DESCRIPTION(NTB_NAME);
+MODULE_VERSION(NTB_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+enum {
+	NTB_CONN_CLASSIC = 0,
+	NTB_CONN_B2B,
+	NTB_CONN_RP,
+};
+
+enum {
+	NTB_DEV_USD = 0,
+	NTB_DEV_DSD,
+};
+
+enum {
+	SNB_HW = 0,
+	BWD_HW,
+};
+
+/* Translate memory window 0,1 to BAR 2,4 */
+#define MW_TO_BAR(mw)	(mw * 2 + 2)
+
+static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
+	{0}
+};
+MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
+
+/**
+ * ntb_register_event_callback() - register event callback
+ * @ndev: pointer to ntb_device instance
+ * @func: callback function to register
+ *
+ * This function registers a callback for any HW driver events such as link
+ * up/down, power management notices and etc.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_event_callback(struct ntb_device *ndev,
+				void (*func)(void *handle, unsigned int event))
+{
+	if (ndev->event_cb)
+		return -EINVAL;
+
+	ndev->event_cb = func;
+
+	return 0;
+}
+
+/**
+ * ntb_unregister_event_callback() - unregisters the event callback
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function unregisters the existing callback from transport
+ */
+void ntb_unregister_event_callback(struct ntb_device *ndev)
+{
+	ndev->event_cb = NULL;
+}
+
+/**
+ * ntb_register_db_callback() - register a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ * @func: callback function to register
+ *
+ * This function registers a callback function for the doorbell interrupt
+ * on the primary side. The function will unmask the doorbell as well to
+ * allow interrupt.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+			     void *data, void (*func)(void *data, int db_num))
+{
+	unsigned long mask;
+
+	if (idx >= ndev->max_cbs || ndev->db_cb[idx].callback) {
+		dev_warn(&ndev->pdev->dev, "Invalid Index.\n");
+		return -EINVAL;
+	}
+
+	ndev->db_cb[idx].callback = func;
+	ndev->db_cb[idx].data = data;
+
+	/* unmask interrupt */
+	mask = readw(ndev->reg_ofs.pdb_mask);
+	clear_bit(idx * ndev->bits_per_vector, &mask);
+	writew(mask, ndev->reg_ofs.pdb_mask);
+
+	return 0;
+}
+
+/**
+ * ntb_unregister_db_callback() - unregister a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ *
+ * This function unregisters a callback function for the doorbell interrupt
+ * on the primary side. The function will also mask the said doorbell.
+ */
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx)
+{
+	unsigned long mask;
+
+	if (idx >= ndev->max_cbs || !ndev->db_cb[idx].callback)
+		return;
+
+	mask = readw(ndev->reg_ofs.pdb_mask);
+	set_bit(idx * ndev->bits_per_vector, &mask);
+	writew(mask, ndev->reg_ofs.pdb_mask);
+
+	ndev->db_cb[idx].callback = NULL;
+}
+
+/**
+ * ntb_find_transport() - find the transport pointer
+ * @transport: pointer to pci device
+ *
+ * Given the pci device pointer, return the transport pointer passed in when
+ * the transport attached when it was inited.
+ *
+ * RETURNS: pointer to transport.
+ */
+void *ntb_find_transport(struct pci_dev *pdev)
+{
+	struct ntb_device *ndev = pci_get_drvdata(pdev);
+	return ndev->ntb_transport;
+}
+
+/**
+ * ntb_register_transport() - Register NTB transport with NTB HW driver
+ * @transport: transport identifier
+ *
+ * This function allows a transport to reserve the hardware driver for
+ * NTB usage.
+ *
+ * RETURNS: pointer to ntb_device, NULL on error.
+ */
+struct ntb_device *ntb_register_transport(struct pci_dev *pdev, void *transport)
+{
+	struct ntb_device *ndev = pci_get_drvdata(pdev);
+
+	if (ndev->ntb_transport)
+		return NULL;
+
+	ndev->ntb_transport = transport;
+	return ndev;
+}
+
+/**
+ * ntb_unregister_transport() - Unregister the transport with the NTB HW driver
+ * @ndev - ntb_device of the transport to be freed
+ *
+ * This function unregisters the transport from the HW driver and performs any
+ * necessary cleanups.
+ */
+void ntb_unregister_transport(struct ntb_device *ndev)
+{
+	int i;
+
+	if (!ndev->ntb_transport)
+		return;
+
+	for (i = 0; i < ndev->max_cbs; i++)
+		ntb_unregister_db_callback(ndev, i);
+
+	ntb_unregister_event_callback(ndev);
+	ndev->ntb_transport = NULL;
+}
+
+/**
+ * ntb_get_max_spads() - get the total scratch regs usable
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function returns the max 32bit scratchpad registers usable by the
+ * upper layer.
+ *
+ * RETURNS: total number of scratch pad registers available
+ */
+int ntb_get_max_spads(struct ntb_device *ndev)
+{
+	return ndev->limits.max_spads;
+}
+
+/**
+ * ntb_write_local_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. This writes over the data mirrored to the local scratchpad register
+ * by the remote system.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	dev_dbg(&ndev->pdev->dev, "Writing %x to local scratch pad index %d\n",
+		val, idx);
+	writel(val, ndev->reg_ofs.spad_read + idx * 4);
+
+	return 0;
+}
+
+/**
+ * ntb_read_local_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.  This allows the local system to read data
+ * written and mirrored to the scratchpad register by the remote system.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	*val = readl(ndev->reg_ofs.spad_write + idx * 4);
+	dev_dbg(&ndev->pdev->dev,
+		"Reading %x from local scratch pad index %d\n", *val, idx);
+
+	return 0;
+}
+
+/**
+ * ntb_write_remote_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. The register resides on the secondary (external) side.  This allows
+ * the local system to write data to be mirrored to the remote systems
+ * scratchpad register.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	dev_dbg(&ndev->pdev->dev, "Writing %x to remote scratch pad index %d\n",
+		val, idx);
+	writel(val, ndev->reg_ofs.spad_write + idx * 4);
+
+	return 0;
+}
+
+/**
+ * ntb_read_remote_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.  This alloows the local system to read the data
+ * it wrote to be mirrored on the remote system.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	*val = readl(ndev->reg_ofs.spad_read + idx * 4);
+	dev_dbg(&ndev->pdev->dev,
+		"Reading %x from remote scratch pad index %d\n", *val, idx);
+
+	return 0;
+}
+
+/**
+ * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the base virtual address of the memory window
+ * specified.
+ *
+ * RETURNS: pointer to virtual address, or NULL on error.
+ */
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
+{
+	if (mw > NTB_NUM_MW)
+		return NULL;
+
+	return ndev->mw[mw].vbase;
+}
+
+/**
+ * ntb_get_mw_size() - return size of NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the physical size of the memory window specified
+ *
+ * RETURNS: the size of the memory window or zero on error
+ */
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
+{
+	if (mw > NTB_NUM_MW)
+		return 0;
+
+	return ndev->mw[mw].bar_sz;
+}
+
+/**
+ * ntb_set_mw_addr - set the memory window address
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ * @addr: base address for data
+ *
+ * This function sets the base physical address of the memory window.  This
+ * memory address is where data from the remote system will be transfered into
+ * or out of depending on how the transport is configured.
+ */
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)
+{
+	if (mw > NTB_NUM_MW)
+		return;
+
+	dev_dbg(&ndev->pdev->dev, "Writing addr %Lx to BAR %d\n", addr,
+		MW_TO_BAR(mw));
+
+	ndev->mw[mw].phys_addr = addr;
+
+	switch (MW_TO_BAR(mw)) {
+	case NTB_BAR_23:
+		writeq(addr, ndev->reg_ofs.sbar2_xlat);
+		break;
+	case NTB_BAR_45:
+		writeq(addr, ndev->reg_ofs.sbar4_xlat);
+		break;
+	}
+}
+
+/**
+ * ntb_ring_sdb() - Set the doorbell on the secondary/external side
+ * @ndev: pointer to ntb_device instance
+ * @db: doorbell to ring
+ *
+ * This function allows triggering of a doorbell on the secondary/external
+ * side that will initiate an interrupt on the remote host
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+void ntb_ring_sdb(struct ntb_device *ndev, unsigned int db)
+{
+	dev_dbg(&ndev->pdev->dev, "%s: ringing doorbell %d\n", __func__, db);
+
+	if (ndev->hw_type == BWD_HW)
+		writeq((u64) 1 << db, ndev->reg_ofs.sdb);
+	else
+		writew(((1 << ndev->bits_per_vector) - 1) <<
+		       (db * ndev->bits_per_vector), ndev->reg_ofs.sdb);
+}
+
+static void ntb_link_event(struct ntb_device *ndev, int link_state)
+{
+	unsigned int event;
+
+	if (ndev->link_status == link_state)
+		return;
+
+	if (link_state == NTB_LINK_UP) {
+		u16 status;
+
+		dev_info(&ndev->pdev->dev, "Link Up\n");
+		ndev->link_status = NTB_LINK_UP;
+		event = NTB_EVENT_HW_LINK_UP;
+
+		if (ndev->hw_type == BWD_HW)
+			status = readw(ndev->reg_ofs.lnk_stat);
+		else {
+			int rc = pci_read_config_word(ndev->pdev,
+						      SNB_LINK_STATUS_OFFSET,
+						      &status);
+			if (rc)
+				return;
+		}
+		dev_info(&ndev->pdev->dev, "Link Width %d, Link Speed %d\n",
+			 (status & NTB_LINK_WIDTH_MASK) >> 4,
+			 (status & NTB_LINK_SPEED_MASK));
+	} else {
+		dev_info(&ndev->pdev->dev, "Link Down\n");
+		ndev->link_status = NTB_LINK_DOWN;
+		event = NTB_EVENT_HW_LINK_DOWN;
+	}
+
+	/* notify the upper layer if we have an event change */
+	if (ndev->event_cb)
+		ndev->event_cb(ndev->ntb_transport, event);
+}
+
+static int ntb_link_status(struct ntb_device *ndev)
+{
+	int link_state;
+
+	if (ndev->hw_type == BWD_HW) {
+		u32 ntb_cntl;
+
+		ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+		if (ntb_cntl & BWD_CNTL_LINK_DOWN)
+			link_state = NTB_LINK_DOWN;
+		else
+			link_state = NTB_LINK_UP;
+	} else {
+		u16 status;
+		int rc;
+
+		rc = pci_read_config_word(ndev->pdev, SNB_LINK_STATUS_OFFSET,
+					  &status);
+		if (rc)
+			return rc;
+
+		if (status & NTB_LINK_STATUS_ACTIVE)
+			link_state = NTB_LINK_UP;
+		else
+			link_state = NTB_LINK_DOWN;
+	}
+
+	ntb_link_event(ndev, link_state);
+
+	return 0;
+}
+
+/* BWD doesn't have link status interrupt, poll on that platform */
+static void bwd_link_poll(struct work_struct *work)
+{
+	struct ntb_device *ndev = container_of(work, struct ntb_device,
+					       hb_timer.work);
+	unsigned long ts = jiffies;
+
+	/* If we haven't gotten an interrupt in a while, check the BWD link
+	 * status bit
+	 */
+	if (ts > ndev->last_ts + NTB_HB_TIMEOUT) {
+		int rc = ntb_link_status(ndev);
+		if (rc)
+			dev_err(&ndev->pdev->dev,
+				"Error determining link status\n");
+	}
+
+	schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+}
+
+static int ntb_xeon_setup(struct ntb_device *ndev)
+{
+	int rc;
+	u8 val;
+
+	ndev->hw_type = SNB_HW;
+
+	rc = pci_read_config_byte(ndev->pdev, NTB_PPD_OFFSET, &val);
+	if (rc)
+		return rc;
+
+	switch (val & SNB_PPD_CONN_TYPE) {
+	case NTB_CONN_B2B:
+		ndev->conn_type = NTB_CONN_B2B;
+		break;
+	case NTB_CONN_CLASSIC:
+	case NTB_CONN_RP:
+	default:
+		dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+		return -EINVAL;
+	}
+
+	if (val & SNB_PPD_DEV_TYPE)
+		ndev->dev_type = NTB_DEV_DSD;
+	else
+		ndev->dev_type = NTB_DEV_USD;
+
+	ndev->reg_ofs.pdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+	ndev->reg_ofs.pdb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+	ndev->reg_ofs.sbar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+	ndev->reg_ofs.sbar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+	ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
+	ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_LINK_STATUS_OFFSET;
+	ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
+	ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;
+
+	if (ndev->conn_type == NTB_CONN_B2B) {
+		ndev->reg_ofs.sdb = ndev->reg_base + SNB_B2B_DOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + SNB_B2B_SPAD_OFFSET;
+		ndev->limits.max_spads = SNB_MAX_SPADS;
+	} else {
+		ndev->reg_ofs.sdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
+		ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS;
+	}
+
+	ndev->limits.max_db_bits = SNB_MAX_DB_BITS;
+	ndev->limits.msix_cnt = SNB_MSIX_CNT;
+	ndev->bits_per_vector = SNB_DB_BITS_PER_VEC;
+
+	return 0;
+}
+
+static int ntb_bwd_setup(struct ntb_device *ndev)
+{
+	int rc;
+	u32 val;
+
+	ndev->hw_type = BWD_HW;
+
+	rc = pci_read_config_dword(ndev->pdev, NTB_PPD_OFFSET, &val);
+	if (rc)
+		return rc;
+
+	switch ((val & BWD_PPD_CONN_TYPE) >> 8) {
+	case NTB_CONN_B2B:
+		ndev->conn_type = NTB_CONN_B2B;
+		break;
+	case NTB_CONN_RP:
+	default:
+		dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+		return -EINVAL;
+	}
+
+	if (val & BWD_PPD_DEV_TYPE)
+		ndev->dev_type = NTB_DEV_DSD;
+	else
+		ndev->dev_type = NTB_DEV_USD;
+
+	/* Initiate PCI-E link training */
+	rc = pci_write_config_dword(ndev->pdev, NTB_PPD_OFFSET,
+				    val | BWD_PPD_INIT_LINK);
+	if (rc)
+		return rc;
+
+	ndev->reg_ofs.pdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+	ndev->reg_ofs.pdb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
+	ndev->reg_ofs.sbar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
+	ndev->reg_ofs.sbar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
+	ndev->reg_ofs.lnk_cntl = ndev->reg_base + BWD_NTBCNTL_OFFSET;
+	ndev->reg_ofs.lnk_stat = ndev->reg_base + BWD_LINK_STATUS_OFFSET;
+	ndev->reg_ofs.spad_read = ndev->reg_base + BWD_SPAD_OFFSET;
+	ndev->reg_ofs.spci_cmd = ndev->reg_base + BWD_PCICMD_OFFSET;
+
+	if (ndev->conn_type == NTB_CONN_B2B) {
+		ndev->reg_ofs.sdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
+		ndev->limits.max_spads = BWD_MAX_SPADS;
+	} else {
+		ndev->reg_ofs.sdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + BWD_SPAD_OFFSET;
+		ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
+	}
+
+	ndev->limits.max_db_bits = BWD_MAX_DB_BITS;
+	ndev->limits.msix_cnt = BWD_MSIX_CNT;
+	ndev->bits_per_vector = BWD_DB_BITS_PER_VEC;
+
+	/* Since bwd doesn't have a link interrupt, setup a poll timer */
+	INIT_DELAYED_WORK(&ndev->hb_timer, bwd_link_poll);
+	schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+
+	return 0;
+}
+
+static int __devinit ntb_device_setup(struct ntb_device *ndev)
+{
+	int rc;
+
+	switch (ndev->pdev->device) {
+	case PCI_DEVICE_ID_INTEL_NTB_2ND_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_RP_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_RP_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_SNB:
+		rc = ntb_xeon_setup(ndev);
+		break;
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_BWD:
+		rc = ntb_bwd_setup(ndev);
+		break;
+	default:
+		rc = -ENODEV;
+	}
+
+	/* Enable Bus Master and Memory Space on the secondary side */
+	writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);
+
+	return rc;
+}
+
+static void ntb_device_free(struct ntb_device *ndev)
+{
+	if (ndev->hw_type == BWD_HW)
+		cancel_delayed_work_sync(&ndev->hb_timer);
+}
+
+static irqreturn_t bwd_callback_msix_irq(int irq, void *data)
+{
+	struct ntb_db_cb *db_cb = data;
+	struct ntb_device *ndev = db_cb->ndev;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+		db_cb->db_num);
+
+	if (db_cb->callback)
+		db_cb->callback(db_cb->data, db_cb->db_num);
+
+	/* No need to check for the specific HB irq, any interrupt means
+	 * we're connected.
+	 */
+	ndev->last_ts = jiffies;
+
+	writeq((u64) 1 << db_cb->db_num, ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t xeon_callback_msix_irq(int irq, void *data)
+{
+	struct ntb_db_cb *db_cb = data;
+	struct ntb_device *ndev = db_cb->ndev;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+		db_cb->db_num);
+
+	if (db_cb->callback)
+		db_cb->callback(db_cb->data, db_cb->db_num);
+
+	/* On Sandybridge, there are 16 bits in the interrupt register
+	 * but only 4 vectors.  So, 5 bits are assigned to the first 3
+	 * vectors, with the 4th having a single bit for link
+	 * interrupts.
+	 */
+	writew(((1 << ndev->bits_per_vector) - 1) <<
+	       (db_cb->db_num * ndev->bits_per_vector), ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+/* Since we do not have a HW doorbell in BWD, this is only used in JF/JT */
+static irqreturn_t xeon_event_msix_irq(int irq, void *dev)
+{
+	struct ntb_device *ndev = dev;
+	int rc;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for Events\n", irq);
+
+	rc = ntb_link_status(ndev);
+	if (rc)
+		dev_err(&ndev->pdev->dev, "Error determining link status\n");
+
+	/* bit 15 is always the link bit */
+	writew(1 << ndev->limits.max_db_bits, ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ntb_interrupt(int irq, void *dev)
+{
+	struct ntb_device *ndev = dev;
+	unsigned int i = 0;
+
+	if (ndev->hw_type == BWD_HW) {
+		u64 pdb = readq(ndev->reg_ofs.pdb);
+
+		dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %Lx\n", irq, pdb);
+
+		while (pdb) {
+			i = __ffs(pdb);
+			pdb &= pdb - 1;
+			bwd_callback_msix_irq(irq, &ndev->db_cb[i]);
+		}
+	} else {
+		u16 pdb = readw(ndev->reg_ofs.pdb);
+
+		dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %x sdb %x\n", irq,
+			pdb, readw(ndev->reg_ofs.sdb));
+
+		if (pdb & SNB_DB_HW_LINK) {
+			xeon_event_msix_irq(irq, dev);
+			pdb &= ~SNB_DB_HW_LINK;
+		}
+
+		while (pdb) {
+			i = __ffs(pdb);
+			pdb &= pdb - 1;
+			xeon_callback_msix_irq(irq, &ndev->db_cb[i]);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int ntb_setup_msix(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	struct msix_entry *msix;
+	int msix_entries;
+	int rc, i, pos;
+	u16 val;
+
+	pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
+	if (!pos) {
+		rc = -EIO;
+		goto err;
+	}
+
+	rc = pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &val);
+	if (rc)
+		goto err;
+
+	msix_entries = msix_table_size(val);
+	if (msix_entries > ndev->limits.msix_cnt) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	ndev->msix_entries = kmalloc(sizeof(struct msix_entry) * msix_entries,
+				     GFP_KERNEL);
+	if (!ndev->msix_entries) {
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	for (i = 0; i < msix_entries; i++)
+		ndev->msix_entries[i].entry = i;
+
+	rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);
+	if (rc < 0)
+		goto err1;
+	if (rc > 0) {
+		/* On SNB, the link interrupt is always tied to 4th vector.  If
+		 * we can't get all 4, then we can't use MSI-X.
+		 */
+		if (ndev->hw_type != BWD_HW) {
+			rc = -EIO;
+			goto err1;
+		}
+
+		dev_warn(&pdev->dev,
+			 "Only %d MSI-X vectors.  Limiting the number of queues to that number.\n",
+			 rc);
+		msix_entries = rc;
+	}
+
+	for (i = 0; i < msix_entries; i++) {
+		msix = &ndev->msix_entries[i];
+		WARN_ON(!msix->vector);
+
+		/* Use the last MSI-X vector for Link status */
+		if (ndev->hw_type == BWD_HW) {
+			rc = request_irq(msix->vector, bwd_callback_msix_irq, 0,
+					 "ntb-callback-msix", &ndev->db_cb[i]);
+			if (rc)
+				goto err2;
+		} else {
+			if (i == msix_entries - 1) {
+				rc = request_irq(msix->vector,
+						 xeon_event_msix_irq, 0,
+						 "ntb-event-msix", ndev);
+				if (rc)
+					goto err2;
+			} else {
+				rc = request_irq(msix->vector,
+						 xeon_callback_msix_irq, 0,
+						 "ntb-callback-msix",
+						 &ndev->db_cb[i]);
+				if (rc)
+					goto err2;
+			}
+		}
+	}
+
+	ndev->num_msix = msix_entries;
+	if (ndev->hw_type == BWD_HW)
+		ndev->max_cbs = msix_entries;
+	else
+		ndev->max_cbs = msix_entries - 1;
+
+	return 0;
+
+err2:
+	while (--i >= 0) {
+		msix = &ndev->msix_entries[i];
+		if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+			free_irq(msix->vector, ndev);
+		else
+			free_irq(msix->vector, &ndev->db_cb[i]);
+	}
+	pci_disable_msix(pdev);
+err1:
+	kfree(ndev->msix_entries);
+	dev_err(&pdev->dev, "Error allocating MSI-X interrupt\n");
+err:
+	ndev->num_msix = 0;
+	return rc;
+}
+
+static int ntb_setup_msi(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	rc = pci_enable_msi(pdev);
+	if (rc)
+		return rc;
+
+	rc = request_irq(pdev->irq, ntb_interrupt, 0, "ntb-msi", ndev);
+	if (rc) {
+		pci_disable_msi(pdev);
+		dev_err(&pdev->dev, "Error allocating MSI interrupt\n");
+		return rc;
+	}
+
+	return 0;
+}
+
+static int ntb_setup_intx(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	pci_msi_off(pdev);
+
+	/* Verify intx is enabled */
+	pci_intx(pdev, 1);
+
+	rc = request_irq(pdev->irq, ntb_interrupt, IRQF_SHARED, "ntb-intx",
+			 ndev);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+static int __devinit ntb_setup_interrupts(struct ntb_device *ndev)
+{
+	int rc;
+
+	/* On BWD, disable all interrupts.  On SNB, disable all but Link
+	 * Interrupt.  The rest will be unmasked as callbacks are registered.
+	 */
+	if (ndev->hw_type == BWD_HW)
+		writeq(~0, ndev->reg_ofs.pdb_mask);
+	else
+		writew(~(1 << ndev->limits.max_db_bits),
+		       ndev->reg_ofs.pdb_mask);
+
+	rc = ntb_setup_msix(ndev);
+	if (!rc)
+		goto done;
+
+	ndev->bits_per_vector = 1;
+	ndev->max_cbs = ndev->limits.max_db_bits;
+
+	rc = ntb_setup_msi(ndev);
+	if (!rc)
+		goto done;
+
+	rc = ntb_setup_intx(ndev);
+	if (rc) {
+		dev_err(&ndev->pdev->dev, "no usable interrupts\n");
+		return rc;
+	}
+
+done:
+	return 0;
+}
+
+static void __devexit ntb_free_interrupts(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+
+	/* mask interrupts */
+	if (ndev->hw_type == BWD_HW)
+		writeq(~0, ndev->reg_ofs.pdb_mask);
+	else
+		writew(~0, ndev->reg_ofs.pdb_mask);
+
+	if (ndev->num_msix) {
+		struct msix_entry *msix;
+		u32 i;
+
+		for (i = 0; i < ndev->num_msix; i++) {
+			msix = &ndev->msix_entries[i];
+			if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+				free_irq(msix->vector, ndev);
+			else
+				free_irq(msix->vector, &ndev->db_cb[i]);
+		}
+		pci_disable_msix(pdev);
+	} else {
+		free_irq(pdev->irq, ndev);
+
+		if (pci_dev_msi_enabled(pdev))
+			pci_disable_msi(pdev);
+	}
+}
+
+static int __devinit ntb_create_callbacks(struct ntb_device *ndev)
+{
+	int i;
+
+	/* Checken-egg issue.  We won't know how many callbacks are necessary
+	 * until we see how many MSI-X vectors we get, but these pointers need
+	 * to be passed into the MSI-X register fucntion.  So, we allocate the
+	 * max, knowing that they might not all be used, to work around this.
+	 */
+	ndev->db_cb = kcalloc(ndev->limits.max_db_bits,
+			      sizeof(struct ntb_db_cb),
+			      GFP_KERNEL);
+	if (!ndev->db_cb)
+		return -ENOMEM;
+
+	for (i = 0; i < ndev->limits.max_db_bits; i++) {
+		ndev->db_cb[i].db_num = i;
+		ndev->db_cb[i].ndev = ndev;
+	}
+
+	return 0;
+}
+
+static void ntb_free_callbacks(struct ntb_device *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->limits.max_db_bits; i++)
+		ntb_unregister_db_callback(ndev, i);
+
+	kfree(ndev->db_cb);
+}
+
+static int __devinit
+ntb_pci_probe(struct pci_dev *pdev,
+	      __attribute__((unused)) const struct pci_device_id *id)
+{
+	struct ntb_device *ndev;
+	int rc, i;
+
+	ndev = kzalloc(sizeof(struct ntb_device), GFP_KERNEL);
+	if (!ndev)
+		return -ENOMEM;
+
+	ndev->pdev = pdev;
+	ndev->link_status = NTB_LINK_DOWN;
+	pci_set_drvdata(pdev, ndev);
+
+	rc = pci_enable_device(pdev);
+	if (rc)
+		goto err;
+
+	pci_set_master(ndev->pdev);
+
+	rc = pci_request_selected_regions(pdev, NTB_BAR_MASK, KBUILD_MODNAME);
+	if (rc)
+		goto err1;
+
+	ndev->reg_base = pci_ioremap_bar(pdev, NTB_BAR_MMIO);
+	if (!ndev->reg_base) {
+		dev_warn(&pdev->dev, "Cannot remap BAR 0\n");
+		rc = -EIO;
+		goto err2;
+	}
+
+	for (i = 0; i < NTB_NUM_MW; i++) {
+		ndev->mw[i].bar_sz = pci_resource_len(pdev, MW_TO_BAR(i));
+		ndev->mw[i].vbase =
+		    ioremap_wc(pci_resource_start(pdev, MW_TO_BAR(i)),
+			       ndev->mw[i].bar_sz);
+		dev_info(&pdev->dev, "MW %d size %d\n", i,
+			 (u32) pci_resource_len(pdev, MW_TO_BAR(i)));
+		if (!ndev->mw[i].vbase) {
+			dev_warn(&pdev->dev, "Cannot remap BAR %d\n",
+				 MW_TO_BAR(i));
+			rc = -EIO;
+			goto err3;
+		}
+	}
+
+	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (rc) {
+		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (rc)
+			goto err3;
+
+		dev_warn(&pdev->dev, "Cannot DMA highmem\n");
+	}
+
+	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (rc) {
+		rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (rc)
+			goto err3;
+
+		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
+	}
+
+	rc = ntb_device_setup(ndev);
+	if (rc)
+		goto err3;
+
+	rc = ntb_create_callbacks(ndev);
+	if (rc)
+		goto err4;
+
+	rc = ntb_setup_interrupts(ndev);
+	if (rc)
+		goto err5;
+
+	/* The scratchpad registers keep the values between rmmod/insmod,
+	 * blast them now
+	 */
+	for (i = 0; i < ndev->limits.max_spads; i++) {
+		ntb_write_local_spad(ndev, i, 0);
+		ntb_write_remote_spad(ndev, i, 0);
+	}
+
+	rc = ntb_transport_init(pdev);
+	if (rc)
+		goto err6;
+
+	/* Let's bring the NTB link up */
+	writel(NTB_CNTL_BAR23_SNOOP | NTB_CNTL_BAR45_SNOOP,
+	       ndev->reg_ofs.lnk_cntl);
+
+	return 0;
+
+err6:
+	ntb_free_interrupts(ndev);
+err5:
+	ntb_free_callbacks(ndev);
+err4:
+	ntb_device_free(ndev);
+err3:
+	for (i--; i >= 0; i--)
+		iounmap(ndev->mw[i].vbase);
+	iounmap(ndev->reg_base);
+err2:
+	pci_release_selected_regions(pdev, NTB_BAR_MASK);
+err1:
+	pci_disable_device(pdev);
+err:
+	kfree(ndev);
+
+	dev_err(&pdev->dev, "Error loading %s module\n", KBUILD_MODNAME);
+	return rc;
+}
+
+static void __devexit ntb_pci_remove(struct pci_dev *pdev)
+{
+	struct ntb_device *ndev = pci_get_drvdata(pdev);
+	int i;
+	u32 ntb_cntl;
+
+	/* Bring NTB link down */
+	ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+	ntb_cntl |= NTB_LINK_DISABLE;
+	writel(ntb_cntl, ndev->reg_ofs.lnk_cntl);
+
+	ntb_transport_free(ndev->ntb_transport);
+
+	ntb_free_interrupts(ndev);
+	ntb_free_callbacks(ndev);
+	ntb_device_free(ndev);
+
+	for (i = 0; i < NTB_NUM_MW; i++)
+		iounmap(ndev->mw[i].vbase);
+
+	iounmap(ndev->reg_base);
+	pci_release_selected_regions(pdev, NTB_BAR_MASK);
+	pci_disable_device(pdev);
+	kfree(ndev);
+}
+
+static struct pci_driver ntb_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = ntb_pci_tbl,
+	.probe = ntb_pci_probe,
+	.remove = __devexit_p(ntb_pci_remove),
+};
+module_pci_driver(ntb_pci_driver);
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
new file mode 100644
index 0000000..5e00951
--- /dev/null
+++ b/drivers/ntb/ntb_hw.h
@@ -0,0 +1,181 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF		0x3725
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF	0x3726
+#define PCI_DEVICE_ID_INTEL_NTB_RP_JSF		0x3727
+#define PCI_DEVICE_ID_INTEL_NTB_RP_SNB		0x3C08
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB		0x3C0D
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB	0x3C0E
+#define PCI_DEVICE_ID_INTEL_NTB_2ND_SNB		0x3C0F
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD		0x0C4E
+
+#define msix_table_size(control)	((control & PCI_MSIX_FLAGS_QSIZE)+1)
+
+#define NTB_BAR_MMIO		0
+#define NTB_BAR_23		2
+#define NTB_BAR_45		4
+#define NTB_BAR_MASK		((1 << NTB_BAR_MMIO) | (1 << NTB_BAR_23) |\
+				 (1 << NTB_BAR_45))
+
+#define NTB_LINK_DOWN		0
+#define NTB_LINK_UP		1
+
+#define NTB_HB_TIMEOUT		msecs_to_jiffies(1000)
+
+#define NTB_NUM_MW		2
+
+enum ntb_hw_event {
+	NTB_EVENT_SW_EVENT0 = 0,
+	NTB_EVENT_SW_EVENT1,
+	NTB_EVENT_SW_EVENT2,
+	NTB_EVENT_HW_ERROR,
+	NTB_EVENT_HW_LINK_UP,
+	NTB_EVENT_HW_LINK_DOWN,
+};
+
+struct ntb_mw {
+	dma_addr_t phys_addr;
+	void __iomem *vbase;
+	resource_size_t bar_sz;
+};
+
+struct ntb_db_cb {
+	void (*callback) (void *data, int db_num);
+	unsigned int db_num;
+	void *data;
+	struct ntb_device *ndev;
+};
+
+struct ntb_device {
+	struct pci_dev *pdev;
+	struct msix_entry *msix_entries;
+	void __iomem *reg_base;
+	struct ntb_mw mw[NTB_NUM_MW];
+	struct {
+		unsigned int max_spads;
+		unsigned int max_db_bits;
+		unsigned int msix_cnt;
+	} limits;
+	struct {
+		void __iomem *pdb;
+		void __iomem *pdb_mask;
+		void __iomem *sdb;
+		void __iomem *sbar2_xlat;
+		void __iomem *sbar4_xlat;
+		void __iomem *spad_write;
+		void __iomem *spad_read;
+		void __iomem *lnk_cntl;
+		void __iomem *lnk_stat;
+		void __iomem *spci_cmd;
+	} reg_ofs;
+	struct ntb_transport *ntb_transport;
+	void (*event_cb)(void *handle, enum ntb_hw_event event);
+
+	struct ntb_db_cb *db_cb;
+	unsigned char hw_type;
+	unsigned char conn_type;
+	unsigned char dev_type;
+	unsigned char num_msix;
+	unsigned char bits_per_vector;
+	unsigned char max_cbs;
+	unsigned char link_status;
+	struct delayed_work hb_timer;
+	unsigned long last_ts;
+};
+
+/**
+ * ntb_hw_link_status() - return the hardware link status
+ * @ndev: pointer to ntb_device instance
+ *
+ * Returns true if the hardware is connected to the remote system
+ *
+ * RETURNS: true or false based on the hardware link state
+ */
+static inline bool ntb_hw_link_status(struct ntb_device *ndev)
+{
+	return ndev->link_status == NTB_LINK_UP;
+}
+
+/**
+ * ntb_query_pdev() - return the pci_dev pointer
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer return the pci_dev pointerfor the NTB hardware device
+ *
+ * RETURNS: a pointer to the ntb pci_dev
+ */
+static inline struct pci_dev *ntb_query_pdev(struct ntb_device *ndev)
+{
+	return ndev->pdev;
+}
+
+struct ntb_device *ntb_register_transport(struct pci_dev *pdev,
+					  void *transport);
+void ntb_unregister_transport(struct ntb_device *ndev);
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr);
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+			     void *data, void (*db_cb_func) (void *data,
+							     int db_num));
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx);
+int ntb_register_event_callback(struct ntb_device *ndev,
+				void (*event_cb_func) (void *handle,
+						       unsigned int event));
+void ntb_unregister_event_callback(struct ntb_device *ndev);
+int ntb_get_max_spads(struct ntb_device *ndev);
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
+void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
+void *ntb_find_transport(struct pci_dev *pdev);
+
+int ntb_transport_init(struct pci_dev *pdev);
+void ntb_transport_free(void *transport);
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
new file mode 100644
index 0000000..5bfa8c0
--- /dev/null
+++ b/drivers/ntb/ntb_regs.h
@@ -0,0 +1,139 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define NTB_LINK_ENABLE		0x0000
+#define NTB_LINK_DISABLE	0x0002
+#define NTB_LINK_STATUS_ACTIVE	0x2000
+#define NTB_LINK_SPEED_MASK	0x000f
+#define NTB_LINK_WIDTH_MASK	0x03f0
+
+#define SNB_MSIX_CNT		4
+#define SNB_MAX_SPADS		16
+#define SNB_MAX_COMPAT_SPADS	8
+/* Reserve the uppermost bit for link interrupt */
+#define SNB_MAX_DB_BITS		15
+#define SNB_DB_BITS_PER_VEC	5
+
+#define SNB_DB_HW_LINK		0x8000
+
+#define SNB_PCICMD_OFFSET	0x0504
+#define SNB_DEVCTRL_OFFSET	0x0598
+#define SNB_LINK_STATUS_OFFSET	0x01A2
+
+#define SNB_PBAR2LMT_OFFSET	0x0000
+#define SNB_PBAR4LMT_OFFSET	0x0008
+#define SNB_PBAR2XLAT_OFFSET	0x0010
+#define SNB_PBAR4XLAT_OFFSET	0x0018
+#define SNB_SBAR2LMT_OFFSET	0x0020
+#define SNB_SBAR4LMT_OFFSET	0x0028
+#define SNB_SBAR2XLAT_OFFSET	0x0030
+#define SNB_SBAR4XLAT_OFFSET	0x0038
+#define SNB_SBAR0BASE_OFFSET	0x0040
+#define SNB_SBAR2BASE_OFFSET	0x0048
+#define SNB_SBAR4BASE_OFFSET	0x0050
+#define SNB_NTBCNTL_OFFSET	0x0058
+#define SNB_SBDF_OFFSET		0x005C
+#define SNB_PDOORBELL_OFFSET	0x0060
+#define SNB_PDBMSK_OFFSET	0x0062
+#define SNB_SDOORBELL_OFFSET	0x0064
+#define SNB_SDBMSK_OFFSET	0x0066
+#define SNB_USMEMMISS		0x0070
+#define SNB_SPAD_OFFSET		0x0080
+#define SNB_SPADSEMA4_OFFSET	0x00c0
+#define SNB_WCCNTRL_OFFSET	0x00e0
+#define SNB_B2B_SPAD_OFFSET	0x0100
+#define SNB_B2B_DOORBELL_OFFSET	0x0140
+#define SNB_B2B_XLAT_OFFSET	0x0144
+
+#define BWD_MSIX_CNT		34
+#define BWD_MAX_SPADS		16
+#define BWD_MAX_COMPAT_SPADS	16
+#define BWD_MAX_DB_BITS		34
+#define BWD_DB_BITS_PER_VEC	1
+
+#define BWD_PCICMD_OFFSET	0xb004
+#define BWD_MBAR23_OFFSET	0xb018
+#define BWD_MBAR45_OFFSET	0xb020
+#define BWD_DEVCTRL_OFFSET	0xb048
+#define BWD_LINK_STATUS_OFFSET	0xb052
+
+#define BWD_SBAR2XLAT_OFFSET	0x0008
+#define BWD_SBAR4XLAT_OFFSET	0x0010
+#define BWD_PDOORBELL_OFFSET	0x0020
+#define BWD_PDBMSK_OFFSET	0x0028
+#define BWD_NTBCNTL_OFFSET	0x0060
+#define BWD_EBDF_OFFSET		0x0064
+#define BWD_SPAD_OFFSET		0x0080
+#define BWD_SPADSEMA_OFFSET	0x00c0
+#define BWD_STKYSPAD_OFFSET	0x00c4
+#define BWD_PBAR2XLAT_OFFSET	0x8008
+#define BWD_PBAR4XLAT_OFFSET	0x8010
+#define BWD_B2B_DOORBELL_OFFSET	0x8020
+#define BWD_B2B_SPAD_OFFSET	0x8080
+#define BWD_B2B_SPADSEMA_OFFSET	0x80c0
+#define BWD_B2B_STKYSPAD_OFFSET	0x80c4
+
+#define NTB_CNTL_BAR23_SNOOP	(1 << 2)
+#define NTB_CNTL_BAR45_SNOOP	(1 << 6)
+#define BWD_CNTL_LINK_DOWN	(1 << 16)
+
+#define NTB_PPD_OFFSET		0x00D4
+#define SNB_PPD_CONN_TYPE	0x0003
+#define SNB_PPD_DEV_TYPE	0x0010
+#define BWD_PPD_INIT_LINK	0x0008
+#define BWD_PPD_CONN_TYPE	0x0300
+#define BWD_PPD_DEV_TYPE	0x1000
+
+#define BWD_PBAR2XLAT_USD_ADDR	0x0000004000000000
+#define BWD_PBAR4XLAT_USD_ADDR	0x0000008000000000
+#define BWD_MBAR23_USD_ADDR	0x000000410000000C
+#define BWD_MBAR45_USD_ADDR	0x000000810000000C
+#define BWD_PBAR2XLAT_DSD_ADDR	0x0000004100000000
+#define BWD_PBAR4XLAT_DSD_ADDR	0x0000008100000000
+#define BWD_MBAR23_DSD_ADDR	0x000000400000000C
+#define BWD_MBAR45_DSD_ADDR	0x000000800000000C
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
new file mode 100644
index 0000000..c907e07
--- /dev/null
+++ b/drivers/ntb/ntb_transport.c
@@ -0,0 +1,1427 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/errno.h>
+#include <linux/export.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/ntb.h>
+#include "ntb_hw.h"
+
+#define NTB_TRANSPORT_VERSION	1
+
+static int transport_mtu = 0x401E;
+module_param(transport_mtu, uint, 0644);
+MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
+
+static unsigned char max_num_clients = 2;
+module_param(max_num_clients, byte, 0644);
+MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
+
+struct ntb_queue_entry {
+	/* ntb_queue list reference */
+	struct list_head entry;
+	/* pointers to data to be transfered */
+	void *cb_data;
+	void *buf;
+	unsigned int len;
+	unsigned int flags;
+};
+
+struct ntb_transport_qp {
+	struct ntb_transport *transport;
+	struct ntb_device *ndev;
+	void *cb_data;
+
+	bool client_ready;
+	bool qp_link;
+	u8 qp_num;	/* Only 64 QP's are allowed.  0-63 */
+
+	void (*tx_handler) (struct ntb_transport_qp *qp, void *qp_data,
+			    void *data, int len);
+	struct list_head tx_free_q;
+	spinlock_t ntb_tx_free_q_lock;
+	void *tx_mw_begin;
+	void *tx_mw_end;
+	void *tx_offset;
+
+	void (*rx_handler) (struct ntb_transport_qp *qp, void *qp_data,
+			    void *data, int len);
+	struct tasklet_struct rx_work;
+	struct list_head rx_pend_q;
+	struct list_head rx_free_q;
+	spinlock_t ntb_rx_pend_q_lock;
+	spinlock_t ntb_rx_free_q_lock;
+	void *rx_buff_begin;
+	void *rx_buff_end;
+	void *rx_offset;
+
+	void (*event_handler) (void *data, int status);
+	struct delayed_work link_work;
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_stats;
+
+	/* Stats */
+	u64 rx_bytes;
+	u64 rx_pkts;
+	u64 rx_ring_empty;
+	u64 rx_err_no_buf;
+	u64 rx_err_oflow;
+	u64 rx_err_ver;
+	u64 tx_bytes;
+	u64 tx_pkts;
+	u64 tx_ring_full;
+};
+
+struct ntb_transport_mw {
+	size_t size;
+	void *virt_addr;
+	dma_addr_t dma_addr;
+};
+
+struct ntb_transport_client_dev {
+	struct list_head entry;
+	struct device dev;
+};
+
+struct ntb_transport {
+	struct list_head entry;
+	struct list_head client_devs;
+
+	struct ntb_device *ndev;
+	struct ntb_transport_mw mw[NTB_NUM_MW];
+	struct ntb_transport_qp *qps;
+	unsigned int max_qps;
+	unsigned long qp_bitmap;
+	bool transport_link;
+	struct delayed_work link_work;
+	struct dentry *debugfs_dir;
+};
+
+enum {
+	DESC_DONE_FLAG = 1 << 0,
+	LINK_DOWN_FLAG = 1 << 1,
+};
+
+struct ntb_payload_header {
+	u64 ver;
+	unsigned int len;
+	unsigned int flags;
+};
+
+enum {
+	VERSION = 0,
+	MW0_SZ,
+	MW1_SZ,
+	NUM_QPS,
+	QP_LINKS,
+	MAX_SPAD,
+};
+
+#define QP_TO_MW(qp)		((qp) % NTB_NUM_MW)
+#define NTB_QP_DEF_NUM_ENTRIES	100
+#define NTB_LINK_DOWN_TIMEOUT	10
+
+static int ntb_match_bus(struct device *dev, struct device_driver *drv)
+{
+	return !strncmp(dev_name(dev), drv->name, strlen(drv->name));
+}
+
+static int ntb_client_probe(struct device *dev)
+{
+	const struct ntb_client *drv = container_of(dev->driver,
+						    struct ntb_client, driver);
+	struct pci_dev *pdev = container_of(dev->parent, struct pci_dev, dev);
+	int rc = -EINVAL;
+
+	get_device(dev);
+	if (drv && drv->probe)
+		rc = drv->probe(pdev);
+	if (rc)
+		put_device(dev);
+
+	return rc;
+}
+
+static int ntb_client_remove(struct device *dev)
+{
+	const struct ntb_client *drv = container_of(dev->driver,
+						    struct ntb_client, driver);
+	struct pci_dev *pdev = container_of(dev->parent, struct pci_dev, dev);
+
+	if (drv && drv->remove)
+		drv->remove(pdev);
+
+	put_device(dev);
+
+	return 0;
+}
+
+struct bus_type ntb_bus_type = {
+	.name = "ntb_bus",
+	.match = ntb_match_bus,
+	.probe = ntb_client_probe,
+	.remove = ntb_client_remove,
+};
+
+static LIST_HEAD(ntb_transport_list);
+
+static int __devinit ntb_bus_init(struct ntb_transport *nt)
+{
+	if (list_empty(&ntb_transport_list)) {
+		int rc = bus_register(&ntb_bus_type);
+		if (rc)
+			return rc;
+	}
+
+	list_add(&nt->entry, &ntb_transport_list);
+
+	return 0;
+}
+
+static void __devexit ntb_bus_remove(struct ntb_transport *nt)
+{
+	struct ntb_transport_client_dev *client_dev, *cd;
+
+	list_for_each_entry_safe(client_dev, cd, &nt->client_devs, entry) {
+		dev_err(client_dev->dev.parent, "%s still attached to bus, removing\n",
+			dev_name(&client_dev->dev));
+		list_del(&client_dev->entry);
+		device_unregister(&client_dev->dev);
+	}
+
+	list_del(&nt->entry);
+
+	if (list_empty(&ntb_transport_list))
+		bus_unregister(&ntb_bus_type);
+}
+
+static void ntb_client_release(struct device *dev)
+{
+	struct ntb_transport_client_dev *client_dev;
+	client_dev = container_of(dev, struct ntb_transport_client_dev, dev);
+
+	kfree(client_dev);
+}
+
+/**
+ * ntb_unregister_client_dev - Unregister NTB client device
+ * @device_name: Name of NTB client device
+ *
+ * Unregister an NTB client device with the NTB transport layer
+ */
+void ntb_unregister_client_dev(char *device_name)
+{
+	struct ntb_transport_client_dev *client, *cd;
+	struct ntb_transport *nt;
+
+	list_for_each_entry(nt, &ntb_transport_list, entry)
+		list_for_each_entry_safe(client, cd, &nt->client_devs, entry)
+			if (!strncmp(dev_name(&client->dev), device_name,
+				     strlen(device_name))) {
+				list_del(&client->entry);
+				device_unregister(&client->dev);
+			}
+}
+EXPORT_SYMBOL_GPL(ntb_unregister_client_dev);
+
+/**
+ * ntb_register_client_dev - Register NTB client device
+ * @device_name: Name of NTB client device
+ *
+ * Register an NTB client device with the NTB transport layer
+ */
+int ntb_register_client_dev(char *device_name)
+{
+	struct ntb_transport_client_dev *client_dev;
+	struct ntb_transport *nt;
+	int rc;
+
+	list_for_each_entry(nt, &ntb_transport_list, entry) {
+		struct device *dev;
+
+		client_dev = kzalloc(sizeof(struct ntb_transport_client_dev),
+				     GFP_KERNEL);
+		if (!client_dev) {
+			rc = -ENOMEM;
+			goto err;
+		}
+
+		dev = &client_dev->dev;
+
+		/* setup and register client devices */
+		dev_set_name(dev, "%s", device_name);
+		dev->bus = &ntb_bus_type;
+		dev->release = ntb_client_release;
+		dev->parent = &ntb_query_pdev(nt->ndev)->dev;
+
+		rc = device_register(dev);
+		if (rc) {
+			kfree(client_dev);
+			goto err;
+		}
+
+		list_add_tail(&client_dev->entry, &nt->client_devs);
+	}
+
+	return 0;
+
+err:
+	ntb_unregister_client_dev(device_name);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ntb_register_client_dev);
+
+/**
+ * ntb_register_client - Register NTB client driver
+ * @drv: NTB client driver to be registered
+ *
+ * Register an NTB client driver with the NTB transport layer
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_client(struct ntb_client *drv)
+{
+	drv->driver.bus = &ntb_bus_type;
+
+	return driver_register(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(ntb_register_client);
+
+/**
+ * ntb_unregister_client - Unregister NTB client driver
+ * @drv: NTB client driver to be unregistered
+ *
+ * Unregister an NTB client driver with the NTB transport layer
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+void ntb_unregister_client(struct ntb_client *drv)
+{
+	driver_unregister(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(ntb_unregister_client);
+
+static int debugfs_open(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
+			    loff_t *offp)
+{
+	struct ntb_transport_qp *qp;
+	char buf[1024];
+	ssize_t ret, out_offset, out_count;
+
+	out_count = 1024;
+
+	qp = filp->private_data;
+	out_offset = 0;
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "NTB QP stats\n");
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_bytes - \t%llu\n", qp->rx_bytes);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_pkts - \t%llu\n", qp->rx_pkts);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_ring_empty - %llu\n", qp->rx_ring_empty);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_err_oflow - \t%llu\n", qp->rx_err_oflow);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_err_ver - \t%llu\n", qp->rx_err_ver);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_buff_begin - %p\n", qp->rx_buff_begin);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_offset - \t%p\n", qp->rx_offset);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_buff_end - \t%p\n", qp->rx_buff_end);
+
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_bytes - \t%llu\n", qp->tx_bytes);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_pkts - \t%llu\n", qp->tx_pkts);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_ring_full - \t%llu\n", qp->tx_ring_full);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_mw_begin - \t%p\n", qp->tx_mw_begin);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_offset - \t%p\n", qp->tx_offset);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_mw_end - \t%p\n", qp->tx_mw_end);
+
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "QP Link %s\n", (qp->qp_link == NTB_LINK_UP) ?
+			       "Up" : "Down");
+
+	ret = simple_read_from_buffer(ubuf, count, offp, buf, out_offset);
+	return ret;
+}
+
+static const struct file_operations ntb_qp_debugfs_stats = {
+	.owner = THIS_MODULE,
+	.open = debugfs_open,
+	.read = debugfs_read,
+};
+
+static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
+			 struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	list_add_tail(entry, list);
+	spin_unlock_irqrestore(lock, flags);
+}
+
+static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
+						struct list_head *list)
+{
+	struct ntb_queue_entry *entry;
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	if (list_empty(list)) {
+		entry = NULL;
+		goto out;
+	}
+	entry = list_first_entry(list, struct ntb_queue_entry, entry);
+	list_del(&entry->entry);
+out:
+	spin_unlock_irqrestore(lock, flags);
+
+	return entry;
+}
+
+static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
+				      unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp = &nt->qps[qp_num];
+	unsigned int size, num_qps_mw;
+	u8 mw_num = QP_TO_MW(qp_num);
+
+	WARN_ON(nt->mw[mw_num].virt_addr == 0);
+
+	if (nt->max_qps % NTB_NUM_MW && !mw_num)
+		num_qps_mw = nt->max_qps / NTB_NUM_MW +
+			     (nt->max_qps % NTB_NUM_MW - mw_num);
+	else
+		num_qps_mw = nt->max_qps / NTB_NUM_MW;
+
+	size = nt->mw[mw_num].size / num_qps_mw;
+
+	qp->rx_buff_begin = nt->mw[mw_num].virt_addr +
+			    (qp_num / NTB_NUM_MW * size);
+	qp->rx_buff_end = qp->rx_buff_begin + size;
+	qp->rx_offset = qp->rx_buff_begin;
+
+	qp->tx_mw_begin = ntb_get_mw_vbase(nt->ndev, mw_num) +
+			  (qp_num / NTB_NUM_MW * size);
+	qp->tx_mw_end = qp->tx_mw_begin + size;
+	qp->tx_offset = qp->tx_mw_begin;
+
+	qp->rx_pkts = 0;
+	qp->tx_pkts = 0;
+}
+
+static int ntb_set_mw(struct ntb_transport *nt, int num_mw, unsigned int size)
+{
+	struct ntb_transport_mw *mw = &nt->mw[num_mw];
+	struct pci_dev *pdev = ntb_query_pdev(nt->ndev);
+	void *offset;
+
+	/* Alloc memory for receiving data.  Must be 4k aligned */
+	mw->size = ALIGN(size, 4096);
+
+	mw->virt_addr = dma_alloc_coherent(&pdev->dev, mw->size, &mw->dma_addr,
+					   GFP_KERNEL);
+	if (!mw->virt_addr) {
+		dev_err(&pdev->dev, "Unable to allocate MW buffer of size %d\n",
+		       (int) mw->size);
+		return -ENOMEM;
+	}
+
+	/* setup the hdr offsets with 0's */
+	for (offset = mw->virt_addr + transport_mtu -
+		      sizeof(struct ntb_payload_header);
+	     offset < mw->virt_addr + size; offset += transport_mtu)
+		memset(offset, 0, sizeof(struct ntb_payload_header));
+
+	/* Notify HW the memory location of the receive buffer */
+	ntb_set_mw_addr(nt->ndev, num_mw, mw->dma_addr);
+
+	return 0;
+}
+
+static void ntb_qp_link_down(struct ntb_transport_qp *qp)
+{
+	struct ntb_transport *nt = qp->transport;
+	struct pci_dev *pdev = ntb_query_pdev(nt->ndev);
+
+	if (qp->qp_link == NTB_LINK_DOWN) {
+		cancel_delayed_work_sync(&qp->link_work);
+		return;
+	}
+
+	if (qp->event_handler)
+		qp->event_handler(qp->cb_data, NTB_LINK_DOWN);
+
+	dev_info(&pdev->dev, "qp %d: Link Down\n", qp->qp_num);
+	qp->qp_link = NTB_LINK_DOWN;
+
+	if (nt->transport_link == NTB_LINK_UP)
+		schedule_delayed_work(&qp->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_transport_conn_down(struct ntb_transport *nt)
+{
+	int i;
+
+	if (nt->transport_link == NTB_LINK_DOWN)
+		cancel_delayed_work_sync(&nt->link_work);
+	else
+		nt->transport_link = NTB_LINK_DOWN;
+
+	/* Pass along the info to any clients */
+	for (i = 0; i < nt->max_qps; i++)
+		if (!test_bit(i, &nt->qp_bitmap))
+			ntb_qp_link_down(&nt->qps[i]);
+
+	/* The scratchpad registers keep the values if the remote side
+	 * goes down, blast them now to give them a sane value the next
+	 * time they are accessed
+	 */
+	for (i = 0; i < MAX_SPAD; i++)
+		ntb_write_local_spad(nt->ndev, i, 0);
+}
+
+static void ntb_transport_event_callback(void *data, enum ntb_hw_event event)
+{
+	struct ntb_transport *nt = data;
+
+	switch (event) {
+	case NTB_EVENT_HW_LINK_UP:
+		schedule_delayed_work(&nt->link_work, 0);
+		break;
+	case NTB_EVENT_HW_LINK_DOWN:
+		ntb_transport_conn_down(nt);
+		break;
+	default:
+		BUG();
+	}
+}
+
+static void ntb_transport_link_work(struct work_struct *work)
+{
+	struct ntb_transport *nt = container_of(work, struct ntb_transport,
+						link_work.work);
+	struct ntb_device *ndev = nt->ndev;
+	struct pci_dev *pdev = ntb_query_pdev(ndev);
+	u32 val;
+	int rc, i;
+
+	/* send the local info */
+	rc = ntb_write_remote_spad(ndev, VERSION, NTB_TRANSPORT_VERSION);
+	if (rc) {
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			0, VERSION);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, MW0_SZ, ntb_get_mw_size(ndev, 0));
+	if (rc) {
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			(u32) ntb_get_mw_size(ndev, 0), MW0_SZ);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, MW1_SZ, ntb_get_mw_size(ndev, 1));
+	if (rc) {
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			(u32) ntb_get_mw_size(ndev, 1), MW1_SZ);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, NUM_QPS, nt->max_qps);
+	if (rc) {
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			nt->max_qps, NUM_QPS);
+		goto out;
+	}
+
+	rc = ntb_read_local_spad(nt->ndev, QP_LINKS, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading spad %d\n", QP_LINKS);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, QP_LINKS, val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			val, QP_LINKS);
+		goto out;
+	}
+
+	/* Query the remote side for its info */
+	rc = ntb_read_remote_spad(ndev, VERSION, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading remote spad %d\n", VERSION);
+		goto out;
+	}
+
+	if (val != NTB_TRANSPORT_VERSION)
+		goto out;
+	dev_dbg(&pdev->dev, "Remote version = %d\n", val);
+
+	rc = ntb_read_remote_spad(ndev, NUM_QPS, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading remote spad %d\n", NUM_QPS);
+		goto out;
+	}
+
+	if (val != nt->max_qps)
+		goto out;
+	dev_dbg(&pdev->dev, "Remote max number of qps = %d\n", val);
+
+	rc = ntb_read_remote_spad(ndev, MW0_SZ, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading remote spad %d\n", MW0_SZ);
+		goto out;
+	}
+
+	if (!val)
+		goto out;
+	dev_dbg(&pdev->dev, "Remote MW0 size = %d\n", val);
+
+	rc = ntb_set_mw(nt, 0, val);
+	if (rc)
+		goto out;
+
+	rc = ntb_read_remote_spad(ndev, MW1_SZ, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading remote spad %d\n", MW1_SZ);
+		goto out;
+	}
+
+	if (!val)
+		goto out;
+	dev_dbg(&pdev->dev, "Remote MW1 size = %d\n", val);
+
+	rc = ntb_set_mw(nt, 1, val);
+	if (rc)
+		goto out;
+
+	nt->transport_link = NTB_LINK_UP;
+
+	for (i = 0; i < nt->max_qps; i++) {
+		struct ntb_transport_qp *qp = &nt->qps[i];
+
+		ntb_transport_setup_qp_mw(nt, i);
+
+		if (qp->client_ready == NTB_LINK_UP)
+			schedule_delayed_work(&qp->link_work, 0);
+	}
+
+	return;
+
+out:
+	if (ntb_hw_link_status(ndev))
+		schedule_delayed_work(&nt->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_qp_link_work(struct work_struct *work)
+{
+	struct ntb_transport_qp *qp = container_of(work,
+						   struct ntb_transport_qp,
+						   link_work.work);
+	struct pci_dev *pdev = ntb_query_pdev(qp->ndev);
+	struct ntb_transport *nt = qp->transport;
+	int rc, val;
+
+	WARN_ON(nt->transport_link != NTB_LINK_UP);
+
+	rc = ntb_read_local_spad(nt->ndev, QP_LINKS, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading spad %d\n", QP_LINKS);
+		return;
+	}
+
+	rc = ntb_write_remote_spad(nt->ndev, QP_LINKS, val | 1 << qp->qp_num);
+	if (rc)
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			val | 1 << qp->qp_num, QP_LINKS);
+
+	/* query remote spad for qp ready bits */
+	rc = ntb_read_remote_spad(nt->ndev, QP_LINKS, &val);
+	if (rc)
+		dev_err(&pdev->dev, "Error reading remote spad %d\n", QP_LINKS);
+
+	dev_dbg(&pdev->dev, "Remote QP link status = %x\n", val);
+
+	/* See if the remote side is up */
+	if (1 << qp->qp_num & val) {
+		qp->qp_link = NTB_LINK_UP;
+
+		dev_info(&pdev->dev, "qp %d: Link Up\n", qp->qp_num);
+		if (qp->event_handler)
+			qp->event_handler(qp->cb_data, NTB_LINK_UP);
+	} else if (nt->transport_link == NTB_LINK_UP)
+		schedule_delayed_work(&qp->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_transport_init_queue(struct ntb_transport *nt,
+				     unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp;
+
+	qp = &nt->qps[qp_num];
+	qp->qp_num = qp_num;
+	qp->transport = nt;
+	qp->ndev = nt->ndev;
+	qp->qp_link = NTB_LINK_DOWN;
+	qp->client_ready = NTB_LINK_DOWN;
+	qp->event_handler = NULL;
+
+	if (nt->debugfs_dir) {
+		char debugfs_name[4];
+
+		snprintf(debugfs_name, 4, "qp%d", qp_num);
+		qp->debugfs_dir = debugfs_create_dir(debugfs_name,
+						     nt->debugfs_dir);
+
+		qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
+							qp->debugfs_dir, qp,
+							&ntb_qp_debugfs_stats);
+	}
+
+	INIT_DELAYED_WORK(&qp->link_work, ntb_qp_link_work);
+
+	spin_lock_init(&qp->ntb_rx_pend_q_lock);
+	spin_lock_init(&qp->ntb_rx_free_q_lock);
+	spin_lock_init(&qp->ntb_tx_free_q_lock);
+
+	INIT_LIST_HEAD(&qp->rx_pend_q);
+	INIT_LIST_HEAD(&qp->rx_free_q);
+	INIT_LIST_HEAD(&qp->tx_free_q);
+}
+
+int ntb_transport_init(struct pci_dev *pdev)
+{
+	struct ntb_transport *nt;
+	int rc, i;
+
+	nt = kzalloc(sizeof(struct ntb_transport), GFP_KERNEL);
+	if (!nt)
+		return -ENOMEM;
+
+	if (debugfs_initialized())
+		nt->debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	else
+		nt->debugfs_dir = NULL;
+
+	nt->ndev = ntb_register_transport(pdev, nt);
+	if (!nt->ndev) {
+		rc = -EIO;
+		goto err;
+	}
+
+	nt->max_qps = min(nt->ndev->max_cbs, max_num_clients);
+
+	nt->qps = kcalloc(nt->max_qps, sizeof(struct ntb_transport_qp),
+			  GFP_KERNEL);
+	if (!nt->qps) {
+		rc = -ENOMEM;
+		goto err1;
+	}
+
+	nt->qp_bitmap = ((u64) 1 << nt->max_qps) - 1;
+
+	for (i = 0; i < nt->max_qps; i++)
+		ntb_transport_init_queue(nt, i);
+
+	INIT_DELAYED_WORK(&nt->link_work, ntb_transport_link_work);
+
+	rc = ntb_register_event_callback(nt->ndev,
+					 ntb_transport_event_callback);
+	if (rc)
+		goto err2;
+
+	INIT_LIST_HEAD(&nt->client_devs);
+	rc = ntb_bus_init(nt);
+	if (rc)
+		goto err3;
+
+	if (ntb_hw_link_status(nt->ndev))
+		schedule_delayed_work(&nt->link_work, 0);
+
+	return 0;
+
+err3:
+	ntb_unregister_event_callback(nt->ndev);
+err2:
+	kfree(nt->qps);
+err1:
+	ntb_unregister_transport(nt->ndev);
+err:
+	debugfs_remove_recursive(nt->debugfs_dir);
+	kfree(nt);
+	return rc;
+}
+
+void ntb_transport_free(void *transport)
+{
+	struct ntb_transport *nt = transport;
+	struct pci_dev *pdev;
+	int i;
+
+	nt->transport_link = NTB_LINK_DOWN;
+
+	/* verify that all the qp's are freed */
+	for (i = 0; i < nt->max_qps; i++)
+		if (!test_bit(i, &nt->qp_bitmap))
+			ntb_transport_free_queue(&nt->qps[i]);
+
+	ntb_bus_remove(nt);
+
+	cancel_delayed_work_sync(&nt->link_work);
+
+	debugfs_remove_recursive(nt->debugfs_dir);
+
+	ntb_unregister_event_callback(nt->ndev);
+
+	pdev = ntb_query_pdev(nt->ndev);
+
+	for (i = 0; i < NTB_NUM_MW; i++)
+		if (nt->mw[i].virt_addr)
+			dma_free_coherent(&pdev->dev, nt->mw[i].size,
+					  nt->mw[i].virt_addr,
+					  nt->mw[i].dma_addr);
+
+	kfree(nt->qps);
+	ntb_unregister_transport(nt->ndev);
+	kfree(nt);
+}
+
+static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
+			     struct ntb_queue_entry *entry, void *offset)
+{
+
+	struct ntb_payload_header *hdr;
+
+	BUG_ON(offset < qp->rx_buff_begin ||
+	       offset + transport_mtu >= qp->rx_buff_end);
+
+	hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+	entry->len = hdr->len;
+
+	memcpy(entry->buf, offset, entry->len);
+
+	/* Ensure that the data is fully copied out before clearing the flag */
+	wmb();
+	hdr->flags = 0;
+
+	if (qp->rx_handler && qp->client_ready == NTB_LINK_UP)
+		qp->rx_handler(qp, qp->cb_data, entry->cb_data, entry->len);
+
+	ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
+}
+
+static int ntb_process_rxc(struct ntb_transport_qp *qp)
+{
+	struct ntb_payload_header *hdr;
+	struct ntb_queue_entry *entry;
+	void *offset;
+
+	entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+	if (!entry) {
+		hdr = offset + transport_mtu -
+		      sizeof(struct ntb_payload_header);
+		dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
+			"no buffer - HDR ver %llu, len %d, flags %x\n",
+			hdr->ver, hdr->len, hdr->flags);
+		qp->rx_err_no_buf++;
+		return -ENOMEM;
+	}
+
+	offset = qp->rx_offset;
+	hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+
+	if (!(hdr->flags & DESC_DONE_FLAG)) {
+		ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+				  &qp->rx_pend_q);
+		qp->rx_ring_empty++;
+		return -EAGAIN;
+	}
+
+	if (hdr->ver != qp->rx_pkts) {
+		dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
+			"qp %d: version mismatch, expected %llu - got %llu\n",
+			qp->qp_num, qp->rx_pkts, hdr->ver);
+		ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+				  &qp->rx_pend_q);
+		qp->rx_err_ver++;
+		return -EIO;
+	}
+
+	if (hdr->flags & LINK_DOWN_FLAG) {
+		ntb_qp_link_down(qp);
+
+		ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+				  &qp->rx_pend_q);
+
+		/* Ensure that the data is fully copied out before clearing the
+		 * done flag
+		 */
+		wmb();
+		hdr->flags = 0;
+		goto out;
+	}
+
+	dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
+		"rx offset %p, ver %llu - %d payload received, buf size %d\n",
+		qp->rx_offset, hdr->ver, hdr->len, entry->len);
+
+	if (hdr->len <= entry->len)
+		ntb_rx_copy_task(qp, entry, offset);
+	else {
+		ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+				  &qp->rx_pend_q);
+
+		/* Ensure that the data is fully copied out before clearing the
+		 * done flag
+		 */
+		wmb();
+		hdr->flags = 0;
+		qp->rx_err_oflow++;
+		dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
+			"RX overflow! Wanted %d got %d\n",
+			hdr->len, entry->len);
+	}
+
+	qp->rx_bytes += hdr->len;
+	qp->rx_pkts++;
+
+out:
+	qp->rx_offset += transport_mtu;
+	if (qp->rx_offset + transport_mtu >= qp->rx_buff_end)
+		qp->rx_offset = qp->rx_buff_begin;
+
+	return 0;
+}
+
+static void ntb_transport_rx(unsigned long data)
+{
+	struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;
+	int rc;
+
+	do {
+		rc = ntb_process_rxc(qp);
+	} while (!rc);
+}
+
+static void ntb_transport_rxc_db(void *data, int db_num)
+{
+	struct ntb_transport_qp *qp = data;
+
+	dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%s: doorbell %d received\n",
+		__func__, db_num);
+
+	tasklet_schedule(&qp->rx_work);
+}
+
+static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
+			     struct ntb_queue_entry *entry,
+			     void *offset)
+{
+	struct ntb_payload_header *hdr;
+
+	BUG_ON(offset < qp->tx_mw_begin ||
+	       offset + transport_mtu >= qp->tx_mw_end);
+
+	memcpy_toio(offset, entry->buf, entry->len);
+
+	hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+	hdr->len = entry->len;
+	hdr->ver = qp->tx_pkts;
+
+	/* Ensure that the data is fully copied out before setting the flag */
+	mmiowb();
+	hdr->flags = entry->flags | DESC_DONE_FLAG;
+
+	ntb_ring_sdb(qp->ndev, qp->qp_num);
+
+	/* The entry length can only be zero if the packet is intended to be a
+	 * "link down" or similar.  Since no payload is being sent in these
+	 * cases, there is nothing to add to the completion queue.
+	 */
+	if (entry->len > 0) {
+		qp->tx_bytes += entry->len;
+
+		if (qp->tx_handler)
+			qp->tx_handler(qp, qp->cb_data, entry->cb_data,
+				       entry->len);
+	}
+
+	ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry, &qp->tx_free_q);
+}
+
+static int ntb_process_tx(struct ntb_transport_qp *qp,
+			  struct ntb_queue_entry *entry)
+{
+	struct ntb_payload_header *hdr;
+	void *offset;
+
+	offset = qp->tx_offset;
+	hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+
+	dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx %p, entry len %d flags %x buff %p\n",
+		 qp->tx_pkts, offset, qp->tx_offset, entry->len, entry->flags,
+		 entry->buf);
+	if (hdr->flags) {
+		qp->tx_ring_full++;
+		return -EAGAIN;
+	}
+
+	if (entry->len > transport_mtu - sizeof(struct ntb_payload_header)) {
+		if (qp->tx_handler)
+			qp->tx_handler(qp->cb_data, qp, NULL, -EIO);
+
+		ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+			     &qp->tx_free_q);
+		return 0;
+	}
+
+	ntb_tx_copy_task(qp, entry, offset);
+
+	qp->tx_offset += transport_mtu;
+	if (qp->tx_offset + transport_mtu >= qp->tx_mw_end)
+		qp->tx_offset = qp->tx_mw_begin;
+
+	qp->tx_pkts++;
+
+	return 0;
+}
+
+static void ntb_send_link_down(struct ntb_transport_qp *qp)
+{
+	struct pci_dev *pdev = ntb_query_pdev(qp->ndev);
+	struct ntb_queue_entry *entry;
+	int i, rc;
+
+	if (qp->qp_link == NTB_LINK_DOWN)
+		return;
+
+	qp->qp_link = NTB_LINK_DOWN;
+	dev_info(&pdev->dev, "qp %d: Link Down\n", qp->qp_num);
+
+	for (i = 0; i < NTB_LINK_DOWN_TIMEOUT; i++) {
+		entry = ntb_list_rm(&qp->ntb_tx_free_q_lock,
+					 &qp->tx_free_q);
+		if (entry)
+			break;
+		msleep(100);
+	}
+
+	if (!entry)
+		return;
+
+	entry->cb_data = NULL;
+	entry->buf = NULL;
+	entry->len = 0;
+	entry->flags = LINK_DOWN_FLAG;
+
+	rc = ntb_process_tx(qp, entry);
+	if (rc)
+		dev_err(&pdev->dev, "ntb: QP%d unable to send linkdown msg\n",
+			qp->qp_num);
+}
+
+/**
+ * ntb_transport_create_queue - Create a new NTB transport layer queue
+ * @rx_handler: receive callback function
+ * @tx_handler: transmit callback function
+ * @event_handler: event callback function
+ *
+ * Create a new NTB transport layer queue and provide the queue with a callback
+ * routine for both transmit and receive.  The receive callback routine will be
+ * used to pass up data when the transport has received it on the queue.   The
+ * transmit callback routine will be called when the transport has completed the
+ * transmission of the data on the queue and the data is ready to be freed.
+ *
+ * RETURNS: pointer to newly created ntb_queue, NULL on error.
+ */
+struct ntb_transport_qp *
+ntb_transport_create_queue(void *data, struct pci_dev *pdev,
+			   const struct ntb_queue_handlers *handlers)
+{
+	struct ntb_queue_entry *entry;
+	struct ntb_transport_qp *qp;
+	struct ntb_transport *nt;
+	unsigned int free_queue;
+	int rc, i;
+
+	nt = ntb_find_transport(pdev);
+	if (!nt)
+		goto err;
+
+	free_queue = ffs(nt->qp_bitmap);
+	if (!free_queue)
+		goto err;
+
+	/* decrement free_queue to make it zero based */
+	free_queue--;
+
+	clear_bit(free_queue, &nt->qp_bitmap);
+
+	qp = &nt->qps[free_queue];
+	qp->cb_data = data;
+	qp->rx_handler = handlers->rx_handler;
+	qp->tx_handler = handlers->tx_handler;
+	qp->event_handler = handlers->event_handler;
+
+	for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+		entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+		if (!entry)
+			goto err1;
+
+		ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+				  &qp->rx_free_q);
+	}
+
+	for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+		entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+		if (!entry)
+			goto err2;
+
+		ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+				  &qp->tx_free_q);
+	}
+
+	tasklet_init(&qp->rx_work, ntb_transport_rx, (unsigned long) qp);
+
+	rc = ntb_register_db_callback(qp->ndev, free_queue, qp,
+				      ntb_transport_rxc_db);
+	if (rc)
+		goto err3;
+
+	dev_info(&pdev->dev, "NTB Transport QP %d created\n", qp->qp_num);
+
+	return qp;
+
+err3:
+	tasklet_disable(&qp->rx_work);
+err2:
+	while ((entry =
+		ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
+		kfree(entry);
+err1:
+	while ((entry =
+		ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+		kfree(entry);
+	set_bit(free_queue, &nt->qp_bitmap);
+err:
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_create_queue);
+
+/**
+ * ntb_transport_free_queue - Frees NTB transport queue
+ * @qp: NTB queue to be freed
+ *
+ * Frees NTB transport queue
+ */
+void ntb_transport_free_queue(struct ntb_transport_qp *qp)
+{
+	struct pci_dev *pdev = ntb_query_pdev(qp->ndev);
+	struct ntb_queue_entry *entry;
+
+	if (!qp)
+		return;
+
+	cancel_delayed_work_sync(&qp->link_work);
+
+	ntb_unregister_db_callback(qp->ndev, qp->qp_num);
+	tasklet_disable(&qp->rx_work);
+
+	while ((entry =
+		ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+		kfree(entry);
+
+	while ((entry =
+		ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q))) {
+		dev_warn(&pdev->dev, "Freeing item from a non-empty queue\n");
+		kfree(entry);
+	}
+
+	while ((entry =
+		ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
+		kfree(entry);
+
+	set_bit(qp->qp_num, &qp->transport->qp_bitmap);
+
+	dev_info(&pdev->dev, "NTB Transport QP %d freed\n", qp->qp_num);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_free_queue);
+
+/**
+ * ntb_transport_rx_remove - Dequeues enqueued rx packet
+ * @qp: NTB queue to be freed
+ * @len: pointer to variable to write enqueued buffers length
+ *
+ * Dequeues unused buffers from receive queue.  Should only be used during
+ * shutdown of qp.
+ *
+ * RETURNS: NULL error value on error, or void* for success.
+ */
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len)
+{
+	struct ntb_queue_entry *entry;
+	void *buf;
+
+	if (!qp || qp->client_ready == NTB_LINK_UP)
+		return NULL;
+
+	entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+	if (!entry)
+		return NULL;
+
+	buf = entry->cb_data;
+	*len = entry->len;
+
+	ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+			  &qp->rx_free_q);
+
+	return buf;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_rx_remove);
+
+/**
+ * ntb_transport_rx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that incoming packets will be copied into
+ * @len: length of the data buffer
+ *
+ * Enqueue a new receive buffer onto the transport queue into which a NTB
+ * payload can be received into.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len)
+{
+	struct ntb_queue_entry *entry;
+
+	if (!qp)
+		return -EINVAL;
+
+	entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->cb_data = cb;
+	entry->buf = data;
+	entry->len = len;
+
+	ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+			  &qp->rx_pend_q);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
+
+/**
+ * ntb_transport_tx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that will be sent
+ * @len: length of the data buffer
+ *
+ * Enqueue a new transmit buffer onto the transport queue from which a NTB
+ * payload will be transmitted.  This assumes that a lock is behing held to
+ * serialize access to the qp.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len)
+{
+	struct ntb_queue_entry *entry;
+	int rc;
+
+	if (!qp || qp->qp_link != NTB_LINK_UP || !len)
+		return -EINVAL;
+
+	entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->cb_data = cb;
+	entry->buf = data;
+	entry->len = len;
+	entry->flags = 0;
+
+	rc = ntb_process_tx(qp, entry);
+	if (rc)
+		ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+			     &qp->tx_free_q);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_tx_enqueue);
+
+/**
+ * ntb_transport_link_up - Notify NTB transport of client readiness to use queue
+ * @qp: NTB transport layer queue to be enabled
+ *
+ * Notify NTB transport layer of client readiness to use queue
+ */
+void ntb_transport_link_up(struct ntb_transport_qp *qp)
+{
+	if (!qp)
+		return;
+
+	qp->client_ready = NTB_LINK_UP;
+
+	if (qp->transport->transport_link == NTB_LINK_UP)
+		schedule_delayed_work(&qp->link_work, 0);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_up);
+
+/**
+ * ntb_transport_link_down - Notify NTB transport to no longer enqueue data
+ * @qp: NTB transport layer queue to be disabled
+ *
+ * Notify NTB transport layer of client's desire to no longer receive data on
+ * transport queue specified.  It is the client's responsibility to ensure all
+ * entries on queue are purged or otherwise handled appropraitely.
+ */
+void ntb_transport_link_down(struct ntb_transport_qp *qp)
+{
+	struct pci_dev *pdev = ntb_query_pdev(qp->ndev);
+	int rc, val;
+
+	if (!qp)
+		return;
+
+	qp->client_ready = NTB_LINK_DOWN;
+
+	rc = ntb_read_local_spad(qp->ndev, QP_LINKS, &val);
+	if (rc) {
+		dev_err(&pdev->dev, "Error reading spad %d\n", QP_LINKS);
+		return;
+	}
+
+	rc = ntb_write_remote_spad(qp->ndev, QP_LINKS,
+				   val & ~(1 << qp->qp_num));
+	if (rc)
+		dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
+			val & ~(1 << qp->qp_num), QP_LINKS);
+
+	if (qp->qp_link == NTB_LINK_UP)
+		ntb_send_link_down(qp);
+	else
+		cancel_delayed_work_sync(&qp->link_work);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_down);
+
+/**
+ * ntb_transport_link_query - Query transport link state
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query connectivity to the remote system of the NTB transport queue
+ *
+ * RETURNS: true for link up or false for link down
+ */
+bool ntb_transport_link_query(struct ntb_transport_qp *qp)
+{
+	return qp->qp_link == NTB_LINK_UP;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_query);
+
+/**
+ * ntb_transport_qp_num - Query the qp number
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query qp number of the NTB transport queue
+ *
+ * RETURNS: a zero based number specifying the qp number
+ */
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp)
+{
+	return qp->qp_num;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
+
+/**
+ * ntb_transport_max_size - Query the max payload size of a qp
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query the maximum payload size permissible on the given qp
+ *
+ * RETURNS: the max payload size of a qp
+ */
+unsigned int
+ntb_transport_max_size(__attribute__((unused)) struct ntb_transport_qp *qp)
+{
+	return transport_mtu - sizeof(struct ntb_payload_header);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_max_size);
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
new file mode 100644
index 0000000..f6a1520
--- /dev/null
+++ b/include/linux/ntb.h
@@ -0,0 +1,83 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+struct ntb_transport_qp;
+
+struct ntb_client {
+	struct device_driver driver;
+	int (*probe) (struct pci_dev *pdev);
+	void (*remove) (struct pci_dev *pdev);
+};
+
+int ntb_register_client(struct ntb_client *drvr);
+void ntb_unregister_client(struct ntb_client *drvr);
+int ntb_register_client_dev(char *device_name);
+void ntb_unregister_client_dev(char *device_name);
+
+struct ntb_queue_handlers {
+	void (*rx_handler) (struct ntb_transport_qp *qp, void *qp_data,
+			    void *data, int len);
+	void (*tx_handler) (struct ntb_transport_qp *qp, void *qp_data,
+			    void *data, int len);
+	void (*event_handler) (void *data, int status);
+};
+
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp);
+unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp);
+struct ntb_transport_qp *
+ntb_transport_create_queue(void *data, struct pci_dev *pdev,
+			   const struct ntb_queue_handlers *handlers);
+void ntb_transport_free_queue(struct ntb_transport_qp *qp);
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len);
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len);
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len);
+void ntb_transport_link_up(struct ntb_transport_qp *qp);
+void ntb_transport_link_down(struct ntb_transport_qp *qp);
+bool ntb_transport_link_query(struct ntb_transport_qp *qp);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v6 2/2] net: Add support for NTB virtual ethernet device
From: Jon Mason @ 2012-11-17  2:27 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-pci, netdev, Dave Jiang, Nicholas Bellinger
In-Reply-To: <1353119233-32752-1-git-send-email-jon.mason@intel.com>

A virtual ethernet device that uses the NTB transport API to
send/receive data.

Signed-off-by: Jon Mason <jon.mason@intel.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 MAINTAINERS              |    1 +
 drivers/net/Kconfig      |    4 +
 drivers/net/Makefile     |    1 +
 drivers/net/ntb_netdev.c |  419 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 425 insertions(+)
 create mode 100644 drivers/net/ntb_netdev.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b6139ad..e4e7652 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5197,6 +5197,7 @@ NTB DRIVER
 M:	Jon Mason <jon.mason@intel.com>
 S:	Supported
 F:	drivers/ntb/
+F:	drivers/net/ntb_netdev.c
 F:	include/linux/ntb.h
 
 NTFS FILESYSTEM
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 6a70184..5db9acb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -189,6 +189,10 @@ config NETPOLL_TRAP
 config NET_POLL_CONTROLLER
 	def_bool NETPOLL
 
+config NTB_NETDEV
+	tristate "Virtual Ethernet over NTB"
+	depends on NTB
+
 config RIONET
 	tristate "RapidIO Ethernet over messaging driver support"
 	depends on RAPIDIO
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 335db78..ef3d090 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -71,3 +71,4 @@ obj-$(CONFIG_USB_IPHETH)        += usb/
 obj-$(CONFIG_USB_CDC_PHONET)   += usb/
 
 obj-$(CONFIG_HYPERV_NET) += hyperv/
+obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
new file mode 100644
index 0000000..af48a69
--- /dev/null
+++ b/drivers/net/ntb_netdev.c
@@ -0,0 +1,419 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Network Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/ntb.h>
+
+#define NTB_NETDEV_VER	"0.6"
+
+MODULE_DESCRIPTION(KBUILD_MODNAME);
+MODULE_VERSION(NTB_NETDEV_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+struct ntb_netdev {
+	struct list_head list;
+	struct pci_dev *pdev;
+	struct net_device *ndev;
+	struct ntb_transport_qp *qp;
+};
+
+#define	NTB_TX_TIMEOUT_MS	1000
+#define	NTB_RXQ_SIZE		100
+
+static LIST_HEAD(dev_list);
+
+static void ntb_netdev_event_handler(void *data, int status)
+{
+	struct net_device *ndev = data;
+	struct ntb_netdev *dev = netdev_priv(ndev);
+
+	netdev_dbg(ndev, "Event %x, Link %x\n", status,
+		   ntb_transport_link_query(dev->qp));
+
+	/* Currently, only link status event is supported */
+	if (status)
+		netif_carrier_on(ndev);
+	else
+		netif_carrier_off(ndev);
+}
+
+static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp, void *qp_data,
+				  void *data, int len)
+{
+	struct net_device *ndev = qp_data;
+	struct sk_buff *skb;
+	int rc;
+
+	skb = data;
+	if (!skb)
+		return;
+
+	netdev_dbg(ndev, "%s: %d byte payload received\n", __func__, len);
+
+	skb_put(skb, len);
+	skb->protocol = eth_type_trans(skb, ndev);
+	skb->ip_summed = CHECKSUM_NONE;
+
+	if (netif_rx(skb) == NET_RX_DROP) {
+		ndev->stats.rx_errors++;
+		ndev->stats.rx_dropped++;
+	} else {
+		ndev->stats.rx_packets++;
+		ndev->stats.rx_bytes += len;
+	}
+
+	skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+	if (!skb) {
+		ndev->stats.rx_errors++;
+		ndev->stats.rx_frame_errors++;
+		return;
+	}
+
+	rc = ntb_transport_rx_enqueue(qp, skb, skb->data, ndev->mtu + ETH_HLEN);
+	if (rc) {
+		ndev->stats.rx_errors++;
+		ndev->stats.rx_fifo_errors++;
+	}
+}
+
+static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp, void *qp_data,
+				  void *data, int len)
+{
+	struct net_device *ndev = qp_data;
+	struct sk_buff *skb;
+
+	skb = data;
+	if (!skb || !ndev)
+		return;
+
+	if (len > 0) {
+		ndev->stats.tx_packets++;
+		ndev->stats.tx_bytes += skb->len;
+	} else {
+		ndev->stats.tx_errors++;
+		ndev->stats.tx_aborted_errors++;
+	}
+
+	dev_kfree_skb(skb);
+
+	if (netif_queue_stopped(ndev))
+		netif_wake_queue(ndev);
+}
+
+static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
+					 struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	int rc;
+
+	netdev_dbg(ndev, "ntb_transport_tx_enqueue\n");
+
+	rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
+	if (rc)
+		goto err;
+
+	return NETDEV_TX_OK;
+
+err:
+	ndev->stats.tx_dropped++;
+	ndev->stats.tx_errors++;
+	netif_stop_queue(ndev);
+	return NETDEV_TX_BUSY;
+}
+
+static int ntb_netdev_open(struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int rc, i, len;
+
+	/* Add some empty rx bufs */
+	for (i = 0; i < NTB_RXQ_SIZE; i++) {
+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+		if (!skb) {
+			rc = -ENOMEM;
+			goto err;
+		}
+
+		rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+					      ndev->mtu + ETH_HLEN);
+		if (rc == -EINVAL)
+			goto err;
+	}
+
+	netif_carrier_off(ndev);
+	ntb_transport_link_up(dev->qp);
+
+	return 0;
+
+err:
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		dev_kfree_skb(skb);
+	return rc;
+}
+
+static int ntb_netdev_close(struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int len;
+
+	ntb_transport_link_down(dev->qp);
+
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		dev_kfree_skb(skb);
+
+	return 0;
+}
+
+static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int len, rc;
+
+	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+		return -EINVAL;
+
+	if (!netif_running(ndev)) {
+		ndev->mtu = new_mtu;
+		return 0;
+	}
+
+	/* Bring down the link and dispose of posted rx entries */
+	ntb_transport_link_down(dev->qp);
+
+	if (ndev->mtu < new_mtu) {
+		int i;
+
+		for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
+			dev_kfree_skb(skb);
+
+		for (; i; i--) {
+			skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
+			if (!skb) {
+				rc = -ENOMEM;
+				goto err;
+			}
+
+			rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+						      new_mtu + ETH_HLEN);
+			if (rc) {
+				dev_kfree_skb(skb);
+				goto err;
+			}
+		}
+	}
+
+	ndev->mtu = new_mtu;
+
+	ntb_transport_link_up(dev->qp);
+
+	return 0;
+
+err:
+	ntb_transport_link_down(dev->qp);
+
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		dev_kfree_skb(skb);
+
+	netdev_err(ndev, "Error changing MTU, device inoperable\n");
+	return rc;
+}
+
+static void ntb_netdev_tx_timeout(struct net_device *ndev)
+{
+	if (netif_running(ndev))
+		netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops ntb_netdev_ops = {
+	.ndo_open = ntb_netdev_open,
+	.ndo_stop = ntb_netdev_close,
+	.ndo_start_xmit = ntb_netdev_start_xmit,
+	.ndo_change_mtu = ntb_netdev_change_mtu,
+	.ndo_tx_timeout = ntb_netdev_tx_timeout,
+	.ndo_set_mac_address = eth_mac_addr,
+};
+
+static void ntb_get_drvinfo(struct net_device *ndev,
+			    struct ethtool_drvinfo *info)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+
+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
+	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
+	strlcpy(info->bus_info, pci_name(dev->pdev), sizeof(info->bus_info));
+}
+
+static int ntb_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	cmd->supported = SUPPORTED_Backplane;
+	cmd->advertising = ADVERTISED_Backplane;
+	cmd->speed = SPEED_UNKNOWN;
+	ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
+	cmd->duplex = DUPLEX_FULL;
+	cmd->port = PORT_OTHER;
+	cmd->phy_address = 0;
+	cmd->transceiver = XCVR_DUMMY1;
+	cmd->autoneg = AUTONEG_ENABLE;
+	cmd->maxtxpkt = 0;
+	cmd->maxrxpkt = 0;
+
+	return 0;
+}
+
+static const struct ethtool_ops ntb_ethtool_ops = {
+	.get_drvinfo = ntb_get_drvinfo,
+	.get_link = ethtool_op_get_link,
+	.get_settings = ntb_get_settings,
+};
+
+static const struct ntb_queue_handlers ntb_netdev_handlers = {
+	.tx_handler = ntb_netdev_tx_handler,
+	.rx_handler = ntb_netdev_rx_handler,
+	.event_handler = ntb_netdev_event_handler,
+};
+
+static int __devinit ntb_netdev_probe(struct pci_dev *pdev)
+{
+	struct net_device *ndev;
+	struct ntb_netdev *dev;
+	int rc;
+
+	ndev = alloc_etherdev(sizeof(struct ntb_netdev));
+	if (!ndev)
+		return -ENOMEM;
+
+	dev = netdev_priv(ndev);
+	dev->ndev = ndev;
+	dev->pdev = pdev;
+	BUG_ON(!dev->pdev);
+	ndev->features = NETIF_F_HIGHDMA;
+
+	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+
+	ndev->hw_features = ndev->features;
+	ndev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
+
+	random_ether_addr(ndev->perm_addr);
+	memcpy(ndev->dev_addr, ndev->perm_addr, ndev->addr_len);
+
+	ndev->netdev_ops = &ntb_netdev_ops;
+	SET_ETHTOOL_OPS(ndev, &ntb_ethtool_ops);
+
+	dev->qp = ntb_transport_create_queue(ndev, pdev, &ntb_netdev_handlers);
+	if (!dev->qp) {
+		rc = -EIO;
+		goto err;
+	}
+
+	ndev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
+
+	rc = register_netdev(ndev);
+	if (rc)
+		goto err1;
+
+	list_add(&dev->list, &dev_list);
+	pr_info("%s: %s created\n", KBUILD_MODNAME, ndev->name);
+	return 0;
+
+err1:
+	ntb_transport_free_queue(dev->qp);
+err:
+	free_netdev(ndev);
+	return rc;
+}
+
+static void __exit ntb_netdev_remove(struct pci_dev *pdev)
+{
+	struct net_device *ndev;
+	struct ntb_netdev *dev;
+
+	list_for_each_entry(dev, &dev_list, list) {
+		if (dev->pdev == pdev)
+			break;
+	}
+	if (dev == NULL)
+		return;
+
+	ndev = dev->ndev;
+
+	unregister_netdev(ndev);
+	ntb_transport_free_queue(dev->qp);
+	free_netdev(ndev);
+}
+
+static struct ntb_client ntb_netdev_client = {
+	.driver.name = KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.probe = ntb_netdev_probe,
+	.remove = ntb_netdev_remove,
+};
+
+static int __init ntb_netdev_init_module(void)
+{
+	int rc;
+
+	rc = ntb_register_client_dev(KBUILD_MODNAME);
+	if (rc)
+		return rc;
+	return ntb_register_client(&ntb_netdev_client);
+}
+module_init(ntb_netdev_init_module);
+
+static void __exit ntb_netdev_exit_module(void)
+{
+	ntb_unregister_client(&ntb_netdev_client);
+	ntb_unregister_client_dev(KBUILD_MODNAME);
+	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
+}
+module_exit(ntb_netdev_exit_module);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v6 0/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-11-17  2:27 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-pci, netdev, Dave Jiang, Nicholas Bellinger

I am submitting version 6 of the PCI-Express Non-Transparent Bridge
patches for inclusion in 3.8 via Greg KH's char-misc-next tree.  All
outstanding issues have been addressed.

Version 6 corrects Greg KH's issues, most notably the improper usage of
the Linux device model.
http://thread.gmane.org/gmane.linux.kernel.pci/18599

Thanks,
Jon

^ permalink raw reply

* Re: [PATCH net-next] pch_gbe, ptp_pch: Fix the dependency direction between these drivers
From: Randy Dunlap @ 2012-11-17  2:02 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, Randy Dunlap, Takahiro Shimizu,
	Richard Cochran
In-Reply-To: <1353116639.2743.88.camel@bwh-desktop.uk.solarflarecom.com>

On 11/16/2012 05:43 PM, Ben Hutchings wrote:

> In commit a24006ed12616bde1bbdb26868495906a212d8dc ('ptp: Enable clock
> drivers along with associated net/PHY drivers') I wrongly made
> PTP_1588_CLOCK_PCH depend on PCH_GBE.  The dependency is really the
> other way around.  Therefore make PCH_GBE select PTP_1588_CLOCK_PCH
> and remove the 'default y' from the latter.
> 
> Reported-by: Randy Dunlap <rdunlap@xenotime.net>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>


Acked-by: Randy Dunlap <rdunlap@xenotime.net>

Thanks.

> ---
>  drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |    1 +
>  drivers/ptp/Kconfig                           |    2 --
>  2 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
> index 00bc4fc..34d05bf 100644
> --- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
> +++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
> @@ -7,6 +7,7 @@ config PCH_GBE
>  	depends on PCI
>  	select NET_CORE
>  	select MII
> +	select PTP_1588_CLOCK_PCH
>  	---help---
>  	  This is a gigabit ethernet driver for EG20T PCH.
>  	  EG20T PCH is the platform controller hub that is used in Intel's
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index 70c5836..1ea6f1d 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -72,9 +72,7 @@ config DP83640_PHY
>  
>  config PTP_1588_CLOCK_PCH
>  	tristate "Intel PCH EG20T as PTP clock"
> -	depends on PCH_GBE
>  	select PTP_1588_CLOCK
> -	default y
>  	help
>  	  This driver adds support for using the PCH EG20T as a PTP
>  	  clock. The hardware supports time stamping of PTP packets



-- 
~Randy

^ permalink raw reply

* [PATCH net-next] pch_gbe, ptp_pch: Fix the dependency direction between these drivers
From: Ben Hutchings @ 2012-11-17  1:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Randy Dunlap, Takahiro Shimizu, Richard Cochran

In commit a24006ed12616bde1bbdb26868495906a212d8dc ('ptp: Enable clock
drivers along with associated net/PHY drivers') I wrongly made
PTP_1588_CLOCK_PCH depend on PCH_GBE.  The dependency is really the
other way around.  Therefore make PCH_GBE select PTP_1588_CLOCK_PCH
and remove the 'default y' from the latter.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |    1 +
 drivers/ptp/Kconfig                           |    2 --
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 00bc4fc..34d05bf 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -7,6 +7,7 @@ config PCH_GBE
 	depends on PCI
 	select NET_CORE
 	select MII
+	select PTP_1588_CLOCK_PCH
 	---help---
 	  This is a gigabit ethernet driver for EG20T PCH.
 	  EG20T PCH is the platform controller hub that is used in Intel's
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 70c5836..1ea6f1d 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -72,9 +72,7 @@ config DP83640_PHY
 
 config PTP_1588_CLOCK_PCH
 	tristate "Intel PCH EG20T as PTP clock"
-	depends on PCH_GBE
 	select PTP_1588_CLOCK
-	default y
 	help
 	  This driver adds support for using the PCH EG20T as a PTP
 	  clock. The hardware supports time stamping of PTP packets
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next] pch_gbe, ptp_pch: Fix the dependency direction between these drivers
From: Ben Hutchings @ 2012-11-17  1:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Randy Dunlap, Takahiro Shimizu, Richard Cochran

In commit a24006ed12616bde1bbdb26868495906a212d8dc ('ptp: Enable clock
drivers along with associated net/PHY drivers') I wrongly made
PTP_1588_CLOCK_PCH depend on PCH_GBE.  The dependency is really the
other way around.  Therefore make PCH_GBE select PTP_1588_CLOCK_PCH
and remove the 'default y' from the latter.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig |    1 +
 drivers/ptp/Kconfig                           |    2 --
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 00bc4fc..34d05bf 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -7,6 +7,7 @@ config PCH_GBE
 	depends on PCI
 	select NET_CORE
 	select MII
+	select PTP_1588_CLOCK_PCH
 	---help---
 	  This is a gigabit ethernet driver for EG20T PCH.
 	  EG20T PCH is the platform controller hub that is used in Intel's
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 70c5836..1ea6f1d 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -72,9 +72,7 @@ config DP83640_PHY
 
 config PTP_1588_CLOCK_PCH
 	tristate "Intel PCH EG20T as PTP clock"
-	depends on PCH_GBE
 	select PTP_1588_CLOCK
-	default y
 	help
 	  This driver adds support for using the PCH EG20T as a PTP
 	  clock. The hardware supports time stamping of PTP packets
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Andrew Gallatin @ 2012-11-17  1:17 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers, Eric Dumazet, Herbert Xu
In-Reply-To: <1353097030.2743.28.camel@bwh-desktop.uk.solarflarecom.com>

On 11/16/12 15:17, Ben Hutchings wrote:
> The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> RX accleration) were made largely consistent in 2.6.37, with tags
> pulled out by software as necessary.  However GRO doesn't do this, so
> it is not effective for VLAN-tagged packets received on devices
> without VLAN RX acceleration.
>
> napi_gro_frags() must not free the skb and does not advance the
> skb->data pointer, so cannot use vlan_untag().  Extract the core of
> vlan_untag() into a new function __vlan_untag() that allows the offset
> to the VLAN tag to be specified and returns an error code.  Add
> kernel-doc comments for both those functions.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> over a VLAN from ~8.0 to ~9.3 Gbit/s.


I verified similar results on myri10ge, using my recent GRO
patchset (minus the in-driver vtag removal) with just
napi_gro_frags().

I've no strong feeling as to whether or not this belongs in GRO or the
driver.  I'm just glad that it is being discussed.

Thank you,

Drew

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Eric Dumazet @ 2012-11-17  1:09 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers, Andrew Gallatin,
	Herbert Xu
In-Reply-To: <1353112324.2743.81.camel@bwh-desktop.uk.solarflarecom.com>

On Sat, 2012-11-17 at 00:32 +0000, Ben Hutchings wrote:
> On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
> > On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
> > 
> > > I'm not sure what you mean by this.  Is your point that the
> > > copy-on-write is never needed?  It is still possible for pskb_may_pull()
> > > to fail.
> > > 
> > 
> > A packet sniffer should have a copy of bad frames, even if dropped later
> > in our stacks.
> > 
> > GRO layer is not allowed to drop a frame, even if not 'correct'.
> 
> What do you think the accelerated hardware does with frames that have a
> truncated VLAN tag?

The hardware should send us the frame, exactly like when RX checksum is
wrong.

^ permalink raw reply

* Re: [PATCH] checkpatch: add double empty line check
From: Joe Perches @ 2012-11-17  0:50 UTC (permalink / raw)
  To: eilong; +Cc: David Rientjes, Andy Whitcroft, linux-kernel, netdev
In-Reply-To: <1353096274.14327.9.camel@lb-tlvb-eilong.il.broadcom.com>

On Fri, 2012-11-16 at 22:04 +0200, Eilon Greenstein wrote:
> On Fri, 2012-11-16 at 11:55 -0800, David Rientjes wrote:
> > On Fri, 16 Nov 2012, Eilon Greenstein wrote:
> > 
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 21a9f5d..7a9c153 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3579,6 +3579,13 @@ sub process {
> > >  			WARN("EXPORTED_WORLD_WRITABLE",
> > >  			     "Exporting world writable files is usually an error. Consider more restrictive permissions.\n" . $herecurr);
> > >  		}
> > > +
> > > +# check for double empty lines
> > > +		if ($line =~ /^\+\s*$/ &&
> > > +		    ($prevline =~ /^\+?\s*$/ || $rawlines[$linenr] =~ /^\s*$/)) {
> > > +			WARN("DOUBLE_EMPTY_LINE",
> > > +			     "One empty line should be sufficient. Consider removing this one.\n" . $herecurr);
> > > +		}
> > >  	}
> > >  
> > >  	# If we have no input at all, then there is nothing to report on
> > 
> > This is fairly common in all the acpi code where variables declared in a 
> > function are separated from the code in a function.
> > 
> 
> Indeed, I see that you do use it in some functions.
> 
> Maybe we can limit it only to the networking tree (similar to the
> networking comments style) or if the ACPI is the exception, we can apply
> to all but ACPI.

I'm not sure this should be done.
Double line spacing has some utility and
is pretty common.

Perhaps make this a --strict/CHK option
and also perhaps make sure this isn't
emitted on consecutive lines.

^ permalink raw reply

* Re: [rfc net-next v6 2/3] virtio_net: multiqueue support
From: Ben Hutchings @ 2012-11-17  0:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, edumazet,
	davem
In-Reply-To: <20121113064026.GA27416@redhat.com>

On Tue, 2012-11-13 at 08:40 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 05, 2012 at 11:38:39AM +1030, Rusty Russell wrote:
> > > @@ -924,11 +1032,10 @@ static void virtnet_get_ringparam(struct net_device *dev,
> > >  {
> > >  	struct virtnet_info *vi = netdev_priv(dev);
> > >  
> > > -	ring->rx_max_pending = virtqueue_get_vring_size(vi->rvq);
> > > -	ring->tx_max_pending = virtqueue_get_vring_size(vi->svq);
> > > +	ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq);
> > > +	ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq);
> > >  	ring->rx_pending = ring->rx_max_pending;
> > >  	ring->tx_pending = ring->tx_max_pending;
> > > -
> > >  }
> > 
> > This assumes all vqs are the same size.  I think this should probably
> > check: for mq mode, use the first vq, otherewise use the 0th.
> 
> For rx_pending/tx_pending I think what is required here is the
> actual number of outstanding buffers.
> Dave, Eric - right?
> 
> So this should be the total over all rings and to be useful,
> rx_max_pending/tx_max_pending should be the total too.

So far as I know, all current implementations use the number of
descriptors per ring here. virtio_net should be consistent with this.

Ben.

> > For bonus points, check this assertion at probe time.
> 
> Looks like we can easily support different queues too.
> 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Ben Hutchings @ 2012-11-17  0:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-net-drivers, Andrew Gallatin,
	Herbert Xu
In-Reply-To: <1353111414.10798.64.camel@edumazet-glaptop>

On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
> On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
> 
> > I'm not sure what you mean by this.  Is your point that the
> > copy-on-write is never needed?  It is still possible for pskb_may_pull()
> > to fail.
> > 
> 
> A packet sniffer should have a copy of bad frames, even if dropped later
> in our stacks.
> 
> GRO layer is not allowed to drop a frame, even if not 'correct'.

What do you think the accelerated hardware does with frames that have a
truncated VLAN tag?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next 09/17] net: Allow userns root control of the core of the network stack.
From: Ben Hutchings @ 2012-11-17  0:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers, David Miller
In-Reply-To: <871uft8vpm.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Fri, 2012-11-16 at 06:32 -0800, Eric W. Biederman wrote:
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
> > On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
> >> +	if (!capable(CAP_NET_ADMIN))
> >> +		return -EPERM;
> >> +
> >>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
> >
> > You mean ns_capable here?
> 
> No.  There I meant capable.
> 
> I deliberately call capable here because I don't understand what
> the tx_queue_len well enough to be certain it is safe to relax
> that check to be just ns_capable.
> 
> My get feel is that allowing an unprivileged user to be able to
> arbitrarily change the tx_queue_len on a networking device would be a
> nice way to allow queuing as many network packets as you would like with
> kernel memory and DOSing the machine.
> 
> So since with a quick read of the code I could not convince myself it
> was safe to allow unprivilged users to change tx_queue_len I left it
> protected by capable.  While at the same time I relaxed the check in
> netdev_store to be ns_capable.

Tor the same reason you had better be very selective about which ethtool
commands are allowed based on per-user_ns CAP_NET_ADMIN.  Consider for a
start:

ETHTOOL_SMSGLVL => fill up the system log
ETHTOOL_SEEPROM => brick the NIC
ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Eric Dumazet @ 2012-11-17  0:16 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers, Andrew Gallatin,
	Herbert Xu
In-Reply-To: <1353110430.2743.73.camel@bwh-desktop.uk.solarflarecom.com>

On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:

> I'm not sure what you mean by this.  Is your point that the
> copy-on-write is never needed?  It is still possible for pskb_may_pull()
> to fail.
> 

A packet sniffer should have a copy of bad frames, even if dropped later
in our stacks.

GRO layer is not allowed to drop a frame, even if not 'correct'.

^ permalink raw reply

* Re: [PATCH net-next 2/2] PPC: net: bpf_jit_comp: add VLAN instructions for BPF JIT
From: Matt Evans @ 2012-11-17  0:06 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, benh, netdev
In-Reply-To: <20121108214137.GB29696@thinkbox>

Hi,

On 08/11/2012, at 9:41 PM, Daniel Borkmann wrote:

> This patch is a follow-up for patch "net: filter: add vlan tag access"
> to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.
> 
> Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
> Cc: Matt Evans <matt@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> Disclaimer: uncompiled and untested, since I don't have a PPC machine,
> but it should (hopefully) integrate cleanly; impl. after PPC instruction
> reference.

And for this too,

Acked-by: Matt Evans <matt@ozlabs.org>

Sorry for the delay in reviewing this!

Thanks,


Matt


> 
> arch/powerpc/net/bpf_jit_comp.c |   14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index b9434de..e834f1e 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -13,6 +13,8 @@
> #include <asm/cacheflush.h>
> #include <linux/netdevice.h>
> #include <linux/filter.h>
> +#include <linux/if_vlan.h>
> +
> #include "bpf_jit.h"
> 
> #ifndef __BIG_ENDIAN
> @@ -89,6 +91,8 @@ static void bpf_jit_build_prologue(struct sk_filter *fp, u32 *image,
> 	case BPF_S_ANC_IFINDEX:
> 	case BPF_S_ANC_MARK:
> 	case BPF_S_ANC_RXHASH:
> +	case BPF_S_ANC_VLAN_TAG:
> +	case BPF_S_ANC_VLAN_TAG_PRESENT:
> 	case BPF_S_ANC_CPU:
> 	case BPF_S_ANC_QUEUE:
> 	case BPF_S_LD_W_ABS:
> @@ -382,6 +386,16 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
> 			PPC_LWZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
> 							  rxhash));
> 			break;
> +		case BPF_S_ANC_VLAN_TAG:
> +		case BPF_S_ANC_VLAN_TAG_PRESENT:
> +			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
> +			PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
> +							  vlan_tci));
> +			if (filter[i].code == BPF_S_ANC_VLAN_TAG)
> +				PPC_ANDI(r_A, r_A, VLAN_VID_MASK);
> +			else
> +				PPC_ANDI(r_A, r_A, VLAN_TAG_PRESENT);
> +			break;
> 		case BPF_S_ANC_QUEUE:
> 			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
> 						  queue_mapping) != 2);

^ permalink raw reply

* Re: [PATCH net-next 1/2] PPC: net: bpf_jit_comp: add XOR instruction for BPF JIT
From: Matt Evans @ 2012-11-17  0:00 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, benh, netdev
In-Reply-To: <20121108213939.GA29696@thinkbox>

Hi Daniel,


On 08/11/2012, at 9:39 PM, Daniel Borkmann wrote:

> This patch is a follow-up for patch "filter: add XOR instruction for use
> with X/K" that implements BPF PowerPC JIT parts for the BPF XOR operation.
> 
> Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
> Cc: Matt Evans <matt@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> Disclaimer: uncompiled and untested, since I don't have a PPC machine,
> but it should (hopefully) integrate cleanly; impl. after PPC instruction
> reference.

Unfortunately I can only compile and test this mentally, but it looks fine, instruction formats correct etc.  Thanks!


Acked-by: Matt Evans <matt@ozlabs.org>



Matt

> 
> arch/powerpc/include/asm/ppc-opcode.h |    3 +++
> arch/powerpc/net/bpf_jit.h            |    6 ++++++
> arch/powerpc/net/bpf_jit_comp.c       |   11 +++++++++++
> 3 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 5f73ce6..42b1f43 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -168,9 +168,12 @@
> #define PPC_INST_AND			0x7c000038
> #define PPC_INST_ANDDOT			0x7c000039
> #define PPC_INST_OR			0x7c000378
> +#define PPC_INST_XOR			0x7c000278
> #define PPC_INST_ANDI			0x70000000
> #define PPC_INST_ORI			0x60000000
> #define PPC_INST_ORIS			0x64000000
> +#define PPC_INST_XORI			0x68000000
> +#define PPC_INST_XORIS			0x6c000000
> #define PPC_INST_NEG			0x7c0000d0
> #define PPC_INST_BRANCH			0x48000000
> #define PPC_INST_BRANCH_COND		0x40800000
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 1fc8109..8a5dfaf 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -134,6 +134,12 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
> 				     ___PPC_RS(a) | IMM_L(i))
> #define PPC_ORIS(d, a, i)	EMIT(PPC_INST_ORIS | ___PPC_RA(d) |	      \
> 				     ___PPC_RS(a) | IMM_L(i))
> +#define PPC_XOR(d, a, b)	EMIT(PPC_INST_XOR | ___PPC_RA(d) |	      \
> +				     ___PPC_RS(a) | ___PPC_RB(b))
> +#define PPC_XORI(d, a, i)	EMIT(PPC_INST_XORI | ___PPC_RA(d) |	      \
> +				     ___PPC_RS(a) | IMM_L(i))
> +#define PPC_XORIS(d, a, i)	EMIT(PPC_INST_XORIS | ___PPC_RA(d) |	      \
> +				     ___PPC_RS(a) | IMM_L(i))
> #define PPC_SLW(d, a, s)	EMIT(PPC_INST_SLW | ___PPC_RA(d) |	      \
> 				     ___PPC_RS(a) | ___PPC_RB(s))
> #define PPC_SRW(d, a, s)	EMIT(PPC_INST_SRW | ___PPC_RA(d) |	      \
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index dd11306..b9434de 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -232,6 +232,17 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
> 			if (K >= 65536)
> 				PPC_ORIS(r_A, r_A, IMM_H(K));
> 			break;
> +		case BPF_S_ANC_ALU_XOR_X:
> +		case BPF_S_ALU_XOR_X: /* A ^= X */
> +			ctx->seen |= SEEN_XREG;
> +			PPC_XOR(r_A, r_A, r_X);
> +			break;
> +		case BPF_S_ALU_XOR_K: /* A ^= K */
> +			if (IMM_L(K))
> +				PPC_XORI(r_A, r_A, IMM_L(K));
> +			if (K >= 65536)
> +				PPC_XORIS(r_A, r_A, IMM_H(K));
> +			break;
> 		case BPF_S_ALU_LSH_X: /* A <<= X; */
> 			ctx->seen |= SEEN_XREG;
> 			PPC_SLW(r_A, r_A, r_X);

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Ben Hutchings @ 2012-11-17  0:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linux-net-drivers, Andrew Gallatin,
	Herbert Xu
In-Reply-To: <1353106865.10798.61.camel@edumazet-glaptop>

On Fri, 2012-11-16 at 15:01 -0800, Eric Dumazet wrote:
> On Fri, 2012-11-16 at 20:17 +0000, Ben Hutchings wrote:
> > The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> > RX accleration) were made largely consistent in 2.6.37, with tags
> > pulled out by software as necessary.  However GRO doesn't do this, so
> > it is not effective for VLAN-tagged packets received on devices
> > without VLAN RX acceleration.
> > 
> > napi_gro_frags() must not free the skb and does not advance the
> > skb->data pointer, so cannot use vlan_untag().  Extract the core of
> > vlan_untag() into a new function __vlan_untag() that allows the offset
> > to the VLAN tag to be specified and returns an error code.  Add
> > kernel-doc comments for both those functions.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> > Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> > a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> > over a VLAN from ~8.0 to ~9.3 Gbit/s.
> > 
> > Ben.
> > 
> 
> > index b4978e2..9d658eb 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3668,6 +3668,13 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
> >  
> >  gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
> >  {
> > +	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
> > +	    !vlan_tx_tag_present(skb)) {
> > +		skb = vlan_untag(skb);
> > +		if (unlikely(!skb))
> > +			return GRO_DROP;
> > +	}
> > +
> >  	skb_gro_reset_offset(skb);
> 
> I am not very convinced.
> 
> So for some drivers _not_ doing vlan acceleration, we are slowing down
> GRO.

It's a single comparison, hinted as 'unlikely'.

> I mean, driver authors should know if they need to call a helper before
> calling napi_gro_receive()
> 
> To date, only two drivers would need this, and it was discovered very
> recently.

It's not only two drivers.  qlcnic fakes RX VLAN acceleration precisely
to work around this limitation.  netxen, niu and Calxeda's xgmac might
also benefit (it's not clear whether the hardware does checksum
validation for VLAN-tagged packets).

> Also at GRO point, we totally own the skb and the vlan decap cannot
> possibly fail (and free the skb)
> 
> A packet sniffer should see all skbs delivered to napi_gro_receive()

I'm not sure what you mean by this.  Is your point that the
copy-on-write is never needed?  It is still possible for pskb_may_pull()
to fail.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v7 2/6] net: mvneta: driver for Marvell Armada 370/XP network unit
From: Ben Hutchings @ 2012-11-16 23:04 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: David S. Miller, Francois Romieu, Lennert Buytenhek, netdev,
	linux-arm-kernel, Jason Cooper, Andrew Lunn, Gregory Clement,
	Lior Amsalem, Dmitri Epshtein
In-Reply-To: <1352905010-24172-3-git-send-email-thomas.petazzoni@free-electrons.com>

On Wed, 2012-11-14 at 15:56 +0100, Thomas Petazzoni wrote:
[...]
> --- /dev/null
> +++ b/drivers/net/ethernet/marvell/mvneta.c
[...]
> +/* Set interrupt coalescing for ethtools */
> +static int mvneta_ethtool_set_coalesce(struct net_device *dev,
> +				       struct ethtool_coalesce *c)
> +{
> +	struct mvneta_port *pp = netdev_priv(dev);
> +	int queue;
> +
> +	for (queue = 0; queue < rxq_number; queue++) {
> +		struct mvneta_rx_queue *rxq = &pp->rxqs[queue];
> +		rxq->time_coal = c->rx_coalesce_usecs;
> +		rxq->pkts_coal = c->rx_max_coalesced_frames;
> +		mvneta_rx_pkts_coal_set(pp, rxq, rxq->pkts_coal);
> +		mvneta_rx_time_coal_set(pp, rxq, rxq->time_coal);
> +	}

Please check that c->rx_coalesce_usecs || c->rx_max_coalesced_frames
(see the comments in <linux/ethtool.h>).  Also please check that the
other fields, up to and including use_adaptive_tx_coalesce, are all set
to 0.

> +	for (queue = 0; queue < txq_number; queue++) {
> +		struct mvneta_tx_queue *txq = &pp->txqs[queue];
> +		txq->done_pkts_coal = c->tx_max_coalesced_frames;
> +		mvneta_tx_done_pkts_coal_set(pp, txq, txq->done_pkts_coal);
> +	}
> +
> +	return 0;
> +}
[...]
> +static int mvneta_ethtool_set_ringparam(struct net_device *dev,
> +					struct ethtool_ringparam *ring)
> +{
> +	struct mvneta_port *pp = netdev_priv(dev);
> +
> +	if ((ring->rx_pending == 0) || (ring->tx_pending == 0))
> +		return -EINVAL;

Please check that the other fields are all set to 0.

> +	pp->rx_ring_size = ring->rx_pending < MVNETA_MAX_RXD ?
> +		ring->rx_pending : MVNETA_MAX_RXD;
> +	pp->tx_ring_size = ring->tx_pending < MVNETA_MAX_TXD ?
> +		ring->tx_pending : MVNETA_MAX_TXD;
> +
> +	if (netif_running(dev)) {
> +		mvneta_stop(dev);
> +		if (mvneta_open(dev)) {
> +			netdev_err(dev,
> +				   "error on opening device after ring param change\n");
> +			return -ENOMEM;
> +		}

This is nasty because the stack will still considers the interface to be
up.  Ideally you would hold onto the old rings and revert to using them
in case of failure.  If that's not possible then use dev_close() and
dev_open() so that the stack knows the interface didn't come up again.

Ben.

> +	}
> +
> +	return 0;
> +}
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: Eric Dumazet @ 2012-11-16 23:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers, Andrew Gallatin,
	Herbert Xu
In-Reply-To: <1353097030.2743.28.camel@bwh-desktop.uk.solarflarecom.com>

On Fri, 2012-11-16 at 20:17 +0000, Ben Hutchings wrote:
> The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> RX accleration) were made largely consistent in 2.6.37, with tags
> pulled out by software as necessary.  However GRO doesn't do this, so
> it is not effective for VLAN-tagged packets received on devices
> without VLAN RX acceleration.
> 
> napi_gro_frags() must not free the skb and does not advance the
> skb->data pointer, so cannot use vlan_untag().  Extract the core of
> vlan_untag() into a new function __vlan_untag() that allows the offset
> to the VLAN tag to be specified and returns an error code.  Add
> kernel-doc comments for both those functions.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> over a VLAN from ~8.0 to ~9.3 Gbit/s.
> 
> Ben.
> 

> index b4978e2..9d658eb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3668,6 +3668,13 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
>  
>  gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
>  {
> +	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
> +	    !vlan_tx_tag_present(skb)) {
> +		skb = vlan_untag(skb);
> +		if (unlikely(!skb))
> +			return GRO_DROP;
> +	}
> +
>  	skb_gro_reset_offset(skb);

I am not very convinced.

So for some drivers _not_ doing vlan acceleration, we are slowing down
GRO.

I mean, driver authors should know if they need to call a helper before
calling napi_gro_receive()

To date, only two drivers would need this, and it was discovered very
recently.

Also at GRO point, we totally own the skb and the vlan decap cannot
possibly fail (and free the skb)

A packet sniffer should see all skbs delivered to napi_gro_receive()

^ permalink raw reply

* [PATCH net-next 3/3] net: Remove bogus dependencies on INET
From: Ben Hutchings @ 2012-11-16 22:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <1353105635.2743.49.camel@bwh-desktop.uk.solarflarecom.com>

Various drivers depend on INET because they used to select INET_LRO,
but they have all been converted to use GRO which has no such
dependency.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
Compile-tested with INET disabled.

Ben.

 drivers/net/ethernet/chelsio/Kconfig      |    4 ++--
 drivers/net/ethernet/cisco/Kconfig        |    2 +-
 drivers/net/ethernet/cisco/enic/Kconfig   |    2 +-
 drivers/net/ethernet/emulex/Kconfig       |    2 +-
 drivers/net/ethernet/emulex/benet/Kconfig |    2 +-
 drivers/net/ethernet/intel/Kconfig        |    2 +-
 drivers/net/ethernet/mellanox/Kconfig     |    2 +-
 drivers/net/ethernet/neterion/Kconfig     |    2 +-
 drivers/net/ethernet/sfc/Kconfig          |    2 +-
 9 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/Kconfig b/drivers/net/ethernet/chelsio/Kconfig
index 2de50f9..a71c0f3 100644
--- a/drivers/net/ethernet/chelsio/Kconfig
+++ b/drivers/net/ethernet/chelsio/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_CHELSIO
 	bool "Chelsio devices"
 	default y
-	depends on PCI || INET
+	depends on PCI
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y
 	  and read the Ethernet-HOWTO, available from
@@ -48,7 +48,7 @@ config CHELSIO_T1_1G
 
 config CHELSIO_T3
 	tristate "Chelsio Communications T3 10Gb Ethernet support"
-	depends on PCI && INET
+	depends on PCI
 	select FW_LOADER
 	select MDIO
 	---help---
diff --git a/drivers/net/ethernet/cisco/Kconfig b/drivers/net/ethernet/cisco/Kconfig
index 94606f7..1c7b884 100644
--- a/drivers/net/ethernet/cisco/Kconfig
+++ b/drivers/net/ethernet/cisco/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_CISCO
 	bool "Cisco devices"
 	default y
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y
 	  and read the Ethernet-HOWTO, available from
diff --git a/drivers/net/ethernet/cisco/enic/Kconfig b/drivers/net/ethernet/cisco/enic/Kconfig
index 9cc706a..b63f8d8 100644
--- a/drivers/net/ethernet/cisco/enic/Kconfig
+++ b/drivers/net/ethernet/cisco/enic/Kconfig
@@ -4,6 +4,6 @@
 
 config ENIC
 	tristate "Cisco VIC Ethernet NIC Support"
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  This enables the support for the Cisco VIC Ethernet card.
diff --git a/drivers/net/ethernet/emulex/Kconfig b/drivers/net/ethernet/emulex/Kconfig
index 7a28a64..1b8d638 100644
--- a/drivers/net/ethernet/emulex/Kconfig
+++ b/drivers/net/ethernet/emulex/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_EMULEX
 	bool "Emulex devices"
 	default y
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y
 	  and read the Ethernet-HOWTO, available from
diff --git a/drivers/net/ethernet/emulex/benet/Kconfig b/drivers/net/ethernet/emulex/benet/Kconfig
index 804db04..231129d 100644
--- a/drivers/net/ethernet/emulex/benet/Kconfig
+++ b/drivers/net/ethernet/emulex/benet/Kconfig
@@ -1,6 +1,6 @@
 config BE2NET
 	tristate "ServerEngines' 10Gbps NIC - BladeEngine"
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  This driver implements the NIC functionality for ServerEngines'
 	  10Gbps network adapter - BladeEngine.
diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 73d28d5..ddee406 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -166,7 +166,7 @@ config IXGB
 
 config IXGBE
 	tristate "Intel(R) 10GbE PCI Express adapters support"
-	depends on PCI && INET
+	depends on PCI
 	select MDIO
 	select PTP_1588_CLOCK
 	---help---
diff --git a/drivers/net/ethernet/mellanox/Kconfig b/drivers/net/ethernet/mellanox/Kconfig
index d8099a7..bcdbc14 100644
--- a/drivers/net/ethernet/mellanox/Kconfig
+++ b/drivers/net/ethernet/mellanox/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_MELLANOX
 	bool "Mellanox devices"
 	default y
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y
 	  and read the Ethernet-HOWTO, available from
diff --git a/drivers/net/ethernet/neterion/Kconfig b/drivers/net/ethernet/neterion/Kconfig
index ff26b54..87abb4f 100644
--- a/drivers/net/ethernet/neterion/Kconfig
+++ b/drivers/net/ethernet/neterion/Kconfig
@@ -32,7 +32,7 @@ config S2IO
 
 config VXGE
 	tristate "Exar X3100 Series 10GbE PCIe Server Adapter"
-	depends on PCI && INET
+	depends on PCI
 	---help---
 	  This driver supports Exar Corp's X3100 Series 10 GbE PCIe
 	  I/O Virtualized Server Adapter.
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index fb3cbc2..059e5e9 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -1,6 +1,6 @@
 config SFC
 	tristate "Solarflare SFC4000/SFC9000-family support"
-	depends on PCI && INET
+	depends on PCI
 	select MDIO
 	select CRC32
 	select I2C
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 2/3] ehea: Remove remnants of LRO support
From: Ben Hutchings @ 2012-11-16 22:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Thadeu Lima de Souza Cascardo
In-Reply-To: <1353105635.2743.49.camel@bwh-desktop.uk.solarflarecom.com>

Commit 2cb1deb56f5bf413da83491e0cb5a0474393c8ef ('ehea: Remove LRO
support') left behind the Kconfig depends/select and feature flag.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This is not even compile-tested, as I don't have a suitable
cross-compiler to hand.

Ben.

 drivers/net/ethernet/ibm/Kconfig          |    5 ++---
 drivers/net/ethernet/ibm/ehea/ehea_main.c |    2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/Kconfig b/drivers/net/ethernet/ibm/Kconfig
index b9773d2..6529d31 100644
--- a/drivers/net/ethernet/ibm/Kconfig
+++ b/drivers/net/ethernet/ibm/Kconfig
@@ -6,7 +6,7 @@ config NET_VENDOR_IBM
 	bool "IBM devices"
 	default y
 	depends on MCA || PPC_PSERIES || PPC_PSERIES || PPC_DCR || \
-		   (IBMEBUS && INET && SPARSEMEM)
+		   (IBMEBUS && SPARSEMEM)
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y
 	  and read the Ethernet-HOWTO, available from
@@ -33,8 +33,7 @@ source "drivers/net/ethernet/ibm/emac/Kconfig"
 
 config EHEA
 	tristate "eHEA Ethernet support"
-	depends on IBMEBUS && INET && SPARSEMEM
-	select INET_LRO
+	depends on IBMEBUS && SPARSEMEM
 	---help---
 	  This driver supports the IBM pSeries eHEA ethernet adapter.
 
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index f4d2da0..09faf33 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3028,7 +3028,7 @@ static struct ehea_port *ehea_setup_single_port(struct ehea_adapter *adapter,
 	ehea_set_ethtool_ops(dev);
 
 	dev->hw_features = NETIF_F_SG | NETIF_F_TSO
-		      | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_TX | NETIF_F_LRO;
+		      | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_TX;
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO
 		      | NETIF_F_HIGHDMA | NETIF_F_IP_CSUM | NETIF_F_HW_VLAN_TX
 		      | NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_FILTER
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 1/3] mlx4_en: Remove remnants of LRO support
From: Ben Hutchings @ 2012-11-16 22:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Or Gerlitz, Roland Dreier
In-Reply-To: <1353105635.2743.49.camel@bwh-desktop.uk.solarflarecom.com>

Commit fa37a9586f92051de03a13e55e5ec3880bb6783e ('mlx4_en: Moving to
work with GRO') left behind the Kconfig depends/select, some dead
code and comments referring to LRO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
The check for GRO eligibility in mlx4_en_process_rx_cq() should also be
redundant, but I didn't touch it.

Ben.

 drivers/net/ethernet/mellanox/mlx4/Kconfig   |    3 +--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    9 +++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |   17 -----------------
 3 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/Kconfig b/drivers/net/ethernet/mellanox/mlx4/Kconfig
index 5f027f9..eb520ab 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx4/Kconfig
@@ -4,9 +4,8 @@
 
 config MLX4_EN
 	tristate "Mellanox Technologies 10Gbit Ethernet support"
-	depends on PCI && INET
+	depends on PCI
 	select MLX4_CORE
-	select INET_LRO
 	---help---
 	  This driver supports Mellanox Technologies ConnectX Ethernet
 	  devices.
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5aba5ec..f76c967 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -630,7 +630,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
 			    (cqe->checksum == cpu_to_be16(0xffff))) {
 				ring->csum_ok++;
-				/* This packet is eligible for LRO if it is:
+				/* This packet is eligible for GRO if it is:
 				 * - DIX Ethernet (type interpretation)
 				 * - TCP/IP (v4)
 				 * - without IP options
@@ -667,7 +667,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 					goto next;
 				}
 
-				/* LRO not possible, complete processing here */
+				/* GRO not possible, complete processing here */
 				ip_summed = CHECKSUM_UNNECESSARY;
 			} else {
 				ip_summed = CHECKSUM_NONE;
@@ -710,11 +710,8 @@ next:
 		++cq->mcq.cons_index;
 		index = (cq->mcq.cons_index) & ring->size_mask;
 		cqe = &cq->buf[index];
-		if (++polled == budget) {
-			/* We are here because we reached the NAPI budget -
-			 * flush only pending LRO sessions */
+		if (++polled == budget)
 			goto out;
-		}
 	}
 
 out:
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 9d27e42..1809a8b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -95,8 +95,6 @@
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)
 
-#define MLX4_EN_MAX_LRO_DESCRIPTORS	32
-
 /* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
  * and 4K allocations) */
 enum {
@@ -290,21 +288,6 @@ struct mlx4_en_rx_ring {
 	unsigned long csum_none;
 };
 
-
-static inline int mlx4_en_can_lro(__be16 status)
-{
-	return (status & cpu_to_be16(MLX4_CQE_STATUS_IPV4	|
-				     MLX4_CQE_STATUS_IPV4F	|
-				     MLX4_CQE_STATUS_IPV6	|
-				     MLX4_CQE_STATUS_IPV4OPT	|
-				     MLX4_CQE_STATUS_TCP	|
-				     MLX4_CQE_STATUS_UDP	|
-				     MLX4_CQE_STATUS_IPOK)) ==
-		cpu_to_be16(MLX4_CQE_STATUS_IPV4 |
-			    MLX4_CQE_STATUS_IPOK |
-			    MLX4_CQE_STATUS_TCP);
-}
-
 struct mlx4_en_cq {
 	struct mlx4_cq          mcq;
 	struct mlx4_hwq_resources wqres;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 0/3] Remove inet_lro remnants
From: Ben Hutchings @ 2012-11-16 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet

Eric's comments on LRO/GRO conversion prompted me to check what was
still using inet_lro.  Not very many drivers, but there are a few
unnecessary references outside the code.

Ben.

Ben Hutchings (3):
  mlx4_en: Remove remnants of LRO support
  ehea: Remove remnants of LRO support
  net: Remove bogus dependencies on INET

 drivers/net/ethernet/chelsio/Kconfig         |    4 ++--
 drivers/net/ethernet/cisco/Kconfig           |    2 +-
 drivers/net/ethernet/cisco/enic/Kconfig      |    2 +-
 drivers/net/ethernet/emulex/Kconfig          |    2 +-
 drivers/net/ethernet/emulex/benet/Kconfig    |    2 +-
 drivers/net/ethernet/ibm/Kconfig             |    5 ++---
 drivers/net/ethernet/ibm/ehea/ehea_main.c    |    2 +-
 drivers/net/ethernet/intel/Kconfig           |    2 +-
 drivers/net/ethernet/mellanox/Kconfig        |    2 +-
 drivers/net/ethernet/mellanox/mlx4/Kconfig   |    3 +--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    9 +++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |   17 -----------------
 drivers/net/ethernet/neterion/Kconfig        |    2 +-
 drivers/net/ethernet/sfc/Kconfig             |    2 +-
 14 files changed, 17 insertions(+), 39 deletions(-)

-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-11-16 22:07 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) tx_filtered/ps_tx_buf queues need to be accessed with the SKB queue
   lock, from Arik Nemtsov.

2) Don't call 802.11 driver's filter configure method until it's
   actually open, from Felix Fietkau.

3) Use ieee80211_free_txskb otherwise we leak control information.
   From Johannes Berg.

4) Fix memory leak in bluetooth UUID removal,f rom Johan Hedberg.

5) The shift mask trick doesn't work properly when 'optname' is
   out of range in do_ip_setsockopt().  Use a straightforward
   switch statement instead, the compiler emits essentially the
   same code but without the missing range check.  From Xi Wang.

6) Fix when we call tcp_replace_ts_recent() otherwise we can
   erroneously accept a too-high tsval.  From Eric Dumazet.

7) VXLAN bug fixes, mostly to do with VLAN header length handling,
   from Alexander Duyck.

8) Missing return value initialization for IPV6_MINHOPCOUNT socket
   option handling.  From Hannes Frederic.

9) Fix regression in tasklet handling in jme/ksz884x/xilinx drivers,
   from Xiaotian Feng.

10) At smsc911x driver init time, we don't know if the chip is in word
    swap mode or not.  However we do need to wait for the control
    register's ready bit to be set before we program any other part of
    the chip.  Adjust the wait loop to account for this.  From
    Kamlakant Patel.

11) Revert erroneous MDIO bus unregister change to mdio-bitbang.c

12) Fix memory leak in /proc/net/sctp/, from Tommi Rantala.

13) tilegx driver registers IRQ with NULL name, oops, from Simon
    Marchi.

14) TCP metrics hash table kzalloc() based allocation can fail, back
    down to using vmalloc() if it does.  From Eric Dumazet.

15) Fix packet steering out-of-order delivery regression, from Tom
    Herbert.

Please pull, thanks a lot!

The following changes since commit 77b67063bb6bce6d475e910d3b886a606d0d91f7:

  Linux 3.7-rc5 (2012-11-11 13:44:33 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to a8203d3ccd7796675739aaa50f708a047601eed2:

  Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless (2012-11-16 14:37:18 -0500)

----------------------------------------------------------------

Alexander Duyck (2):
      vxlan: Fix error that was resulting in VXLAN MTU size being 10 bytes too large
      vxlan: Update hard_header_len based on lowerdev when instantiating VXLAN

Andrew Vagin (1):
      tcp: fix retransmission in repair mode

Antonio Quartulli (4):
      batman-adv: fix tt_global_entries flags update
      batman-adv: correctly pass the client flag on tt_response
      batman-adv: don't add TEMP clients belonging to other backbone nodes
      batman-adv: process broadcast packets in BLA earlier

Arik Nemtsov (1):
      mac80211: sync acccess to tx_filtered/ps_tx_buf queues

Bjørn Mork (1):
      net: cdc_ncm: add Huawei devices

David S. Miller (3):
      Revert "drivers/net/phy/mdio-bitbang.c: Call mdiobus_unregister before mdiobus_free"
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
      Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless

David Spinadel (1):
      mac80211: init sched_scan_ies

Eric Dumazet (2):
      tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()
      tcp: handle tcp_net_metrics_init() order-5 memory allocation failures

Felix Fietkau (2):
      mac80211: do not call ieee80211_configure_filter if no interfaces are up
      mac80211: call skb_dequeue/ieee80211_free_txskb instead of __skb_queue_purge

Hannes Frederic Sowa (1):
      ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value

Hauke Mehrtens (1):
      brcmfmac: fix typo in CONFIG_BRCMISCAN

Jiri Pirko (1):
      net: correct check in dev_addr_del()

Johan Hedberg (2):
      Bluetooth: Fix having bogus entries in mgmt_read_index_list reply
      Bluetooth: Fix memory leak when removing a UUID

Johannes Berg (5):
      iwlwifi: handle DMA mapping failures
      iwlwifi: use ieee80211_free_txskb
      mac80211: fix memory leak in device registration error path
      mac80211: don't send null data packet when not associated
      wireless: allow 40 MHz on world roaming channels 12/13

John W. Linville (4):
      Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'for-john' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes
      Merge branch 'master' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Kamlakant Patel (1):
      net/smsc911x: Fix ready check in cases where WORD_SWAP is needed

Kirill Smelkov (1):
      doc/net: Fix typo in netdev-features.txt

Marcel Holtmann (1):
      Bluetooth: Notify about device registration before power on

Marcos Chaparro (1):
      Bluetooth: ath3k: Add support for VAIO VPCEH [0489:e027]

Paulo Sérgio (1):
      Bluetooth: Fix error status when pairing fails

Rami Rosen (1):
      vxlan: fix a typo.

Saurabh Mohan (1):
      ipv4/ip_vti.c: VTI fix post-decryption forwarding

Simon Marchi (1):
      tilegx: request_irq with a non-null device name

Stefan Raspl (1):
      qeth: Fix IPA_CMD_QIPASSIST return code handling

Steve Glendinning (1):
      smsc95xx: set MII_BUSY bit to read/write PHY regs

Tom Herbert (1):
      net-rps: Fix brokeness causing OOO packets

Tommi Rantala (1):
      sctp: fix /proc/net/sctp/ memory leak

Ursula Braun (1):
      qeth: set new mac even if old mac is gone

Xi Wang (1):
      ipv4: avoid undefined behavior in do_ip_setsockopt()

Xiaotian Feng (1):
      drivers/net: fix tasklet misuse issue

 Documentation/networking/netdev-features.txt          |  2 +-
 drivers/bluetooth/ath3k.c                             |  1 +
 drivers/bluetooth/btusb.c                             |  1 +
 drivers/net/ethernet/jme.c                            | 28 ++++++++--------------------
 drivers/net/ethernet/micrel/ksz884x.c                 | 16 ++++------------
 drivers/net/ethernet/smsc/smsc911x.c                  | 17 +++++++++++++++--
 drivers/net/ethernet/tile/tilegx.c                    |  2 +-
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c     | 12 ++++++------
 drivers/net/phy/mdio-bitbang.c                        |  1 -
 drivers/net/usb/cdc_ncm.c                             | 22 ++++++++++++++++++----
 drivers/net/usb/smsc95xx.c                            |  4 ++--
 drivers/net/vxlan.c                                   | 10 +++++++---
 drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c |  2 +-
 drivers/net/wireless/iwlwifi/dvm/mac80211.c           |  2 +-
 drivers/net/wireless/iwlwifi/dvm/main.c               |  2 +-
 drivers/net/wireless/iwlwifi/pcie/rx.c                | 23 +++++++++++++++++++++--
 drivers/s390/net/qeth_core_main.c                     | 24 ++++++++++++++++++++++--
 drivers/s390/net/qeth_l2_main.c                       | 13 ++++++++-----
 net/batman-adv/soft-interface.c                       | 12 ++++++------
 net/batman-adv/translation-table.c                    | 15 ++++++++++++++-
 net/bluetooth/hci_core.c                              |  4 ++--
 net/bluetooth/mgmt.c                                  | 12 +++++++-----
 net/bluetooth/smp.c                                   |  2 +-
 net/core/dev.c                                        |  4 +++-
 net/core/dev_addr_lists.c                             |  3 ++-
 net/ipv4/ip_sockglue.c                                | 35 ++++++++++++++++++++++-------------
 net/ipv4/ip_vti.c                                     |  5 +++++
 net/ipv4/tcp.c                                        |  4 ++--
 net/ipv4/tcp_input.c                                  | 15 ++++++++++-----
 net/ipv4/tcp_metrics.c                                | 12 +++++++++---
 net/ipv4/tcp_output.c                                 |  4 ++++
 net/ipv6/ipv6_sockglue.c                              |  1 +
 net/mac80211/cfg.c                                    |  3 +++
 net/mac80211/ieee80211_i.h                            |  2 ++
 net/mac80211/main.c                                   |  6 ++++--
 net/mac80211/scan.c                                   |  2 +-
 net/mac80211/sta_info.c                               | 11 ++++++++---
 net/mac80211/status.c                                 |  9 +++++++++
 net/mac80211/tx.c                                     |  9 ++++++---
 net/mac80211/util.c                                   |  2 ++
 net/sctp/proc.c                                       |  8 ++++----
 net/wireless/reg.c                                    |  5 ++---
 42 files changed, 247 insertions(+), 120 deletions(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox