* Re: [PATCHv5 net-next] vxlan: virtual extensible lan
From: David Miller @ 2012-09-27 23:12 UTC (permalink / raw)
To: shemminger; +Cc: jesse, chrisw, netdev
In-Reply-To: <20120927160054.105f4711@nehalam.linuxnetplumber.net>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 27 Sep 2012 16:00:54 -0700
> On Thu, 27 Sep 2012 18:47:40 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
>
>> But I wonder if this matters at all, the administrator controls
>> the contents of this table, rather than external entitites.
>
> The table includes values learned from packets received. Like a bridge,
> a malicious attacker who can forge MAC sourc addresses can overload one
> chain by swamping the table with bogus values. Probably needs a table limit.
Ok.
^ permalink raw reply
* Re: gre: Support GRE over IPv6
From: David Miller @ 2012-09-27 23:07 UTC (permalink / raw)
To: eric.dumazet; +Cc: dan.carpenter, xeb, netdev
In-Reply-To: <1348659560.5093.337.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 26 Sep 2012 13:39:20 +0200
> From: Eric Dumazet <edumazet@google.com>
...
> [PATCH net-next] ipv6: gre: remove ip6gre_header_parse()
>
> dev_parse_header() callers provide 8 bytes of storage,
> so it's not possible to store an IPv6 address.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: [PATCHv5 net-next] vxlan: virtual extensible lan
From: Stephen Hemminger @ 2012-09-27 23:00 UTC (permalink / raw)
To: David Miller; +Cc: jesse, chrisw, netdev
In-Reply-To: <20120927.184740.975766966558033959.davem@davemloft.net>
On Thu, 27 Sep 2012 18:47:40 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Tue, 25 Sep 2012 15:09:57 -0700
>
> > +/* Hash Ethernet address */
> > +static u32 eth_hash(const unsigned char *addr)
> > +{
> > + /* could be optimized for unaligned access */
> > + u32 a = addr[5] << 8 | addr[4];
> > + u32 b = addr[3] << 24 | addr[2] << 16 | addr[1] << 8 | addr[0];
> > +
> > + return jhash_2words(a, b, vxlan_salt);
> > +}
> > +
> > +/* Hash chain to use given mac address */
> > +static inline struct hlist_head *vxlan_fdb_head(struct vxlan_dev *vxlan,
> > + const u8 *mac)
> > +{
> > + return &vxlan->fdb_head[hash_32(eth_hash(mac), FDB_HASH_BITS)];
> > +}
>
> This hashing is way overkill, and in fact is a mis-use of Jenkins.
>
> Jenkins can work fine with power-of-two hash sizes, so using hash_32()
> on the Jenkins hash result is nothing short of silly.
>
> I would go one step further and change this to:
>
> (a ^ b) * vxlan_salt
Ok, but doubt that it makes much difference.
>
> much like how we hash IP addresses in the ipv4 ARP tables. That
> function is a universal hash, and therefore provides whatever kind
> of protection you're trying to obtain using the salt.
>
> But I wonder if this matters at all, the administrator controls
> the contents of this table, rather than external entitites.
The table includes values learned from packets received. Like a bridge,
a malicious attacker who can forge MAC sourc addresses can overload one
chain by swamping the table with bogus values. Probably needs a table limit.
^ permalink raw reply
* Re: [PATCHv5 net-next] vxlan: virtual extensible lan
From: David Miller @ 2012-09-27 22:47 UTC (permalink / raw)
To: shemminger; +Cc: jesse, chrisw, netdev
In-Reply-To: <20120925150957.78591205@nehalam.linuxnetplumber.net>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 25 Sep 2012 15:09:57 -0700
> +/* Hash Ethernet address */
> +static u32 eth_hash(const unsigned char *addr)
> +{
> + /* could be optimized for unaligned access */
> + u32 a = addr[5] << 8 | addr[4];
> + u32 b = addr[3] << 24 | addr[2] << 16 | addr[1] << 8 | addr[0];
> +
> + return jhash_2words(a, b, vxlan_salt);
> +}
> +
> +/* Hash chain to use given mac address */
> +static inline struct hlist_head *vxlan_fdb_head(struct vxlan_dev *vxlan,
> + const u8 *mac)
> +{
> + return &vxlan->fdb_head[hash_32(eth_hash(mac), FDB_HASH_BITS)];
> +}
This hashing is way overkill, and in fact is a mis-use of Jenkins.
Jenkins can work fine with power-of-two hash sizes, so using hash_32()
on the Jenkins hash result is nothing short of silly.
I would go one step further and change this to:
(a ^ b) * vxlan_salt
much like how we hash IP addresses in the ipv4 ARP tables. That
function is a universal hash, and therefore provides whatever kind
of protection you're trying to obtain using the salt.
But I wonder if this matters at all, the administrator controls
the contents of this table, rather than external entitites.
^ permalink raw reply
* [RFC v4 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-09-27 22:42 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, linux-pci, Dave Jiang, Nicholas Bellinger
A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
connecting 2 systems, providing electrical isolation between the two subsystems.
A non-transparent bridge is functionally similar to a transparent bridge except
that both sides of the bridge have their own independent address domains. The
host on one side of the bridge will not have the visibility of the complete
memory or I/O space on the other side of the bridge. To communicate across the
non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
the local system. Writes to these apertures are mirrored to memory on the
remote system. Communications can also occur through the use of doorbell
registers that initiate interrupts to the alternate domain, and scratch-pad
registers accessible from both sides.
The NTB device driver is needed to configure these memory windows, doorbell, and
scratch-pad registers as well as use them in such a way as they can be turned
into a viable communication channel to the remote system. ntb_hw.[ch]
determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
the underlying hardware to provide access and a common interface to the doorbell
registers, scratch pads, and memory windows. These hardware interfaces are
exported so that other, non-mainlined kernel drivers can access these.
ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
communication channel(s) and provide a reliable way of transferring data from
one side to the other, which it then exports so that "client" drivers can access
them. These client drivers are used to provide a standard kernel interface
(i.e., Ethernet device) to NTB, such that Linux can transfer data from one
system to the other in a standard way.
Signed-off-by: Jon Mason <jon.mason@intel.com>
---
MAINTAINERS | 6 +
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/ntb/Kconfig | 13 +
drivers/ntb/Makefile | 3 +
drivers/ntb/ntb_hw.c | 1167 +++++++++++++++++++++++++++++++++++
drivers/ntb/ntb_hw.h | 195 ++++++
drivers/ntb/ntb_regs.h | 139 +++++
drivers/ntb/ntb_transport.c | 1401 +++++++++++++++++++++++++++++++++++++++++++
include/linux/ntb.h | 81 +++
10 files changed, 3008 insertions(+)
create mode 100644 drivers/ntb/Kconfig
create mode 100644 drivers/ntb/Makefile
create mode 100644 drivers/ntb/ntb_hw.c
create mode 100644 drivers/ntb/ntb_hw.h
create mode 100644 drivers/ntb/ntb_regs.h
create mode 100644 drivers/ntb/ntb_transport.c
create mode 100644 include/linux/ntb.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 53cc13c..a634314 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4849,6 +4849,12 @@ S: Maintained
F: Documentation/scsi/NinjaSCSI.txt
F: drivers/scsi/nsp32*
+NTB DRIVER
+M: Jon Mason <jon.mason@intel.com>
+S: Supported
+F: drivers/ntb/
+F: include/linux/ntb.h
+
NTFS FILESYSTEM
M: Anton Altaparmakov <anton@tuxera.com>
L: linux-ntfs-dev@lists.sourceforge.net
diff --git a/drivers/Kconfig b/drivers/Kconfig
index ece958d..dccaa52 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -152,4 +152,6 @@ source "drivers/vme/Kconfig"
source "drivers/pwm/Kconfig"
+source "drivers/ntb/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 5b42184..dfa5978 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -139,3 +139,4 @@ obj-$(CONFIG_EXTCON) += extcon/
obj-$(CONFIG_MEMORY) += memory/
obj-$(CONFIG_IIO) += iio/
obj-$(CONFIG_VME_BUS) += vme/
+obj-$(CONFIG_NTB) += ntb/
diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
new file mode 100644
index 0000000..f69df793
--- /dev/null
+++ b/drivers/ntb/Kconfig
@@ -0,0 +1,13 @@
+config NTB
+ tristate "Intel Non-Transparent Bridge support"
+ depends on PCI
+ depends on X86
+ help
+ The PCI-E Non-transparent bridge hardware is a point-to-point PCI-E bus
+ connecting 2 systems. When configured, writes to the device's PCI
+ mapped memory will be mirrored to a buffer on the remote system. The
+ ntb Linux driver uses this point-to-point communication as a method to
+ transfer data from one system to the other.
+
+ If unsure, say N.
+
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
new file mode 100644
index 0000000..15cb59f
--- /dev/null
+++ b/drivers/ntb/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_NTB) += ntb.o
+
+ntb-objs := ntb_hw.o ntb_transport.o
diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
new file mode 100644
index 0000000..d6671f9
--- /dev/null
+++ b/drivers/ntb/ntb_hw.c
@@ -0,0 +1,1167 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "ntb_hw.h"
+#include "ntb_regs.h"
+
+#define NTB_NAME "Intel(R) PCI-E Non-Transparent Bridge Driver"
+#define NTB_VER "0.22"
+
+MODULE_DESCRIPTION(NTB_NAME);
+MODULE_VERSION(NTB_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+enum {
+ NTB_CONN_CLASSIC = 0,
+ NTB_CONN_B2B,
+ NTB_CONN_RP,
+};
+
+enum {
+ NTB_DEV_USD = 0,
+ NTB_DEV_DSD,
+};
+
+enum {
+ SNB_HW = 0,
+ BWD_HW,
+};
+
+/* Translate memory window 0,1 to BAR 2,4 */
+#define MW_TO_BAR(mw) (mw * 2 + 2)
+
+static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
+ {0}
+};
+MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
+
+/**
+ * ntb_register_event_callback() - register event callback
+ * @ndev: pointer to ntb_device instance
+ * @func: callback function to register
+ *
+ * This function registers a callback for any HW driver events such as link
+ * up/down, power management notices and etc.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_event_callback(struct ntb_device *ndev,
+ void (*func)(void *handle, unsigned int event))
+{
+ if (ndev->event_cb)
+ return -EINVAL;
+
+ ndev->event_cb = func;
+
+ return 0;
+}
+
+/**
+ * ntb_unregister_event_callback() - unregisters the event callback
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function unregisters the existing callback from transport
+ */
+void ntb_unregister_event_callback(struct ntb_device *ndev)
+{
+ ndev->event_cb = NULL;
+}
+
+/**
+ * ntb_register_db_callback() - register a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ * @func: callback function to register
+ *
+ * This function registers a callback function for the doorbell interrupt
+ * on the primary side. The function will unmask the doorbell as well to
+ * allow interrupt.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+ void *data, void (*func)(void *data, int db_num))
+{
+ unsigned long mask;
+
+ if (idx >= ndev->max_cbs || ndev->db_cb[idx].callback) {
+ dev_warn(&ndev->pdev->dev, "Invalid Index.\n");
+ return -EINVAL;
+ }
+
+ ndev->db_cb[idx].callback = func;
+ ndev->db_cb[idx].data = data;
+
+ /* unmask interrupt */
+ mask = readw(ndev->reg_ofs.pdb_mask);
+ clear_bit(idx * ndev->bits_per_vector, &mask);
+ writew(mask, ndev->reg_ofs.pdb_mask);
+
+ return 0;
+}
+
+/**
+ * ntb_unregister_db_callback() - unregister a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ *
+ * This function unregisters a callback function for the doorbell interrupt
+ * on the primary side. The function will also mask the said doorbell.
+ */
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx)
+{
+ unsigned long mask;
+
+ if (idx >= ndev->max_cbs || !ndev->db_cb[idx].callback)
+ return;
+
+ mask = readw(ndev->reg_ofs.pdb_mask);
+ set_bit(idx * ndev->bits_per_vector, &mask);
+ writew(mask, ndev->reg_ofs.pdb_mask);
+
+ ndev->db_cb[idx].callback = NULL;
+}
+
+/**
+ * ntb_find_transport() - find the transport pointer
+ * @transport: pointer to pci device
+ *
+ * Given the pci device pointer, return the transport pointer passed in when
+ * the transport attached when it was inited.
+ *
+ * RETURNS: pointer to transport.
+ */
+void *ntb_find_transport(struct pci_dev *pdev)
+{
+ struct ntb_device *ndev = pci_get_drvdata(pdev);
+ return ndev->ntb_transport;
+}
+
+/**
+ * ntb_register_transport() - Register NTB transport with NTB HW driver
+ * @transport: transport identifier
+ *
+ * This function allows a transport to reserve the hardware driver for
+ * NTB usage.
+ *
+ * RETURNS: pointer to ntb_device, NULL on error.
+ */
+struct ntb_device *ntb_register_transport(struct pci_dev *pdev, void *transport)
+{
+ struct ntb_device *ndev = pci_get_drvdata(pdev);
+
+ if (ndev->ntb_transport)
+ return NULL;
+
+ ndev->ntb_transport = transport;
+ return ndev;
+}
+
+/**
+ * ntb_unregister_transport() - Unregister the transport with the NTB HW driver
+ * @ndev - ntb_device of the transport to be freed
+ *
+ * This function unregisters the transport from the HW driver and performs any
+ * necessary cleanups.
+ */
+void ntb_unregister_transport(struct ntb_device *ndev)
+{
+ int i;
+
+ if (!ndev->ntb_transport)
+ return;
+
+ for (i = 0; i < ndev->max_cbs; i++)
+ ntb_unregister_db_callback(ndev, i);
+
+ ntb_unregister_event_callback(ndev);
+ ndev->ntb_transport = NULL;
+}
+
+/**
+ * ntb_get_max_spads() - get the total scratch regs usable
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function returns the max 32bit scratchpad registers usable by the
+ * upper layer.
+ *
+ * RETURNS: total number of scratch pad registers available
+ */
+int ntb_get_max_spads(struct ntb_device *ndev)
+{
+ return ndev->limits.max_spads;
+}
+
+/**
+ * ntb_write_local_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. The register resides on the secondary (external) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+ if (idx >= ndev->limits.max_spads)
+ return -EINVAL;
+
+ dev_dbg(&ndev->pdev->dev, "Writing %x to local scratch pad index %d\n",
+ val, idx);
+ writel(val, ndev->reg_ofs.spad_read + idx * 4);
+
+ return 0;
+}
+
+/**
+ * ntb_read_local_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+ if (idx >= ndev->limits.max_spads)
+ return -EINVAL;
+
+ *val = readl(ndev->reg_ofs.spad_write + idx * 4);
+ dev_dbg(&ndev->pdev->dev,
+ "Reading %x from local scratch pad index %d\n", *val, idx);
+
+ return 0;
+}
+
+/**
+ * ntb_write_remote_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. The register resides on the secondary (external) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+ if (idx >= ndev->limits.max_spads)
+ return -EINVAL;
+
+ dev_dbg(&ndev->pdev->dev, "Writing %x to remote scratch pad index %d\n",
+ val, idx);
+ writel(val, ndev->reg_ofs.spad_write + idx * 4);
+
+ return 0;
+}
+
+/**
+ * ntb_read_remote_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+ if (idx >= ndev->limits.max_spads)
+ return -EINVAL;
+
+ *val = readl(ndev->reg_ofs.spad_read + idx * 4);
+ dev_dbg(&ndev->pdev->dev,
+ "Reading %x from remote scratch pad index %d\n", *val, idx);
+
+ return 0;
+}
+
+/**
+ * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the base virtual address of the memory window
+ * specified.
+ *
+ * RETURNS: pointer to virtual address, or NULL on error.
+ */
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
+{
+ if (mw > NTB_NUM_MW)
+ return NULL;
+
+ return ndev->mw[mw].vbase;
+}
+
+/**
+ * ntb_get_mw_size() - return size of NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the physical size of the memory window specified
+ *
+ * RETURNS: the size of the memory window or zero on error
+ */
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
+{
+ if (mw > NTB_NUM_MW)
+ return 0;
+
+ return ndev->mw[mw].bar_sz;
+}
+
+/**
+ * ntb_set_mw_addr - set the memory window address
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ * @addr: base address for data
+ *
+ * This function sets the base physical address of the memory window. This
+ * memory address is where data from the remote system will be transfered into
+ * or out of depending on how the transport is configured.
+ */
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)
+{
+ if (mw > NTB_NUM_MW)
+ return;
+
+ dev_dbg(&ndev->pdev->dev, "Writing addr %Lx to BAR %d\n", addr,
+ MW_TO_BAR(mw));
+
+ ndev->mw[mw].phys_addr = addr;
+
+ switch (MW_TO_BAR(mw)) {
+ case NTB_BAR_23:
+ writeq(addr, ndev->reg_ofs.sbar2_xlat);
+ break;
+ case NTB_BAR_45:
+ writeq(addr, ndev->reg_ofs.sbar4_xlat);
+ break;
+ }
+}
+
+/**
+ * ntb_ring_sdb() - Set the doorbell on the secondary/external side
+ * @ndev: pointer to ntb_device instance
+ * @db: doorbell to ring
+ *
+ * This function allows triggering of a doorbell on the secondary/external
+ * side that will initiate an interrupt on the remote host
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+void ntb_ring_sdb(struct ntb_device *ndev, unsigned int db)
+{
+ dev_dbg(&ndev->pdev->dev, "%s: ringing doorbell %d\n", __func__, db);
+
+ if (ndev->hw_type == BWD_HW)
+ writeq((u64) 1 << db, ndev->reg_ofs.sdb);
+ else
+ writew(((1 << ndev->bits_per_vector) - 1) <<
+ (db * ndev->bits_per_vector), ndev->reg_ofs.sdb);
+}
+
+static void ntb_link_event(struct ntb_device *ndev, int link_state)
+{
+ unsigned int event;
+
+ if (ndev->link_status == link_state)
+ return;
+
+ if (link_state == NTB_LINK_UP) {
+ u16 status;
+
+ dev_info(&ndev->pdev->dev, "Link Up\n");
+ ndev->link_status = NTB_LINK_UP;
+ event = NTB_EVENT_HW_LINK_UP;
+
+ if (ndev->hw_type == BWD_HW)
+ status = readw(ndev->reg_ofs.lnk_stat);
+ else {
+ int rc = pci_read_config_word(ndev->pdev,
+ SNB_LINK_STATUS_OFFSET,
+ &status);
+ if (rc)
+ return;
+ }
+ dev_info(&ndev->pdev->dev, "Link Width %d, Link Speed %d\n",
+ (status & NTB_LINK_WIDTH_MASK) >> 4,
+ (status & NTB_LINK_SPEED_MASK));
+ } else {
+ dev_info(&ndev->pdev->dev, "Link Down\n");
+ ndev->link_status = NTB_LINK_DOWN;
+ event = NTB_EVENT_HW_LINK_DOWN;
+ }
+
+ /* notify the upper layer if we have an event change */
+ if (ndev->event_cb)
+ ndev->event_cb(ndev->ntb_transport, event);
+}
+
+static int ntb_link_status(struct ntb_device *ndev)
+{
+ int link_state;
+
+ if (ndev->hw_type == BWD_HW) {
+ u32 ntb_cntl;
+
+ ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+ if (ntb_cntl & BWD_CNTL_LINK_DOWN)
+ link_state = NTB_LINK_DOWN;
+ else
+ link_state = NTB_LINK_UP;
+ } else {
+ u16 status;
+ int rc;
+
+ rc = pci_read_config_word(ndev->pdev, SNB_LINK_STATUS_OFFSET,
+ &status);
+ if (rc)
+ return rc;
+
+ if (status & NTB_LINK_STATUS_ACTIVE)
+ link_state = NTB_LINK_UP;
+ else
+ link_state = NTB_LINK_DOWN;
+ }
+
+ ntb_link_event(ndev, link_state);
+
+ return 0;
+}
+
+/* BWD doesn't have link status interrupt, poll on that platform */
+static void bwd_link_poll(struct work_struct *work)
+{
+ struct ntb_device *ndev = container_of(work, struct ntb_device,
+ hb_timer.work);
+ unsigned long ts = jiffies;
+
+ /* If we haven't gotten an interrupt in a while, check the BWD link
+ * status bit
+ */
+ if (ts > ndev->last_ts + NTB_HB_TIMEOUT) {
+ int rc = ntb_link_status(ndev);
+ if (rc)
+ dev_err(&ndev->pdev->dev,
+ "Error determining link status\n");
+ }
+
+ schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+}
+
+static int ntb_xeon_setup(struct ntb_device *ndev)
+{
+ int rc;
+ u8 val;
+
+ ndev->hw_type = SNB_HW;
+
+ rc = pci_read_config_byte(ndev->pdev, NTB_PPD_OFFSET, &val);
+ if (rc)
+ return rc;
+
+ switch (val & SNB_PPD_CONN_TYPE) {
+ case NTB_CONN_B2B:
+ ndev->conn_type = NTB_CONN_B2B;
+ break;
+ case NTB_CONN_CLASSIC:
+ case NTB_CONN_RP:
+ default:
+ dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+ return -EINVAL;
+ }
+
+ if (val & SNB_PPD_DEV_TYPE)
+ ndev->dev_type = NTB_DEV_DSD;
+ else
+ ndev->dev_type = NTB_DEV_USD;
+
+ ndev->reg_ofs.pdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+ ndev->reg_ofs.pdb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+ ndev->reg_ofs.sbar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.sbar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+ ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
+ ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_LINK_STATUS_OFFSET;
+ ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
+ ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;
+
+ if (ndev->conn_type == NTB_CONN_B2B) {
+ ndev->reg_ofs.sdb = ndev->reg_base + SNB_B2B_DOORBELL_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + SNB_B2B_SPAD_OFFSET;
+ ndev->limits.max_spads = SNB_MAX_SPADS;
+ } else {
+ ndev->reg_ofs.sdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
+ ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS;
+ }
+
+ ndev->limits.max_db_bits = SNB_MAX_DB_BITS;
+ ndev->limits.msix_cnt = SNB_MSIX_CNT;
+ ndev->bits_per_vector = SNB_DB_BITS_PER_VEC;
+
+ return 0;
+}
+
+static int ntb_bwd_setup(struct ntb_device *ndev)
+{
+ int rc;
+ u32 val;
+
+ ndev->hw_type = BWD_HW;
+
+ rc = pci_read_config_dword(ndev->pdev, NTB_PPD_OFFSET, &val);
+ if (rc)
+ return rc;
+
+ switch ((val & BWD_PPD_CONN_TYPE) >> 8) {
+ case NTB_CONN_B2B:
+ ndev->conn_type = NTB_CONN_B2B;
+ break;
+ case NTB_CONN_RP:
+ default:
+ dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+ return -EINVAL;
+ }
+
+ if (val & BWD_PPD_DEV_TYPE)
+ ndev->dev_type = NTB_DEV_DSD;
+ else
+ ndev->dev_type = NTB_DEV_USD;
+
+ /* Initiate PCI-E link training */
+ rc = pci_write_config_dword(ndev->pdev, NTB_PPD_OFFSET,
+ val | BWD_PPD_INIT_LINK);
+ if (rc)
+ return rc;
+
+ ndev->reg_ofs.pdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+ ndev->reg_ofs.pdb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
+ ndev->reg_ofs.sbar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.sbar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
+ ndev->reg_ofs.lnk_cntl = ndev->reg_base + BWD_NTBCNTL_OFFSET;
+ ndev->reg_ofs.lnk_stat = ndev->reg_base + BWD_LINK_STATUS_OFFSET;
+ ndev->reg_ofs.spad_read = ndev->reg_base + BWD_SPAD_OFFSET;
+ ndev->reg_ofs.spci_cmd = ndev->reg_base + BWD_PCICMD_OFFSET;
+
+ if (ndev->conn_type == NTB_CONN_B2B) {
+ ndev->reg_ofs.sdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
+ ndev->limits.max_spads = BWD_MAX_SPADS;
+ } else {
+ ndev->reg_ofs.sdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + BWD_SPAD_OFFSET;
+ ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
+ }
+
+ ndev->limits.max_db_bits = BWD_MAX_DB_BITS;
+ ndev->limits.msix_cnt = BWD_MSIX_CNT;
+ ndev->bits_per_vector = BWD_DB_BITS_PER_VEC;
+
+ /* Since bwd doesn't have a link interrupt, setup a poll timer */
+ INIT_DELAYED_WORK(&ndev->hb_timer, bwd_link_poll);
+ schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+
+ return 0;
+}
+
+static int __devinit ntb_device_setup(struct ntb_device *ndev)
+{
+ int rc;
+
+ switch (ndev->pdev->device) {
+ case PCI_DEVICE_ID_INTEL_NTB_2ND_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_RP_JSF:
+ case PCI_DEVICE_ID_INTEL_NTB_RP_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF:
+ case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_JSF:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_SNB:
+ rc = ntb_xeon_setup(ndev);
+ break;
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_BWD:
+ rc = ntb_bwd_setup(ndev);
+ break;
+ default:
+ rc = -ENODEV;
+ }
+
+ /* Enable Bus Master and Memory Space on the secondary side */
+ writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);
+
+ return rc;
+}
+
+static void ntb_device_free(struct ntb_device *ndev)
+{
+ if (ndev->hw_type == BWD_HW)
+ cancel_delayed_work_sync(&ndev->hb_timer);
+}
+
+static irqreturn_t bwd_callback_msix_irq(int irq, void *data)
+{
+ struct ntb_db_cb *db_cb = data;
+ struct ntb_device *ndev = db_cb->ndev;
+
+ dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+ db_cb->db_num);
+
+ if (db_cb->callback)
+ db_cb->callback(db_cb->data, db_cb->db_num);
+
+ /* No need to check for the specific HB irq, any interrupt means
+ * we're connected.
+ */
+ ndev->last_ts = jiffies;
+
+ writeq((u64) 1 << db_cb->db_num, ndev->reg_ofs.pdb);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t xeon_callback_msix_irq(int irq, void *data)
+{
+ struct ntb_db_cb *db_cb = data;
+ struct ntb_device *ndev = db_cb->ndev;
+
+ dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+ db_cb->db_num);
+
+ if (db_cb->callback)
+ db_cb->callback(db_cb->data, db_cb->db_num);
+
+ /* On Sandybridge, there are 16 bits in the interrupt register
+ * but only 4 vectors. So, 5 bits are assigned to the first 3
+ * vectors, with the 4th having a single bit for link
+ * interrupts.
+ */
+ writew(((1 << ndev->bits_per_vector) - 1) <<
+ (db_cb->db_num * ndev->bits_per_vector), ndev->reg_ofs.pdb);
+
+ return IRQ_HANDLED;
+}
+
+/* Since we do not have a HW doorbell in BWD, this is only used in JF/JT */
+static irqreturn_t xeon_event_msix_irq(int irq, void *dev)
+{
+ struct ntb_device *ndev = dev;
+ int rc;
+
+ dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for Events\n", irq);
+
+ rc = ntb_link_status(ndev);
+ if (rc)
+ dev_err(&ndev->pdev->dev, "Error determining link status\n");
+
+ /* bit 15 is always the link bit */
+ writew(1 << ndev->limits.max_db_bits, ndev->reg_ofs.pdb);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t ntb_interrupt(int irq, void *dev)
+{
+ struct ntb_device *ndev = dev;
+ unsigned int i = 0;
+
+ if (ndev->hw_type == BWD_HW) {
+ u64 pdb = readq(ndev->reg_ofs.pdb);
+
+ dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %Lx\n", irq, pdb);
+
+ while (pdb) {
+ i = __ffs(pdb);
+ pdb &= pdb - 1;
+ bwd_callback_msix_irq(irq, &ndev->db_cb[i]);
+ }
+ } else {
+ u16 pdb = readw(ndev->reg_ofs.pdb);
+
+ dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %x sdb %x\n", irq,
+ pdb, readw(ndev->reg_ofs.sdb));
+
+ if (pdb & SNB_DB_HW_LINK) {
+ xeon_event_msix_irq(irq, dev);
+ pdb &= ~SNB_DB_HW_LINK;
+ }
+
+ while (pdb) {
+ i = __ffs(pdb);
+ pdb &= pdb - 1;
+ xeon_callback_msix_irq(irq, &ndev->db_cb[i]);
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
+static int ntb_setup_msix(struct ntb_device *ndev)
+{
+ struct pci_dev *pdev = ndev->pdev;
+ struct msix_entry *msix;
+ int msix_entries;
+ int rc, i, pos;
+ u16 val;
+
+ pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
+ if (!pos) {
+ rc = -EIO;
+ goto err;
+ }
+
+ rc = pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &val);
+ if (rc)
+ goto err;
+
+ msix_entries = msix_table_size(val);
+ if (msix_entries > ndev->limits.msix_cnt) {
+ rc = -EINVAL;
+ goto err;
+ }
+
+ ndev->msix_entries = kmalloc(sizeof(struct msix_entry) * msix_entries,
+ GFP_KERNEL);
+ if (!ndev->msix_entries) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ for (i = 0; i < msix_entries; i++)
+ ndev->msix_entries[i].entry = i;
+
+ rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);
+ if (rc < 0)
+ goto err1;
+ if (rc > 0) {
+ /* On SNB, the link interrupt is always tied to 4th vector. If
+ * we can't get all 4, then we can't use MSI-X.
+ */
+ if (ndev->hw_type != BWD_HW) {
+ rc = -EIO;
+ goto err1;
+ }
+
+ dev_warn(&pdev->dev,
+ "Only %d MSI-X vectors. Limiting the number of queues to that number.\n",
+ rc);
+ msix_entries = rc;
+ }
+
+ for (i = 0; i < msix_entries; i++) {
+ msix = &ndev->msix_entries[i];
+ WARN_ON(!msix->vector);
+
+ /* Use the last MSI-X vector for Link status */
+ if (ndev->hw_type == BWD_HW) {
+ rc = request_irq(msix->vector, bwd_callback_msix_irq, 0,
+ "ntb-callback-msix", &ndev->db_cb[i]);
+ if (rc)
+ goto err2;
+ } else {
+ if (i == msix_entries - 1) {
+ rc = request_irq(msix->vector,
+ xeon_event_msix_irq, 0,
+ "ntb-event-msix", ndev);
+ if (rc)
+ goto err2;
+ } else {
+ rc = request_irq(msix->vector,
+ xeon_callback_msix_irq, 0,
+ "ntb-callback-msix",
+ &ndev->db_cb[i]);
+ if (rc)
+ goto err2;
+ }
+ }
+ }
+
+ ndev->num_msix = msix_entries;
+ if (ndev->hw_type == BWD_HW)
+ ndev->max_cbs = msix_entries;
+ else
+ ndev->max_cbs = msix_entries - 1;
+
+ return 0;
+
+err2:
+ while (--i >= 0) {
+ msix = &ndev->msix_entries[i];
+ if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+ free_irq(msix->vector, ndev);
+ else
+ free_irq(msix->vector, &ndev->db_cb[i]);
+ }
+ pci_disable_msix(pdev);
+err1:
+ kfree(ndev->msix_entries);
+ dev_err(&pdev->dev, "Error allocating MSI-X interrupt\n");
+err:
+ ndev->num_msix = 0;
+ return rc;
+}
+
+static int ntb_setup_msi(struct ntb_device *ndev)
+{
+ struct pci_dev *pdev = ndev->pdev;
+ int rc;
+
+ rc = pci_enable_msi(pdev);
+ if (rc)
+ return rc;
+
+ rc = request_irq(pdev->irq, ntb_interrupt, 0, "ntb-msi", ndev);
+ if (rc) {
+ pci_disable_msi(pdev);
+ dev_err(&pdev->dev, "Error allocating MSI interrupt\n");
+ return rc;
+ }
+
+ return 0;
+}
+
+static int ntb_setup_intx(struct ntb_device *ndev)
+{
+ struct pci_dev *pdev = ndev->pdev;
+ int rc;
+
+ pci_msi_off(pdev);
+
+ /* Verify intx is enabled */
+ pci_intx(pdev, 1);
+
+ rc = request_irq(pdev->irq, ntb_interrupt, IRQF_SHARED, "ntb-intx",
+ ndev);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int __devinit ntb_setup_interrupts(struct ntb_device *ndev)
+{
+ int rc;
+
+ /* On BWD, disable all interrupts. On SNB, disable all but Link
+ * Interrupt. The rest will be unmasked as callbacks are registered.
+ */
+ if (ndev->hw_type == BWD_HW)
+ writeq(~0, ndev->reg_ofs.pdb_mask);
+ else
+ writew(~(1 << ndev->limits.max_db_bits),
+ ndev->reg_ofs.pdb_mask);
+
+ rc = ntb_setup_msix(ndev);
+ if (!rc)
+ goto done;
+
+ ndev->bits_per_vector = 1;
+ ndev->max_cbs = ndev->limits.max_db_bits;
+
+ rc = ntb_setup_msi(ndev);
+ if (!rc)
+ goto done;
+
+ rc = ntb_setup_intx(ndev);
+ if (rc) {
+ dev_err(&ndev->pdev->dev, "no usable interrupts\n");
+ return rc;
+ }
+
+done:
+ return 0;
+}
+
+static void __devexit ntb_free_interrupts(struct ntb_device *ndev)
+{
+ struct pci_dev *pdev = ndev->pdev;
+
+ /* mask interrupts */
+ if (ndev->hw_type == BWD_HW)
+ writeq(~0, ndev->reg_ofs.pdb_mask);
+ else
+ writew(~0, ndev->reg_ofs.pdb_mask);
+
+ if (ndev->num_msix) {
+ struct msix_entry *msix;
+ u32 i;
+
+ for (i = 0; i < ndev->num_msix; i++) {
+ msix = &ndev->msix_entries[i];
+ if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+ free_irq(msix->vector, ndev);
+ else
+ free_irq(msix->vector, &ndev->db_cb[i]);
+ }
+ pci_disable_msix(pdev);
+ } else {
+ free_irq(pdev->irq, ndev);
+
+ if (pci_dev_msi_enabled(pdev))
+ pci_disable_msi(pdev);
+ }
+}
+
+static int __devinit ntb_create_callbacks(struct ntb_device *ndev)
+{
+ int i;
+
+ /* Checken-egg issue. We won't know how many callbacks are necessary
+ * until we see how many MSI-X vectors we get, but these pointers need
+ * to be passed into the MSI-X register fucntion. So, we allocate the
+ * max, knowing that they might not all be used, to work around this.
+ */
+ ndev->db_cb = kcalloc(ndev->limits.max_db_bits,
+ sizeof(struct ntb_db_cb),
+ GFP_KERNEL);
+ if (!ndev->db_cb)
+ return -ENOMEM;
+
+ for (i = 0; i < ndev->limits.max_db_bits; i++) {
+ ndev->db_cb[i].db_num = i;
+ ndev->db_cb[i].ndev = ndev;
+ }
+
+ return 0;
+}
+
+static void ntb_free_callbacks(struct ntb_device *ndev)
+{
+ int i;
+
+ for (i = 0; i < ndev->limits.max_db_bits; i++)
+ ntb_unregister_db_callback(ndev, i);
+
+ kfree(ndev->db_cb);
+}
+
+static int __devinit
+ntb_pci_probe(struct pci_dev *pdev,
+ __attribute__((unused)) const struct pci_device_id *id)
+{
+ struct ntb_device *ndev;
+ int rc, i;
+
+ ndev = kzalloc(sizeof(struct ntb_device), GFP_KERNEL);
+ if (!ndev)
+ return -ENOMEM;
+
+ ndev->pdev = pdev;
+ ndev->link_status = NTB_LINK_DOWN;
+ pci_set_drvdata(pdev, ndev);
+
+ rc = pci_enable_device(pdev);
+ if (rc)
+ goto err;
+
+ pci_set_master(ndev->pdev);
+
+ rc = pci_request_selected_regions(pdev, NTB_BAR_MASK, KBUILD_MODNAME);
+ if (rc)
+ goto err1;
+
+ ndev->reg_base = pci_ioremap_bar(pdev, NTB_BAR_MMIO);
+ if (!ndev->reg_base) {
+ dev_warn(&pdev->dev, "Cannot remap BAR 0\n");
+ rc = -EIO;
+ goto err2;
+ }
+
+ for (i = 0; i < NTB_NUM_MW; i++) {
+ ndev->mw[i].bar_sz = pci_resource_len(pdev, MW_TO_BAR(i));
+ ndev->mw[i].vbase =
+ ioremap_wc(pci_resource_start(pdev, MW_TO_BAR(i)),
+ ndev->mw[i].bar_sz);
+ dev_info(&pdev->dev, "MW %d size %d\n", i,
+ (u32) pci_resource_len(pdev, MW_TO_BAR(i)));
+ if (!ndev->mw[i].vbase) {
+ dev_warn(&pdev->dev, "Cannot remap BAR %d\n",
+ MW_TO_BAR(i));
+ rc = -EIO;
+ goto err3;
+ }
+ }
+
+ rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+ if (rc) {
+ rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+ if (rc)
+ goto err3;
+
+ dev_warn(&pdev->dev, "Cannot DMA highmem\n");
+ }
+
+ rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+ if (rc) {
+ rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+ if (rc)
+ goto err3;
+
+ dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
+ }
+
+ rc = ntb_device_setup(ndev);
+ if (rc)
+ goto err3;
+
+ rc = ntb_create_callbacks(ndev);
+ if (rc)
+ goto err4;
+
+ rc = ntb_setup_interrupts(ndev);
+ if (rc)
+ goto err5;
+
+ /* The scratchpad registers keep the values between rmmod/insmod,
+ * blast them now
+ */
+ for (i = 0; i < ndev->limits.max_spads; i++) {
+ ntb_write_local_spad(ndev, i, 0);
+ ntb_write_remote_spad(ndev, i, 0);
+ }
+
+ rc = ntb_transport_init(pdev);
+ if (rc)
+ goto err6;
+
+ /* Let's bring the NTB link up */
+ writel(NTB_CNTL_BAR23_SNOOP | NTB_CNTL_BAR45_SNOOP,
+ ndev->reg_ofs.lnk_cntl);
+
+ return 0;
+
+err6:
+ ntb_free_interrupts(ndev);
+err5:
+ ntb_free_callbacks(ndev);
+err4:
+ ntb_device_free(ndev);
+err3:
+ for (i--; i >= 0; i--)
+ iounmap(ndev->mw[i].vbase);
+ iounmap(ndev->reg_base);
+err2:
+ pci_release_selected_regions(pdev, NTB_BAR_MASK);
+err1:
+ pci_disable_device(pdev);
+err:
+ kfree(ndev);
+
+ dev_err(&pdev->dev, "Error loading %s module\n", KBUILD_MODNAME);
+ return rc;
+}
+
+static void __devexit ntb_pci_remove(struct pci_dev *pdev)
+{
+ struct ntb_device *ndev = pci_get_drvdata(pdev);
+ int i;
+ u32 ntb_cntl;
+
+ /* Bring NTB link down */
+ ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+ ntb_cntl |= NTB_LINK_DISABLE;
+ writel(ntb_cntl, ndev->reg_ofs.lnk_cntl);
+
+ ntb_transport_free(ndev->ntb_transport);
+
+ ntb_free_interrupts(ndev);
+ ntb_free_callbacks(ndev);
+ ntb_device_free(ndev);
+
+ for (i = 0; i < NTB_NUM_MW; i++)
+ iounmap(ndev->mw[i].vbase);
+
+ iounmap(ndev->reg_base);
+ pci_release_selected_regions(pdev, NTB_BAR_MASK);
+ pci_disable_device(pdev);
+ kfree(ndev);
+}
+
+static struct pci_driver ntb_pci_driver = {
+ .name = KBUILD_MODNAME,
+ .id_table = ntb_pci_tbl,
+ .probe = ntb_pci_probe,
+ .remove = __devexit_p(ntb_pci_remove),
+};
+
+static int __init ntb_init_module(void)
+{
+ pr_info("%s: %s, version %s\n", KBUILD_MODNAME, NTB_NAME, NTB_VER);
+
+ return pci_register_driver(&ntb_pci_driver);
+}
+module_init(ntb_init_module);
+
+static void __exit ntb_exit_module(void)
+{
+ pci_unregister_driver(&ntb_pci_driver);
+
+ pr_info("%s: Driver removed\n", KBUILD_MODNAME);
+}
+module_exit(ntb_exit_module);
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
new file mode 100644
index 0000000..5b60b76
--- /dev/null
+++ b/drivers/ntb/ntb_hw.h
@@ -0,0 +1,195 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF 0x3725
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF 0x3726
+#define PCI_DEVICE_ID_INTEL_NTB_RP_JSF 0x3727
+#define PCI_DEVICE_ID_INTEL_NTB_RP_SNB 0x3C08
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB 0x3C0D
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB 0x3C0E
+#define PCI_DEVICE_ID_INTEL_NTB_2ND_SNB 0x3C0F
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD 0x0C4E
+
+#define msix_table_size(control) ((control & PCI_MSIX_FLAGS_QSIZE)+1)
+
+#define NTB_BAR_MMIO 0
+#define NTB_BAR_23 2
+#define NTB_BAR_45 4
+#define NTB_BAR_MASK ((1 << NTB_BAR_MMIO) | (1 << NTB_BAR_23) |\
+ (1 << NTB_BAR_45))
+
+#define NTB_LINK_DOWN 0
+#define NTB_LINK_UP 1
+
+#define NTB_HB_TIMEOUT msecs_to_jiffies(1000)
+
+#define NTB_NUM_MW 2
+
+enum ntb_hw_event {
+ NTB_EVENT_SW_EVENT0 = (1 << 0),
+ NTB_EVENT_SW_EVENT1 = (1 << 1),
+ NTB_EVENT_SW_EVENT2 = (1 << 2),
+ NTB_EVENT_HW_ERROR = (1 << 3),
+ NTB_EVENT_HW_LINK_UP = (1 << 4),
+ NTB_EVENT_HW_LINK_DOWN = (1 << 5),
+};
+
+struct ntb_mw {
+ dma_addr_t phys_addr;
+ void __iomem *vbase;
+ resource_size_t bar_sz;
+};
+
+struct ntb_db_cb {
+ void (*callback) (void *data, int db_num);
+ unsigned int db_num;
+ void *data;
+ struct ntb_device *ndev;
+};
+
+struct ntb_device {
+ struct pci_dev *pdev;
+ struct msix_entry *msix_entries;
+ void __iomem *reg_base;
+ struct ntb_mw mw[NTB_NUM_MW];
+ struct {
+ unsigned int max_spads;
+ unsigned int max_db_bits;
+ unsigned int msix_cnt;
+ } limits;
+ struct {
+ void __iomem *pdb;
+ void __iomem *pdb_mask;
+ void __iomem *sdb;
+ void __iomem *sbar2_xlat;
+ void __iomem *sbar4_xlat;
+ void __iomem *spad_write;
+ void __iomem *spad_read;
+ void __iomem *lnk_cntl;
+ void __iomem *lnk_stat;
+ void __iomem *spci_cmd;
+ } reg_ofs;
+ void *ntb_transport;
+ void (*event_cb)(void *handle, enum ntb_hw_event event);
+
+ struct ntb_db_cb *db_cb;
+ unsigned char hw_type;
+ unsigned char conn_type;
+ unsigned char dev_type;
+ unsigned char num_msix;
+ unsigned char bits_per_vector;
+ unsigned char max_cbs;
+ unsigned char link_status;
+ struct delayed_work hb_timer;
+ unsigned long last_ts;
+};
+
+/**
+ * ntb_hw_link_status() - return the hardware link status
+ * @ndev: pointer to ntb_device instance
+ *
+ * Returns true if the hardware is connected to the remote system
+ *
+ * RETURNS: true or false based on the hardware link state
+ */
+static inline bool ntb_hw_link_status(struct ntb_device *ndev)
+{
+ return ndev->link_status == NTB_LINK_UP;
+}
+
+/**
+ * ntb_query_pdev() - return the pci_dev pointer
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer return the pci_dev pointerfor the NTB hardware device
+ *
+ * RETURNS: a pointer to the ntb pci_dev
+ */
+static inline struct pci_dev *ntb_query_pdev(struct ntb_device *ndev)
+{
+ return ndev->pdev;
+}
+
+/**
+ * ntb_query_max_cbs() - return the maximum number of callback tuples
+ * @ndev: pointer to ntb_device instance
+ *
+ * The number of callbacks can vary depending on the platform and MSI-X/MSI
+ * enablement
+ *
+ * RETURNS: the maximum number of callback tuples (3, 15, or 33)
+ */
+static inline unsigned int ntb_query_max_cbs(struct ntb_device *ndev)
+{
+ return ndev->max_cbs;
+}
+
+struct ntb_device *ntb_register_transport(struct pci_dev *pdev,
+ void *transport);
+void ntb_unregister_transport(struct ntb_device *ndev);
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr);
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+ void *data, void (*db_cb_func) (void *data,
+ int db_num));
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx);
+int ntb_register_event_callback(struct ntb_device *ndev,
+ void (*event_cb_func) (void *handle,
+ unsigned int event));
+void ntb_unregister_event_callback(struct ntb_device *ndev);
+int ntb_get_max_spads(struct ntb_device *ndev);
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
+void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
+void *ntb_find_transport(struct pci_dev *pdev);
+
+int ntb_transport_init(struct pci_dev *pdev);
+void ntb_transport_free(void *transport);
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
new file mode 100644
index 0000000..0f97b03
--- /dev/null
+++ b/drivers/ntb/ntb_regs.h
@@ -0,0 +1,139 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define NTB_LINK_ENABLE 0x0000
+#define NTB_LINK_DISABLE 0x0002
+#define NTB_LINK_STATUS_ACTIVE 0x2000
+#define NTB_LINK_SPEED_MASK 0x000f
+#define NTB_LINK_WIDTH_MASK 0x03f0
+
+#define SNB_MSIX_CNT 4
+#define SNB_MAX_SPADS 16
+#define SNB_MAX_COMPAT_SPADS 8
+/* Reserve the uppermost bit for link interrupt */
+#define SNB_MAX_DB_BITS 15
+#define SNB_DB_BITS_PER_VEC 5
+
+#define SNB_DB_HW_LINK 0x8000
+
+#define SNB_PCICMD_OFFSET 0x0504
+#define SNB_DEVCTRL_OFFSET 0x0598
+#define SNB_LINK_STATUS_OFFSET 0x01A2
+
+#define SNB_PBAR2LMT_OFFSET 0x0000
+#define SNB_PBAR4LMT_OFFSET 0x0008
+#define SNB_PBAR2XLAT_OFFSET 0x0010
+#define SNB_PBAR4XLAT_OFFSET 0x0018
+#define SNB_SBAR2LMT_OFFSET 0x0020
+#define SNB_SBAR4LMT_OFFSET 0x0028
+#define SNB_SBAR2XLAT_OFFSET 0x0030
+#define SNB_SBAR4XLAT_OFFSET 0x0038
+#define SNB_SBAR0BASE_OFFSET 0x0040
+#define SNB_SBAR2BASE_OFFSET 0x0048
+#define SNB_SBAR4BASE_OFFSET 0x0050
+#define SNB_NTBCNTL_OFFSET 0x0058
+#define SNB_SBDF_OFFSET 0x005C
+#define SNB_PDOORBELL_OFFSET 0x0060
+#define SNB_PDBMSK_OFFSET 0x0062
+#define SNB_SDOORBELL_OFFSET 0x0064
+#define SNB_SDBMSK_OFFSET 0x0066
+#define SNB_USMEMMISS 0x0070
+#define SNB_SPAD_OFFSET 0x0080
+#define SNB_SPADSEMA4_OFFSET 0x00c0
+#define SNB_WCCNTRL_OFFSET 0x00e0
+#define SNB_B2B_SPAD_OFFSET 0x0100
+#define SNB_B2B_DOORBELL_OFFSET 0x0140
+#define SNB_B2B_XLAT_OFFSET 0x0144
+
+#define BWD_MSIX_CNT 34
+#define BWD_MAX_SPADS 16
+#define BWD_MAX_COMPAT_SPADS 16
+#define BWD_MAX_DB_BITS 34
+#define BWD_DB_BITS_PER_VEC 1
+
+#define BWD_PCICMD_OFFSET 0xb004
+#define BWD_MBAR23_OFFSET 0xb018
+#define BWD_MBAR45_OFFSET 0xb020
+#define BWD_DEVCTRL_OFFSET 0xb048
+#define BWD_LINK_STATUS_OFFSET 0xb052
+
+#define BWD_SBAR2XLAT_OFFSET 0x0008
+#define BWD_SBAR4XLAT_OFFSET 0x0010
+#define BWD_PDOORBELL_OFFSET 0x0020
+#define BWD_PDBMSK_OFFSET 0x0028
+#define BWD_NTBCNTL_OFFSET 0x0060
+#define BWD_EBDF_OFFSET 0x0064
+#define BWD_SPAD_OFFSET 0x0080
+#define BWD_SPADSEMA_OFFSET 0x00c0
+#define BWD_STKYSPAD_OFFSET 0x00c4
+#define BWD_PBAR2XLAT_OFFSET 0x8008
+#define BWD_PBAR4XLAT_OFFSET 0x8010
+#define BWD_B2B_DOORBELL_OFFSET 0x8020
+#define BWD_B2B_SPAD_OFFSET 0x8080
+#define BWD_B2B_SPADSEMA_OFFSET 0x80c0
+#define BWD_B2B_STKYSPAD_OFFSET 0x80c4
+
+#define NTB_CNTL_BAR23_SNOOP (1 << 2)
+#define NTB_CNTL_BAR45_SNOOP (1 << 6)
+#define BWD_CNTL_LINK_DOWN (1 << 16)
+
+#define NTB_PPD_OFFSET 0x00D4
+#define SNB_PPD_CONN_TYPE 0x0003
+#define SNB_PPD_DEV_TYPE 0x0010
+#define BWD_PPD_INIT_LINK 0x0004
+#define BWD_PPD_CONN_TYPE 0x0300
+#define BWD_PPD_DEV_TYPE 0x1000
+
+#define BWD_PBAR2XLAT_USD_ADDR 0x0000004000000000
+#define BWD_PBAR4XLAT_USD_ADDR 0x0000008000000000
+#define BWD_MBAR23_USD_ADDR 0x000000410000000C
+#define BWD_MBAR45_USD_ADDR 0x000000810000000C
+#define BWD_PBAR2XLAT_DSD_ADDR 0x0000004100000000
+#define BWD_PBAR4XLAT_DSD_ADDR 0x0000008100000000
+#define BWD_MBAR23_DSD_ADDR 0x000000400000000C
+#define BWD_MBAR45_DSD_ADDR 0x000000800000000C
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
new file mode 100644
index 0000000..118a852
--- /dev/null
+++ b/drivers/ntb/ntb_transport.c
@@ -0,0 +1,1401 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/errno.h>
+#include <linux/export.h>
+#include <linux/interrupt.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/ntb.h>
+#include "ntb_hw.h"
+
+#define NTB_TRANSPORT_VERSION 1
+
+static int transport_mtu = 0x401E;
+module_param(transport_mtu, uint, 0644);
+MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
+
+static unsigned int max_num_conns = 2;
+module_param(max_num_conns, uint, 0644);
+MODULE_PARM_DESC(max_num_conns, "Maximum number of NTB transport connections");
+
+struct ntb_queue_entry {
+ /* ntb_queue list reference */
+ struct list_head entry;
+ /* pointers to data to be transfered */
+ void *callback_data;
+ void *buf;
+ unsigned int len;
+ unsigned int flags;
+};
+
+struct ntb_transport;
+
+struct ntb_transport_qp {
+ struct ntb_transport *transport;
+ struct ntb_device *ndev;
+ void *cb_data;
+
+ bool client_ready;
+ bool qp_link;
+ u8 qp_num; /* Only 64 QP's are allowed. 0-63 */
+
+ void (*tx_handler) (void *data, struct ntb_transport_qp *qp);
+ struct tasklet_struct tx_work;
+ struct list_head tx_comp_q;
+ struct list_head tx_free_q;
+ spinlock_t ntb_tx_comp_q_lock;
+ spinlock_t ntb_tx_free_q_lock;
+ void *tx_mw_begin;
+ void *tx_mw_end;
+ void *tx_offset;
+
+ void (*rx_handler) (void *data, struct ntb_transport_qp *qp);
+ struct tasklet_struct rx_work;
+ struct list_head rx_pend_q;
+ struct list_head rx_comp_q;
+ struct list_head rx_free_q;
+ spinlock_t ntb_rx_pend_q_lock;
+ spinlock_t ntb_rx_comp_q_lock;
+ spinlock_t ntb_rx_free_q_lock;
+ void *rx_buff_begin;
+ void *rx_buff_end;
+ void *rx_offset;
+
+ void (*event_handler) (void *data, int status);
+ struct delayed_work link_work;
+
+ struct dentry *debugfs_dir;
+ struct dentry *debugfs_stats;
+
+ /* Stats */
+ u64 rx_bytes;
+ u64 rx_pkts;
+ u64 rx_ring_empty;
+ u64 rx_err_no_buf;
+ u64 rx_err_oflow;
+ u64 rx_err_ver;
+ u64 tx_bytes;
+ u64 tx_pkts;
+ u64 tx_ring_full;
+};
+
+struct ntb_transport_mw {
+ size_t size;
+ void *virt_addr;
+ dma_addr_t dma_addr;
+};
+
+struct ntb_transport {
+ struct ntb_device *ndev;
+ struct ntb_transport_mw mw[NTB_NUM_MW];
+ struct ntb_transport_qp *qps;
+ unsigned int max_qps;
+ unsigned long qp_bitmap;
+ bool transport_link;
+ struct delayed_work link_work;
+ struct dentry *debugfs_dir;
+
+ struct device netdev;
+};
+
+enum {
+ DESC_DONE_FLAG = 1 << 0,
+ LINK_DOWN_FLAG = 1 << 1,
+ HW_ERROR_FLAG = 1 << 2,
+};
+
+struct ntb_payload_header {
+ u64 ver;
+ unsigned int len;
+ unsigned int flags;
+};
+
+enum {
+ VERSION = 0,
+ MW0_SZ,
+ MW1_SZ,
+ NUM_QPS,
+ QP_LINKS,
+ MAX_SPAD,
+};
+
+#define QP_TO_MW(qp) ((qp) % NTB_NUM_MW)
+#define NTB_QP_DEF_NUM_ENTRIES 100
+#define NTB_LINK_DOWN_TIMEOUT 10
+
+static int ntb_match_bus(struct device *dev, struct device_driver *drv)
+{
+ return !strncmp(dev_name(dev), drv->name, strlen(drv->name));
+}
+
+static int ntb_client_probe(struct device *dev)
+{
+ const struct ntb_client *drv = container_of(dev->driver,
+ struct ntb_client, driver);
+ struct pci_dev *pdev = container_of(dev->parent, struct pci_dev, dev);
+ int rc = -EINVAL;
+
+ get_device(dev);
+ if (drv && drv->probe)
+ rc = drv->probe(pdev);
+ if (rc)
+ put_device(dev);
+
+ return rc;
+}
+
+static int ntb_client_remove(struct device *dev)
+{
+ const struct ntb_client *drv = container_of(dev->driver,
+ struct ntb_client, driver);
+ struct pci_dev *pdev = container_of(dev->parent, struct pci_dev, dev);
+
+ if (drv && drv->remove)
+ drv->remove(pdev);
+
+ put_device(dev);
+
+ return 0;
+}
+
+static void ntb_client_release(struct device *dev)
+{
+}
+
+struct bus_type ntb_bus_type = {
+ .name = "ntb_bus",
+ .match = ntb_match_bus,
+ .probe = ntb_client_probe,
+ .remove = ntb_client_remove,
+};
+
+static int __devinit ntb_bus_init(struct ntb_transport *nt)
+{
+ struct device *dev = &nt->netdev;
+ int rc;
+
+ rc = bus_register(&ntb_bus_type);
+ if (rc)
+ goto err;
+
+ /* setup and register client devices */
+ dev_set_name(dev, "ntb_netdev");
+ dev->bus = &ntb_bus_type;
+ dev->release = ntb_client_release;
+ dev->parent = &ntb_query_pdev(nt->ndev)->dev;
+
+ rc = device_register(dev);
+ if (rc)
+ goto err1;
+
+ return 0;
+
+err1:
+ bus_unregister(&ntb_bus_type);
+err:
+ return rc;
+}
+
+static void __devexit ntb_bus_remove(struct ntb_transport *nt)
+{
+ device_unregister(&nt->netdev);
+
+ bus_unregister(&ntb_bus_type);
+}
+
+/**
+ * ntb_register_client - Register NTB client driver
+ * @drv: NTB client driver to be registered
+ *
+ * Register an NTB client driver with the NTB transport layer
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_client(struct ntb_client *drv)
+{
+ drv->driver.bus = &ntb_bus_type;
+
+ return driver_register(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(ntb_register_client);
+
+/**
+ * ntb_unregister_client - Unregister NTB client driver
+ * @drv: NTB client driver to be unregistered
+ *
+ * Unregister an NTB client driver with the NTB transport layer
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+void ntb_unregister_client(struct ntb_client *drv)
+{
+ driver_unregister(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(ntb_unregister_client);
+
+static int debugfs_open(struct inode *inode, struct file *filp)
+{
+ filp->private_data = inode->i_private;
+ return 0;
+}
+
+static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
+ loff_t *offp)
+{
+ struct ntb_transport_qp *qp;
+ char buf[512];
+ ssize_t ret, out_offset, out_count;
+
+ out_count = 512;
+
+ qp = filp->private_data;
+ out_offset = 0;
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "NTB Transport stats\n");
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_bytes - %llu\n", qp->rx_bytes);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_pkts - %llu\n", qp->rx_pkts);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_ring_empty - %llu\n", qp->rx_ring_empty);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_er_oflow - %llu\n", qp->rx_err_oflow);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_err_ver - %llu\n", qp->rx_err_ver);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_offset - %p\n", qp->rx_offset);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_bytes - %llu\n", qp->tx_bytes);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_pkts - %llu\n", qp->tx_pkts);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_ring_full - %llu\n", qp->tx_ring_full);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_offset - %p\n", qp->tx_offset);
+
+ ret = simple_read_from_buffer(ubuf, count, offp, buf, out_offset);
+ return ret;
+}
+
+static const struct file_operations ntb_qp_debugfs_stats = {
+ .owner = THIS_MODULE,
+ .open = debugfs_open,
+ .read = debugfs_read,
+};
+
+static void ntb_list_add(spinlock_t *lock, struct list_head *entry,
+ struct list_head *list)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(lock, flags);
+ list_add_tail(entry, list);
+ spin_unlock_irqrestore(lock, flags);
+}
+
+static struct ntb_queue_entry *ntb_list_rm(spinlock_t *lock,
+ struct list_head *list)
+{
+ struct ntb_queue_entry *entry;
+ unsigned long flags;
+
+ spin_lock_irqsave(lock, flags);
+ if (list_empty(list)) {
+ entry = NULL;
+ goto out;
+ }
+ entry = list_first_entry(list, struct ntb_queue_entry, entry);
+ list_del(&entry->entry);
+out:
+ spin_unlock_irqrestore(lock, flags);
+
+ return entry;
+}
+
+static int ntb_transport_setup_qp_mw(struct ntb_transport *nt,
+ unsigned int qp_num)
+{
+ struct ntb_transport_qp *qp = &nt->qps[qp_num];
+ u8 mw_num = QP_TO_MW(qp_num);
+ unsigned int size, num_qps_mw;
+
+ WARN_ON(nt->mw[mw_num].virt_addr == 0);
+
+ if (nt->max_qps % NTB_NUM_MW && !mw_num)
+ num_qps_mw = nt->max_qps / NTB_NUM_MW +
+ (nt->max_qps % NTB_NUM_MW - mw_num);
+ else
+ num_qps_mw = nt->max_qps / NTB_NUM_MW;
+
+ size = nt->mw[mw_num].size / num_qps_mw;
+ pr_debug("orig size = %d, num qps = %d, size = %d\n",
+ (int) nt->mw[mw_num].size, nt->max_qps, size);
+
+ qp->rx_buff_begin = nt->mw[mw_num].virt_addr +
+ (qp_num / NTB_NUM_MW * size);
+ qp->rx_buff_end = qp->rx_buff_begin + size;
+ pr_info("QP %d - RX Buff start %p end %p\n", qp->qp_num,
+ qp->rx_buff_begin, qp->rx_buff_end);
+ qp->rx_offset = qp->rx_buff_begin;
+
+ qp->tx_mw_begin = ntb_get_mw_vbase(nt->ndev, mw_num) +
+ (qp_num / NTB_NUM_MW * size);
+ qp->tx_mw_end = qp->tx_mw_begin + size;
+ pr_info("QP %d - TX MW start %p end %p\n", qp->qp_num, qp->tx_mw_begin,
+ qp->tx_mw_end);
+ qp->tx_offset = qp->tx_mw_begin;
+
+ qp->rx_pkts = 0;
+ qp->tx_pkts = 0;
+
+ return 0;
+}
+
+static int ntb_set_mw(struct ntb_transport *nt, int num_mw, unsigned int size)
+{
+ struct ntb_transport_mw *mw = &nt->mw[num_mw];
+ struct pci_dev *pdev = ntb_query_pdev(nt->ndev);
+ void *offset;
+
+ /* Alloc memory for receiving data. Must be 4k aligned */
+ mw->size = ALIGN(size, 4096);
+
+ mw->virt_addr = dma_alloc_coherent(&pdev->dev, mw->size, &mw->dma_addr,
+ GFP_KERNEL);
+ if (!mw->virt_addr) {
+ pr_err("Unable to allocate MW buffer of size %d\n",
+ (int) mw->size);
+ return -ENOMEM;
+ }
+
+ /* setup the hdr offsets with 0's */
+ for (offset = mw->virt_addr + transport_mtu -
+ sizeof(struct ntb_payload_header);
+ offset < mw->virt_addr + size; offset += transport_mtu)
+ memset(offset, 0, sizeof(struct ntb_payload_header));
+
+ /* Notify HW the memory location of the receive buffer */
+ ntb_set_mw_addr(nt->ndev, num_mw, mw->dma_addr);
+
+ return 0;
+}
+
+static void ntb_transport_event_callback(void *data, enum ntb_hw_event event)
+{
+ struct ntb_transport *nt = data;
+
+ if (event == NTB_EVENT_HW_ERROR)
+ BUG();
+
+ if (event == NTB_EVENT_HW_LINK_UP)
+ schedule_delayed_work(&nt->link_work, 0);
+
+ if (event == NTB_EVENT_HW_LINK_DOWN) {
+ int i;
+
+ nt->transport_link = NTB_LINK_DOWN;
+
+ /* Pass along the info to any clients */
+ for (i = 0; i < nt->max_qps; i++)
+ if (!test_bit(i, &nt->qp_bitmap)) {
+ struct ntb_transport_qp *qp = &nt->qps[i];
+
+ if (qp->event_handler &&
+ qp->qp_link != NTB_LINK_DOWN)
+ qp->event_handler(qp->cb_data,
+ NTB_LINK_DOWN);
+
+ qp->qp_link = NTB_LINK_DOWN;
+ }
+
+ /* The scratchpad registers keep the values if the remote side
+ * goes down, blast them now to give them a sane value the next
+ * time they are accessed
+ */
+ for (i = 0; i < MAX_SPAD; i++) {
+ ntb_write_local_spad(nt->ndev, i, 0);
+ ntb_write_remote_spad(nt->ndev, i, 0);
+ }
+ }
+}
+
+static void ntb_transport_link_work(struct work_struct *work)
+{
+ struct ntb_transport *nt = container_of(work, struct ntb_transport,
+ link_work.work);
+ struct ntb_device *ndev = nt->ndev;
+ u32 val;
+ int rc, i;
+
+ /* send the local info */
+ rc = ntb_write_remote_spad(ndev, VERSION, NTB_TRANSPORT_VERSION);
+ if (rc) {
+ pr_err("Error writing %x to remote spad %d\n",
+ 0, VERSION);
+ goto out;
+ }
+
+ rc = ntb_write_remote_spad(ndev, MW0_SZ, ntb_get_mw_size(ndev, 0));
+ if (rc) {
+ pr_err("Error writing %x to remote spad %d\n",
+ (u32) ntb_get_mw_size(ndev, 0), MW0_SZ);
+ goto out;
+ }
+
+ rc = ntb_write_remote_spad(ndev, MW1_SZ, ntb_get_mw_size(ndev, 1));
+ if (rc) {
+ pr_err("Error writing %x to remote spad %d\n",
+ (u32) ntb_get_mw_size(ndev, 1), MW1_SZ);
+ goto out;
+ }
+
+ rc = ntb_write_remote_spad(ndev, NUM_QPS, nt->max_qps);
+ if (rc) {
+ pr_err("Error writing %x to remote spad %d\n",
+ nt->max_qps, NUM_QPS);
+ goto out;
+ }
+
+ rc = ntb_write_remote_spad(ndev, QP_LINKS, 0);
+ if (rc) {
+ pr_err("Error writing %x to remote spad %d\n", 0, QP_LINKS);
+ goto out;
+ }
+
+ /* Query the remote side for its info */
+ rc = ntb_read_remote_spad(ndev, VERSION, &val);
+ if (rc) {
+ pr_err("Error reading remote spad %d\n", VERSION);
+ goto out;
+ }
+
+ if (val != NTB_TRANSPORT_VERSION)
+ goto out;
+ pr_info("Remote version = %d\n", val);
+
+ rc = ntb_read_remote_spad(ndev, NUM_QPS, &val);
+ if (rc) {
+ pr_err("Error reading remote spad %d\n", NUM_QPS);
+ goto out;
+ }
+
+ if (val != nt->max_qps)
+ goto out;
+ pr_info("Remote max number of qps = %d\n", val);
+
+ rc = ntb_read_remote_spad(ndev, MW0_SZ, &val);
+ if (rc) {
+ pr_err("Error reading remote spad %d\n", MW0_SZ);
+ goto out;
+ }
+
+ if (!val)
+ goto out;
+ pr_info("Remote MW0 size = %d\n", val);
+
+ rc = ntb_set_mw(nt, 0, val);
+ if (rc)
+ goto out;
+
+ rc = ntb_read_remote_spad(ndev, MW1_SZ, &val);
+ if (rc) {
+ pr_err("Error reading remote spad %d\n", MW1_SZ);
+ goto out;
+ }
+
+ if (!val)
+ goto out;
+ pr_info("Remote MW1 size = %d\n", val);
+
+ rc = ntb_set_mw(nt, 1, val);
+ if (rc)
+ goto out;
+
+ for (i = 0; i < nt->max_qps; i++) {
+ struct ntb_transport_qp *qp = &nt->qps[i];
+
+ rc = ntb_transport_setup_qp_mw(nt, i);
+ if (rc)
+ goto out;
+
+ if (qp->client_ready)
+ schedule_delayed_work(&qp->link_work, 0);
+ }
+
+ nt->transport_link = NTB_LINK_UP;
+
+ return;
+
+out:
+ if (ntb_hw_link_status(ndev))
+ schedule_delayed_work(&nt->link_work,
+ msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_qp_link_work(struct work_struct *work)
+{
+ struct ntb_transport_qp *qp;
+ struct ntb_transport *nt;
+ int rc, val;
+
+ qp = container_of(work, struct ntb_transport_qp, link_work.work);
+ nt = qp->transport;
+
+ WARN_ON(nt->transport_link != NTB_LINK_UP);
+
+ rc = ntb_read_local_spad(nt->ndev, QP_LINKS, &val);
+ if (rc) {
+ pr_err("Error reading spad %d\n", QP_LINKS);
+ return;
+ }
+
+ rc = ntb_write_remote_spad(nt->ndev, QP_LINKS, val | 1 << qp->qp_num);
+ if (rc)
+ pr_err("Error writing %x to remote spad %d\n",
+ val | 1 << qp->qp_num, QP_LINKS);
+
+ /* query remote spad for qp ready bits */
+ rc = ntb_read_remote_spad(nt->ndev, QP_LINKS, &val);
+ if (rc)
+ pr_err("Error reading remote spad %d\n", QP_LINKS);
+
+ pr_debug("Remote QP link status = %x\n", val);
+
+ /* See if the remote side is up */
+ if (1 << qp->qp_num & val) {
+ qp->qp_link = NTB_LINK_UP;
+
+ if (qp->event_handler)
+ qp->event_handler(qp->cb_data, NTB_LINK_UP);
+ } else if (ntb_hw_link_status(nt->ndev))
+ schedule_delayed_work(&qp->link_work,
+ msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_transport_init_queue(struct ntb_transport *nt,
+ unsigned int qp_num)
+{
+ struct ntb_transport_qp *qp;
+
+ qp = &nt->qps[qp_num];
+ qp->qp_num = qp_num;
+ qp->transport = nt;
+ qp->ndev = nt->ndev;
+ qp->qp_link = NTB_LINK_DOWN;
+
+ if (nt->debugfs_dir) {
+ char debugfs_name[4];
+
+ snprintf(debugfs_name, 4, "qp%d", qp_num);
+ qp->debugfs_dir = debugfs_create_dir(debugfs_name,
+ nt->debugfs_dir);
+
+ qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
+ qp->debugfs_dir, qp,
+ &ntb_qp_debugfs_stats);
+ }
+
+ INIT_DELAYED_WORK(&qp->link_work, ntb_qp_link_work);
+
+ spin_lock_init(&qp->ntb_rx_comp_q_lock);
+ spin_lock_init(&qp->ntb_rx_pend_q_lock);
+ spin_lock_init(&qp->ntb_rx_free_q_lock);
+ spin_lock_init(&qp->ntb_tx_comp_q_lock);
+ spin_lock_init(&qp->ntb_tx_free_q_lock);
+
+ INIT_LIST_HEAD(&qp->rx_pend_q);
+ INIT_LIST_HEAD(&qp->rx_comp_q);
+ INIT_LIST_HEAD(&qp->rx_free_q);
+ INIT_LIST_HEAD(&qp->tx_comp_q);
+ INIT_LIST_HEAD(&qp->tx_free_q);
+}
+
+int ntb_transport_init(struct pci_dev *pdev)
+{
+ struct ntb_transport *nt;
+ int rc, i;
+
+ nt = kzalloc(sizeof(struct ntb_transport), GFP_KERNEL);
+ if (!nt)
+ return -ENOMEM;
+
+ if (debugfs_initialized())
+ nt->debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
+ else
+ nt->debugfs_dir = NULL;
+
+ nt->ndev = ntb_register_transport(pdev, nt);
+ if (!nt->ndev) {
+ rc = -EIO;
+ goto err;
+ }
+
+ nt->max_qps = ntb_query_max_cbs(nt->ndev);
+ if (!nt->max_qps) {
+ rc = -EIO;
+ goto err1;
+ }
+ nt->max_qps = min(nt->max_qps, max_num_conns);
+
+ nt->qps = kcalloc(nt->max_qps, sizeof(struct ntb_transport_qp),
+ GFP_KERNEL);
+ if (!nt->qps) {
+ rc = -ENOMEM;
+ goto err1;
+ }
+
+ nt->qp_bitmap = ((u64) 1 << nt->max_qps) - 1;
+
+ for (i = 0; i < nt->max_qps; i++)
+ ntb_transport_init_queue(nt, i);
+
+ rc = ntb_register_event_callback(nt->ndev,
+ ntb_transport_event_callback);
+ if (rc)
+ goto err2;
+
+ ntb_bus_init(nt);
+
+ INIT_DELAYED_WORK(&nt->link_work, ntb_transport_link_work);
+
+ if (ntb_hw_link_status(nt->ndev))
+ schedule_delayed_work(&nt->link_work, 0);
+
+ return 0;
+
+err2:
+ kfree(nt->qps);
+err1:
+ ntb_unregister_transport(nt->ndev);
+err:
+ debugfs_remove_recursive(nt->debugfs_dir);
+ kfree(nt);
+ return rc;
+}
+
+void ntb_transport_free(void *transport)
+{
+ struct ntb_transport *nt = transport;
+ struct pci_dev *pdev;
+ int i;
+
+ nt->transport_link = NTB_LINK_DOWN;
+
+ /* verify that all the qp's are freed */
+ while (!nt->qp_bitmap) {
+ i = ffz(nt->qp_bitmap);
+ if (i == ((u64) 1 << nt->max_qps))
+ break;
+ ntb_transport_free_queue(&nt->qps[i]);
+ }
+
+ ntb_bus_remove(nt);
+
+ cancel_delayed_work_sync(&nt->link_work);
+
+ debugfs_remove_recursive(nt->debugfs_dir);
+
+ ntb_unregister_event_callback(nt->ndev);
+
+ pdev = ntb_query_pdev(nt->ndev);
+
+ for (i = 0; i < NTB_NUM_MW; i++)
+ if (nt->mw[i].virt_addr)
+ dma_free_coherent(&pdev->dev, nt->mw[i].size,
+ nt->mw[i].virt_addr,
+ nt->mw[i].dma_addr);
+
+ kfree(nt->qps);
+ ntb_unregister_transport(nt->ndev);
+ kfree(nt);
+}
+
+static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry, void *offset)
+{
+
+ struct ntb_payload_header *hdr;
+
+ hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+ entry->len = hdr->len;
+
+ memcpy(entry->buf, offset, entry->len);
+
+ /* Ensure that the data is fully copied out before clearing the flag */
+ wmb();
+ hdr->flags = 0;
+ ntb_list_add(&qp->ntb_rx_comp_q_lock, &entry->entry,
+ &qp->rx_comp_q);
+
+ if (qp->rx_handler && qp->client_ready)
+ qp->rx_handler(qp->cb_data, qp);
+}
+
+static int ntb_process_rxc(struct ntb_transport_qp *qp)
+{
+ struct ntb_payload_header *hdr;
+ struct ntb_queue_entry *entry;
+ void *offset;
+
+ entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+ if (!entry) {
+ hdr = qp->rx_offset;
+ pr_debug("no buffer - HDR ver %llu, len %d, flags %x\n",
+ hdr->ver, hdr->len, hdr->flags);
+ qp->rx_err_no_buf++;
+ return -ENOMEM;
+ }
+
+ offset = qp->rx_offset;
+ hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+
+ if (!(hdr->flags & DESC_DONE_FLAG)) {
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+ qp->rx_ring_empty++;
+ return -EAGAIN;
+ }
+
+ if (hdr->ver != qp->rx_pkts) {
+ pr_debug("qp %d: version mismatch, expected %llu - got %llu\n",
+ qp->qp_num, qp->rx_pkts, hdr->ver);
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+ qp->rx_err_ver++;
+ return -EIO;
+ }
+
+ if (hdr->flags & NTB_LINK_DOWN) {
+ pr_info("qp %d: Link Down\n", qp->qp_num);
+ qp->qp_link = NTB_LINK_DOWN;
+ schedule_delayed_work(&qp->link_work,
+ msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+
+ if (qp->event_handler)
+ qp->event_handler(qp->cb_data, NTB_LINK_DOWN);
+
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+
+ /* Ensure that the data is fully copied out before clearing the
+ * done flag
+ */
+ wmb();
+ hdr->flags = 0;
+ goto out;
+ }
+
+ pr_debug("rx offset %p, ver %llu - %d payload received, buf size %d\n",
+ qp->rx_offset, hdr->ver, hdr->len, entry->len);
+
+ if (hdr->len <= entry->len)
+ ntb_rx_copy_task(qp, entry, offset);
+ else {
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+
+ /* Ensure that the data is fully copied out before clearing the
+ * done flag
+ */
+ wmb();
+ hdr->flags = 0;
+ qp->rx_err_oflow++;
+ pr_err("RX overflow! Wanted %d got %d\n", hdr->len, entry->len);
+ }
+
+ qp->rx_bytes += hdr->len;
+ qp->rx_pkts++;
+
+out:
+ qp->rx_offset += transport_mtu;
+ if (qp->rx_offset + transport_mtu >= qp->rx_buff_end)
+ qp->rx_offset = qp->rx_buff_begin;
+
+ return 0;
+}
+
+static void ntb_transport_rx(unsigned long data)
+{
+ struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;
+ int rc;
+
+ do {
+ rc = ntb_process_rxc(qp);
+ } while (!rc);
+}
+
+static void ntb_transport_rxc_db(void *data, int db_num)
+{
+ struct ntb_transport_qp *qp = data;
+
+ pr_debug("%s: doorbell %d received\n", __func__, db_num);
+
+ tasklet_schedule(&qp->rx_work);
+}
+
+static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry,
+ void *offset)
+{
+ struct ntb_payload_header *hdr;
+
+ memcpy_toio(offset, entry->buf, entry->len);
+
+ hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+ hdr->len = entry->len;
+ hdr->ver = qp->tx_pkts;
+
+ /* Ensure that the data is fully copied out before setting the flag */
+ wmb();
+ hdr->flags = entry->flags | DESC_DONE_FLAG;
+
+ ntb_ring_sdb(qp->ndev, qp->qp_num);
+
+ /* The entry length can only be zero if the packet is intended to be a
+ * "link down" or similar. Since no payload is being sent in these
+ * cases, there is nothing to add to the completion queue.
+ */
+ if (entry->len > 0) {
+ qp->tx_bytes += entry->len;
+
+ /* Add fully transmitted data to completion queue */
+ ntb_list_add(&qp->ntb_tx_comp_q_lock, &entry->entry,
+ &qp->tx_comp_q);
+
+ if (qp->tx_handler)
+ qp->tx_handler(qp->cb_data, qp);
+ } else
+ ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+ &qp->tx_free_q);
+}
+
+static int ntb_process_tx(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ struct ntb_payload_header *hdr;
+ void *offset;
+
+ offset = qp->tx_offset;
+ hdr = offset + transport_mtu - sizeof(struct ntb_payload_header);
+
+ pr_debug("%lld - offset %p, tx %p, entry len %d flags %x buff %p\n",
+ qp->tx_pkts, offset, qp->tx_offset, entry->len, entry->flags,
+ entry->buf);
+ if (hdr->flags) {
+ qp->tx_ring_full++;
+ return -EAGAIN;
+ }
+
+ if (entry->len > transport_mtu) {
+ pr_err("Trying to send pkt size of %d\n", entry->len);
+ entry->flags = HW_ERROR_FLAG;
+
+ if (qp->tx_handler)
+ qp->tx_handler(qp->cb_data, qp);
+
+ return 0;
+ }
+
+ ntb_tx_copy_task(qp, entry, offset);
+
+ qp->tx_offset += transport_mtu;
+ if (qp->tx_offset + transport_mtu >= qp->tx_mw_end)
+ qp->tx_offset = qp->tx_mw_begin;
+
+ qp->tx_pkts++;
+
+ return 0;
+}
+
+static void ntb_send_link_down(struct ntb_transport_qp *qp)
+{
+ struct ntb_queue_entry *entry;
+ int i;
+
+ if (qp->qp_link == NTB_LINK_DOWN)
+ return;
+
+ qp->qp_link = NTB_LINK_DOWN;
+
+ /* Give Tx enough time to quiesce */
+ msleep(1000);
+
+ for (i = 0; i < NTB_LINK_DOWN_TIMEOUT; i++) {
+ entry = ntb_list_rm(&qp->ntb_tx_free_q_lock,
+ &qp->tx_free_q);
+ if (entry)
+ break;
+ msleep(100);
+ }
+
+ entry->callback_data = NULL;
+ entry->buf = NULL;
+ entry->len = 0;
+ entry->flags = LINK_DOWN_FLAG;
+
+ ntb_process_tx(qp, entry);
+}
+
+/**
+ * ntb_transport_create_queue - Create a new NTB transport layer queue
+ * @rx_handler: receive callback function
+ * @tx_handler: transmit callback function
+ * @event_handler: event callback function
+ *
+ * Create a new NTB transport layer queue and provide the queue with a callback
+ * routine for both transmit and receive. The receive callback routine will be
+ * used to pass up data when the transport has received it on the queue. The
+ * transmit callback routine will be called when the transport has completed the
+ * transmission of the data on the queue and the data is ready to be freed.
+ *
+ * RETURNS: pointer to newly created ntb_queue, NULL on error.
+ */
+struct ntb_transport_qp *
+ntb_transport_create_queue(void *data, struct pci_dev *pdev,
+ const struct ntb_queue_handlers *handlers)
+{
+ struct ntb_queue_entry *entry;
+ struct ntb_transport_qp *qp;
+ struct ntb_transport *nt;
+ unsigned int free_queue;
+ int rc, i;
+
+ nt = ntb_find_transport(pdev);
+ if (!nt)
+ goto err;
+
+ free_queue = ffs(nt->qp_bitmap);
+ if (!free_queue)
+ goto err;
+
+ /* decrement free_queue to make it zero based */
+ free_queue--;
+
+ clear_bit(free_queue, &nt->qp_bitmap);
+
+ qp = &nt->qps[free_queue];
+ qp->cb_data = data;
+ qp->rx_handler = handlers->rx_handler;
+ qp->tx_handler = handlers->tx_handler;
+ qp->event_handler = handlers->event_handler;
+
+ for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+ entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+ if (!entry)
+ goto err1;
+
+ ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+ &qp->rx_free_q);
+ }
+
+ for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+ entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+ if (!entry)
+ goto err2;
+
+ ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+ &qp->tx_free_q);
+ }
+
+ tasklet_init(&qp->rx_work, ntb_transport_rx, (unsigned long) qp);
+
+ rc = ntb_register_db_callback(qp->ndev, free_queue, qp,
+ ntb_transport_rxc_db);
+ if (rc)
+ goto err3;
+
+ pr_info("NTB Transport QP %d created\n", qp->qp_num);
+
+ return qp;
+
+err3:
+ tasklet_disable(&qp->rx_work);
+ tasklet_disable(&qp->tx_work);
+err2:
+ while ((entry =
+ ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
+ kfree(entry);
+err1:
+ while ((entry =
+ ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ kfree(entry);
+ set_bit(free_queue, &nt->qp_bitmap);
+err:
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_create_queue);
+
+/**
+ * ntb_transport_free_queue - Frees NTB transport queue
+ * @qp: NTB queue to be freed
+ *
+ * Frees NTB transport queue
+ */
+void ntb_transport_free_queue(struct ntb_transport_qp *qp)
+{
+ struct ntb_queue_entry *entry;
+
+ if (!qp)
+ return;
+
+ cancel_delayed_work_sync(&qp->link_work);
+
+ ntb_unregister_db_callback(qp->ndev, qp->qp_num);
+ tasklet_disable(&qp->rx_work);
+ tasklet_disable(&qp->tx_work);
+
+ while ((entry =
+ ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
+ kfree(entry);
+
+ while ((entry =
+ ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q))) {
+ pr_warn("Freeing item from a non-empty queue\n");
+ kfree(entry);
+ }
+
+ while ((entry =
+ ntb_list_rm(&qp->ntb_rx_comp_q_lock, &qp->rx_comp_q))) {
+ pr_warn("Freeing item from a non-empty queue\n");
+ kfree(entry);
+ }
+
+ while ((entry =
+ ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q)))
+ kfree(entry);
+
+ while ((entry =
+ ntb_list_rm(&qp->ntb_tx_comp_q_lock, &qp->tx_comp_q))) {
+ pr_warn("Freeing item from a non-empty queue\n");
+ kfree(entry);
+ }
+
+ set_bit(qp->qp_num, &qp->transport->qp_bitmap);
+
+ pr_info("NTB Transport QP %d freed\n", qp->qp_num);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_free_queue);
+
+/**
+ * ntb_transport_rx_remove - Dequeues enqueued rx packet
+ * @qp: NTB queue to be freed
+ * @len: pointer to variable to write enqueued buffers length
+ *
+ * Dequeues unused buffers from receive queue. Should only be used during
+ * shutdown of qp.
+ *
+ * RETURNS: NULL error value on error, or void* for success.
+ */
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len)
+{
+ struct ntb_queue_entry *entry;
+ void *buf;
+
+ if (!qp || qp->client_ready == NTB_LINK_UP)
+ return NULL;
+
+ entry = ntb_list_rm(&qp->ntb_rx_pend_q_lock, &qp->rx_pend_q);
+ if (!entry)
+ return NULL;
+
+ buf = entry->callback_data;
+ *len = entry->len;
+
+ ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+ &qp->rx_free_q);
+
+ return buf;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_rx_remove);
+
+/**
+ * ntb_transport_rx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that incoming packets will be copied into
+ * @len: length of the data buffer
+ *
+ * Enqueue a new receive buffer onto the transport queue into which a NTB
+ * payload can be received into.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+ unsigned int len)
+{
+ struct ntb_queue_entry *entry;
+
+ if (!qp)
+ return -EINVAL;
+
+ entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->callback_data = cb;
+ entry->buf = data;
+ entry->len = len;
+
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
+
+/**
+ * ntb_transport_tx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that will be sent
+ * @len: length of the data buffer
+ *
+ * Enqueue a new transmit buffer onto the transport queue from which a NTB
+ * payload will be transmitted.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+ unsigned int len)
+{
+ struct ntb_queue_entry *entry;
+
+ if (!qp || qp->qp_link != NTB_LINK_UP || !len)
+ return -EINVAL;
+
+ entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
+ if (!entry) {
+ /* ring full, kick it */
+ tasklet_schedule(&qp->tx_work);
+ return -ENOMEM;
+ }
+
+ entry->callback_data = cb;
+ entry->buf = data;
+ entry->len = len;
+ entry->flags = 0;
+
+ return ntb_process_tx(qp, entry);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_tx_enqueue);
+
+/**
+ * ntb_transport_tx_dequeue - Dequeue a NTB queue entry
+ * @qp: NTB transport layer queue to be dequeued from
+ * @len: length of the data buffer
+ *
+ * This function will dequeue a buffer from the transmit complete queue.
+ * Entries will only be enqueued on this queue after having been
+ * transfered to the remote side.
+ *
+ * RETURNS: callback pointer of the buffer from the transport queue, or NULL
+ * on empty
+ */
+void *ntb_transport_tx_dequeue(struct ntb_transport_qp *qp, unsigned int *len)
+{
+ struct ntb_queue_entry *entry;
+ void *buf;
+
+ if (!qp)
+ return NULL;
+
+ entry = ntb_list_rm(&qp->ntb_tx_comp_q_lock, &qp->tx_comp_q);
+ if (!entry)
+ return NULL;
+
+ buf = entry->callback_data;
+ if (entry->flags != HW_ERROR_FLAG)
+ *len = entry->len;
+ else
+ *len = -EIO;
+
+ ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
+ &qp->tx_free_q);
+
+ return buf;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_tx_dequeue);
+
+/**
+ * ntb_transport_rx_dequeue - Dequeue a NTB queue entry
+ * @qp: NTB transport layer queue to be dequeued from
+ * @len: length of the data buffer
+ *
+ * This function will dequeue a buffer from the receive complete queue.
+ * Entries will only be enqueued on this queue after having been fully received.
+ *
+ * RETURNS: callback pointer of the buffer from the transport queue, or NULL
+ * on empty
+ */
+void *ntb_transport_rx_dequeue(struct ntb_transport_qp *qp, unsigned int *len)
+{
+ struct ntb_queue_entry *entry;
+ void *buf;
+
+ if (!qp)
+ return NULL;
+
+ entry = ntb_list_rm(&qp->ntb_rx_comp_q_lock, &qp->rx_comp_q);
+ if (!entry)
+ return NULL;
+
+ buf = entry->callback_data;
+ *len = entry->len;
+
+ ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
+ &qp->rx_free_q);
+
+ return buf;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_rx_dequeue);
+
+/**
+ * ntb_transport_link_up - Notify NTB transport of client readiness to use queue
+ * @qp: NTB transport layer queue to be enabled
+ *
+ * Notify NTB transport layer of client readiness to use queue
+ */
+void ntb_transport_link_up(struct ntb_transport_qp *qp)
+{
+ if (!qp)
+ return;
+
+ qp->client_ready = NTB_LINK_UP;
+
+ if (qp->transport->transport_link == NTB_LINK_UP)
+ schedule_delayed_work(&qp->link_work, 0);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_up);
+
+/**
+ * ntb_transport_link_down - Notify NTB transport to no longer enqueue data
+ * @qp: NTB transport layer queue to be disabled
+ *
+ * Notify NTB transport layer of client's desire to no longer receive data on
+ * transport queue specified. It is the client's responsibility to ensure all
+ * entries on queue are purged or otherwise handled appropraitely.
+ */
+void ntb_transport_link_down(struct ntb_transport_qp *qp)
+{
+ int rc, val;
+
+ if (!qp)
+ return;
+
+ qp->client_ready = NTB_LINK_DOWN;
+
+ cancel_delayed_work_sync(&qp->link_work);
+ qp->qp_link = NTB_LINK_DOWN;
+
+ rc = ntb_read_local_spad(qp->ndev, QP_LINKS, &val);
+ if (rc) {
+ pr_err("Error reading spad %d\n", QP_LINKS);
+ return;
+ }
+
+ rc = ntb_write_remote_spad(qp->ndev, QP_LINKS,
+ val & ~(1 << qp->qp_num));
+ if (rc)
+ pr_err("Error writing %x to remote spad %d\n",
+ val & ~(1 << qp->qp_num), QP_LINKS);
+
+ if (qp->transport->transport_link == NTB_LINK_UP)
+ ntb_send_link_down(qp);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_down);
+
+/**
+ * ntb_transport_link_query - Query transport link state
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query connectivity to the remote system of the NTB transport queue
+ *
+ * RETURNS: true for link up or false for link down
+ */
+bool ntb_transport_link_query(struct ntb_transport_qp *qp)
+{
+ return qp->qp_link == NTB_LINK_UP;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_link_query);
+
+/**
+ * ntb_transport_qp_num - Query the qp number
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query qp number of the NTB transport queue
+ *
+ * RETURNS: a zero based number specifying the qp number
+ */
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp)
+{
+ return qp->qp_num;
+}
+EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
+
+/**
+ * ntb_transport_max_size - Query the max payload size of a qp
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query the maximum payload size permissible on the given qp
+ *
+ * RETURNS: the max payload size of a qp
+ */
+unsigned int
+ntb_transport_max_size(__attribute__((unused)) struct ntb_transport_qp *qp)
+{
+ return transport_mtu - sizeof(struct ntb_payload_header);
+}
+EXPORT_SYMBOL_GPL(ntb_transport_max_size);
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
new file mode 100644
index 0000000..01c2651
--- /dev/null
+++ b/include/linux/ntb.h
@@ -0,0 +1,81 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+struct ntb_transport_qp;
+
+struct ntb_client {
+ struct device_driver driver;
+ int (*probe) (struct pci_dev *pdev);
+ void (*remove) (struct pci_dev *pdev);
+};
+
+int ntb_register_client(struct ntb_client *drvr);
+void ntb_unregister_client(struct ntb_client *drvr);
+
+struct ntb_queue_handlers {
+ void (*tx_handler) (void *data, struct ntb_transport_qp *qp);
+ void (*rx_handler) (void *data, struct ntb_transport_qp *qp);
+ void (*event_handler) (void *data, int status);
+};
+
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp);
+unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp);
+struct ntb_transport_qp *
+ntb_transport_create_queue(void *data, struct pci_dev *pdev,
+ const struct ntb_queue_handlers *handlers);
+void ntb_transport_free_queue(struct ntb_transport_qp *qp);
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+ unsigned int len);
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+ unsigned int len);
+void *ntb_transport_tx_dequeue(struct ntb_transport_qp *qp, unsigned int *len);
+void *ntb_transport_rx_dequeue(struct ntb_transport_qp *qp, unsigned int *len);
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len);
+void ntb_transport_link_up(struct ntb_transport_qp *qp);
+void ntb_transport_link_down(struct ntb_transport_qp *qp);
+bool ntb_transport_link_query(struct ntb_transport_qp *qp);
--
1.7.9.5
^ permalink raw reply related
* [RFC v4 2/2] net: Add support for NTB virtual ethernet device
From: Jon Mason @ 2012-09-27 22:42 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, linux-pci, Dave Jiang, Nicholas Bellinger
A virtual ethernet device that uses the NTB transport API to send/receive data.
Signed-off-by: Jon Mason <jon.mason@intel.com>
---
drivers/net/Kconfig | 4 +
drivers/net/Makefile | 1 +
drivers/net/ntb_netdev.c | 407 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 412 insertions(+)
create mode 100644 drivers/net/ntb_netdev.c
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c2bd80..9bf8a71 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -178,6 +178,10 @@ config NETPOLL_TRAP
config NET_POLL_CONTROLLER
def_bool NETPOLL
+config NTB_NETDEV
+ tristate "Virtual Ethernet over NTB"
+ depends on NTB
+
config RIONET
tristate "RapidIO Ethernet over messaging driver support"
depends on RAPIDIO
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3d375ca..9890148 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -69,3 +69,4 @@ obj-$(CONFIG_USB_IPHETH) += usb/
obj-$(CONFIG_USB_CDC_PHONET) += usb/
obj-$(CONFIG_HYPERV_NET) += hyperv/
+obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
new file mode 100644
index 0000000..bc76c63
--- /dev/null
+++ b/drivers/net/ntb_netdev.c
@@ -0,0 +1,407 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Network Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/ntb.h>
+
+#define NTB_NETDEV_VER "0.5"
+
+MODULE_DESCRIPTION(KBUILD_MODNAME);
+MODULE_VERSION(NTB_NETDEV_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+struct ntb_netdev {
+ struct list_head list;
+ struct pci_dev *pdev;
+ struct net_device *ndev;
+ struct ntb_transport_qp *qp;
+};
+
+#define NTB_TX_TIMEOUT_MS 1000
+#define NTB_RXQ_SIZE 100
+
+static LIST_HEAD(dev_list);
+
+static void ntb_netdev_event_handler(void *data, int status)
+{
+ struct net_device *ndev = data;
+ struct ntb_netdev *dev = netdev_priv(ndev);
+
+ netdev_dbg(ndev, "Event %x, Link %x\n", status,
+ ntb_transport_link_query(dev->qp));
+
+ /* Currently, only link status event is supported */
+ if (status)
+ netif_carrier_on(ndev);
+ else
+ netif_carrier_off(ndev);
+}
+
+static void ntb_netdev_rx_handler(void *data, struct ntb_transport_qp *qp)
+{
+ struct net_device *ndev = data;
+ struct sk_buff *skb;
+ int len, rc;
+
+ while ((skb = ntb_transport_rx_dequeue(qp, &len))) {
+ netdev_dbg(ndev, "%s: %d byte payload received\n", __func__,
+ len);
+
+ skb_put(skb, len);
+ skb->protocol = eth_type_trans(skb, ndev);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ if (netif_rx(skb) == NET_RX_DROP) {
+ ndev->stats.rx_errors++;
+ ndev->stats.rx_dropped++;
+ } else {
+ ndev->stats.rx_packets++;
+ ndev->stats.rx_bytes += len;
+ }
+
+ skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+ if (!skb) {
+ ndev->stats.rx_errors++;
+ ndev->stats.rx_frame_errors++;
+ break;
+ }
+
+ rc = ntb_transport_rx_enqueue(qp, skb, skb->data,
+ ndev->mtu + ETH_HLEN);
+ if (rc) {
+ ndev->stats.rx_errors++;
+ ndev->stats.rx_fifo_errors++;
+ break;
+ }
+ }
+}
+
+static void ntb_netdev_tx_handler(void *data, struct ntb_transport_qp *qp)
+{
+ struct net_device *ndev = data;
+ struct sk_buff *skb;
+ int len;
+
+ while ((skb = ntb_transport_tx_dequeue(qp, &len))) {
+ ndev->stats.tx_packets++;
+ ndev->stats.tx_bytes += skb->len;
+ dev_kfree_skb(skb);
+ }
+
+ if (ndev && netif_queue_stopped(ndev))
+ netif_wake_queue(ndev);
+}
+
+static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
+ struct net_device *ndev)
+{
+ struct ntb_netdev *dev = netdev_priv(ndev);
+ int rc;
+
+ netdev_dbg(ndev, "ntb_transport_tx_enqueue\n");
+
+ rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
+ if (rc)
+ goto err;
+
+ return NETDEV_TX_OK;
+
+err:
+ ndev->stats.tx_dropped++;
+ ndev->stats.tx_errors++;
+ netif_stop_queue(ndev);
+ return NETDEV_TX_BUSY;
+}
+
+static int ntb_netdev_open(struct net_device *ndev)
+{
+ struct ntb_netdev *dev = netdev_priv(ndev);
+ struct sk_buff *skb;
+ int rc, i, len;
+
+ /* Add some empty rx bufs */
+ for (i = 0; i < NTB_RXQ_SIZE; i++) {
+ skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+ if (!skb) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+ ndev->mtu + ETH_HLEN);
+ if (rc == -EINVAL)
+ goto err;
+ }
+
+ netif_carrier_off(ndev);
+ ntb_transport_link_up(dev->qp);
+
+ return 0;
+
+err:
+ while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+ dev_kfree_skb(skb);
+ return rc;
+}
+
+static int ntb_netdev_close(struct net_device *ndev)
+{
+ struct ntb_netdev *dev = netdev_priv(ndev);
+ struct sk_buff *skb;
+ int len;
+
+ ntb_transport_link_down(dev->qp);
+
+ while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+ dev_kfree_skb(skb);
+
+ return 0;
+}
+
+static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
+{
+ struct ntb_netdev *dev = netdev_priv(ndev);
+ struct sk_buff *skb;
+ int len, rc;
+
+ if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+ return -EINVAL;
+
+ if (!netif_running(ndev)) {
+ ndev->mtu = new_mtu;
+ return 0;
+ }
+
+ /* Bring down the link and dispose of posted rx entries */
+ ntb_transport_link_down(dev->qp);
+
+ if (ndev->mtu < new_mtu) {
+ int i;
+
+ for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
+ dev_kfree_skb(skb);
+
+ for (; i; i--) {
+ skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
+ if (!skb) {
+ rc = -ENOMEM;
+ goto err;
+ }
+
+ rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+ new_mtu + ETH_HLEN);
+ if (rc) {
+ dev_kfree_skb(skb);
+ goto err;
+ }
+ }
+ }
+
+ ndev->mtu = new_mtu;
+
+ ntb_transport_link_up(dev->qp);
+
+ return 0;
+
+err:
+ ntb_transport_link_down(dev->qp);
+
+ while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+ dev_kfree_skb(skb);
+
+ netdev_err(ndev, "Error changing MTU, device inoperable\n");
+ return rc;
+}
+
+static void ntb_netdev_tx_timeout(struct net_device *ndev)
+{
+ if (netif_running(ndev))
+ netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops ntb_netdev_ops = {
+ .ndo_open = ntb_netdev_open,
+ .ndo_stop = ntb_netdev_close,
+ .ndo_start_xmit = ntb_netdev_start_xmit,
+ .ndo_change_mtu = ntb_netdev_change_mtu,
+ .ndo_tx_timeout = ntb_netdev_tx_timeout,
+ .ndo_set_mac_address = eth_mac_addr,
+};
+
+static void ntb_get_drvinfo(struct net_device *ndev,
+ struct ethtool_drvinfo *info)
+{
+ struct ntb_netdev *dev = netdev_priv(ndev);
+
+ strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
+ strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
+ strlcpy(info->bus_info, pci_name(dev->pdev), sizeof(info->bus_info));
+}
+
+static int ntb_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ cmd->supported = SUPPORTED_Backplane;
+ cmd->advertising = ADVERTISED_Backplane;
+ cmd->speed = SPEED_UNKNOWN;
+ ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
+ cmd->duplex = DUPLEX_FULL;
+ cmd->port = PORT_OTHER;
+ cmd->phy_address = 0;
+ cmd->transceiver = XCVR_DUMMY1;
+ cmd->autoneg = AUTONEG_ENABLE;
+ cmd->maxtxpkt = 0;
+ cmd->maxrxpkt = 0;
+
+ return 0;
+}
+
+static const struct ethtool_ops ntb_ethtool_ops = {
+ .get_drvinfo = ntb_get_drvinfo,
+ .get_link = ethtool_op_get_link,
+ .get_settings = ntb_get_settings,
+};
+
+static const struct ntb_queue_handlers ntb_netdev_handlers = {
+ .tx_handler = ntb_netdev_tx_handler,
+ .rx_handler = ntb_netdev_rx_handler,
+ .event_handler = ntb_netdev_event_handler,
+};
+
+static int __devinit ntb_netdev_probe(struct pci_dev *pdev)
+{
+ struct net_device *ndev;
+ struct ntb_netdev *dev;
+ int rc;
+
+ ndev = alloc_etherdev(sizeof(struct ntb_netdev));
+ if (!ndev)
+ return -ENOMEM;
+
+ dev = netdev_priv(ndev);
+ dev->ndev = ndev;
+ dev->pdev = pdev;
+ BUG_ON(!dev->pdev);
+ ndev->features = NETIF_F_HIGHDMA;
+
+ ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+
+ ndev->hw_features = ndev->features;
+ ndev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
+
+ random_ether_addr(ndev->perm_addr);
+ memcpy(ndev->dev_addr, ndev->perm_addr, ndev->addr_len);
+
+ ndev->netdev_ops = &ntb_netdev_ops;
+ SET_ETHTOOL_OPS(ndev, &ntb_ethtool_ops);
+
+ dev->qp = ntb_transport_create_queue(ndev, pdev, &ntb_netdev_handlers);
+ if (!dev->qp) {
+ rc = -EIO;
+ goto err;
+ }
+
+ ndev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
+
+ rc = register_netdev(ndev);
+ if (rc)
+ goto err1;
+
+ list_add(&dev->list, &dev_list);
+ pr_info("%s: %s created\n", KBUILD_MODNAME, ndev->name);
+ return 0;
+
+err1:
+ ntb_transport_free_queue(dev->qp);
+err:
+ free_netdev(ndev);
+ return rc;
+}
+
+static void __exit ntb_netdev_remove(struct pci_dev *pdev)
+{
+ struct net_device *ndev;
+ struct ntb_netdev *dev;
+
+ list_for_each_entry(dev, &dev_list, list) {
+ if (dev->pdev == pdev)
+ break;
+ }
+ if (dev == NULL)
+ return;
+
+ ndev = dev->ndev;
+
+ unregister_netdev(ndev);
+ ntb_transport_free_queue(dev->qp);
+ free_netdev(ndev);
+}
+
+static struct ntb_client ntb_netdev_client = {
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .probe = ntb_netdev_probe,
+ .remove = ntb_netdev_remove,
+};
+
+static int __init ntb_netdev_init_module(void)
+{
+ return ntb_register_client(&ntb_netdev_client);
+}
+
+module_init(ntb_netdev_init_module);
+
+static void __exit ntb_netdev_exit_module(void)
+{
+ ntb_unregister_client(&ntb_netdev_client);
+ pr_info("%s: Driver removed\n", KBUILD_MODNAME);
+}
+
+module_exit(ntb_netdev_exit_module);
--
1.7.9.5
^ permalink raw reply related
* [RFC v4 0/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-09-27 22:42 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, linux-pci, Dave Jiang, Nicholas Bellinger
Version 4 of the patch that adds support for PCI-Express Non-Transparent
Bridges in Linux. This version removes the transport transmit tasklet
(per Dave Miller's request).
Thanks,
Jon
^ permalink raw reply
* Re: [PATCH net-next] net: remove sk_init() helper
From: David Miller @ 2012-09-27 22:42 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1348608733.26828.3492.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 25 Sep 2012 23:32:13 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> It seems sk_init() has no value today and even does strange things :
>
> # grep . /proc/sys/net/core/?mem_*
> /proc/sys/net/core/rmem_default:212992
> /proc/sys/net/core/rmem_max:131071
> /proc/sys/net/core/wmem_default:212992
> /proc/sys/net/core/wmem_max:131071
>
> We can remove it completely.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: Possible bug with r8169 driver
From: Francois Romieu @ 2012-09-27 22:21 UTC (permalink / raw)
To: Nolwenn; +Cc: hayeswang, netdev
In-Reply-To: <2197741.c8FeV4Xbis@bureau>
Nolwenn <donolwenn@gmail.com> :
[...]
> I'm on Archlinux (kernel 3.5.4), I need to set eth0 to promiscuous to get a
> stable IPv6 connectivity. My internet access provider is Free (France) and
> IPv6 is enable on the box.
> If eth0 is not set to promiscuous, I need to intercep packets with
> "tcpdump -nvvv ip6" to get an IPv6 connectivity and it's lost, a few later,
> when interception is stopped.
Can you send an 'ip -s link' before any ipv6 traffic flows, then
after ?
> % dmesg | grep r8169
>
> [ 1.810423] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 1.810548] r8169 0000:03:00.0: irq 44 for MSI/MSI-X
> [ 1.810671] r8169 0000:03:00.0: eth0: RTL8168f/8111f at 0xffffc900057ae000, 30:85:a9:ee:43:fa, XID 08000800 IRQ 44
> [ 1.810672] r8169 0000:03:00.0: eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Hayes, is there by any luck something different for the 8168f regarding
the layout of the multicast filtering registers ?
Thanks.
--
Ueimor
^ permalink raw reply
* Re: [PATCH] 8139too: add 1013:1211 PCI ID for a strange SMC1211TX.
From: David Miller @ 2012-09-27 22:40 UTC (permalink / raw)
To: wking; +Cc: netdev, jasowang, greearb, joe, linux-kernel
In-Reply-To: <eaafda75f4966acad08dd10e2627d7615b61206c.1348599606.git.wking@tremily.us>
From: "W. Trevor King" <wking@tremily.us>
Date: Tue, 25 Sep 2012 16:35:54 -0400
> The FCC ID on the board is HEDEN1207DTXR01, which belongs to Accton
> Technology Corporation. This matches the expected 1113 ID. Perhaps
> my board just has a dying EEPROM?
>
> Signed-off-by: W. Trevor King <wking@tremily.us>
> ---
> I'm not sure if this qualifies as a patch-able issue, but I thought
> I'd send it in in case someone else gets bitten by this.
That indeed looks like a bit flip to me, 0x1013 is Cirrus Logic who,
last time I checked, hasn't made any 8139too network cards :-)
I think I'll pass on this patch until we have someone else hit it.
^ permalink raw reply
* RE: [PATCH] qlcnic - Fix scheduling while atomic bug
From: Jitendra Kalsaria @ 2012-09-27 22:40 UTC (permalink / raw)
To: Narendra_K@Dell.com, netdev; +Cc: Sony Chacko, john.r.fastabend@intel.com
In-Reply-To: <20120925175212.GA1324@fedora-17-guest.blr.amer.dell.com>
>-----Original Message-----
>From: Narendra_K@Dell.com [mailto:Narendra_K@Dell.com]
>Sent: Tuesday, September 25, 2012 10:53 AM
>To: netdev
>Cc: Sony Chacko; Jitendra Kalsaria; john.r.fastabend@intel.com
>Subject: [PATCH] qlcnic - Fix scheduling while atomic bug
>
>From: Narendra K <narendra_k@dell.com>
>
>In the device close path, 'qlcnic_fw_destroy_ctx' and
>'qlcnic_poll_rsp' call msleep. But 'qlcnic_fw_destroy_ctx' and
>'qlcnic_poll_rsp' are called with 'adapter->tx_clean_lock' spin lock
>held resulting in scheduling while atomic bug causing the following
>trace.
>
>I observed that the commit 012dc19a45b2b9cc2ebd14aaa401cf782c2abba4
>from John Fastabend addresses a similar issue in ixgbevf driver.
>Adopting the same approach used in the commit, this patch uses mdelay
>to address the issue.
>
>[79884.999115] BUG: scheduling while atomic: ip/30846/0x00000002
>[79885.005562] INFO: lockdep is turned off.
>[79885.009958] Modules linked in: qlcnic fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE bnep bluetooth rfkill ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat >nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables dcdbas coretemp kvm_intel kvm iTCO_wdt ixgbe iTCO_vendor_support crc32c_intel ghash_clmulni_intel nfsd microcode >sb_edac pcspkr edac_core dca bnx2x shpchp auth_rpcgss nfs_acl lpc_ich mfd_core mdio lockd libcrc32c wmi acpi_pad acpi_power_meter sunrpc uinput sd_mod sr_mod cdrom crc_t10dif ahci libahci libata megaraid_sas usb_storage dm_mirror >dm_region_hash dm_log dm_mod [last unloaded: qlcnic]
>[79885.083608] Pid: 30846, comm: ip Tainted: G W O 3.6.0-rc7+ #1
>[79885.090805] Call Trace:
>[79885.093569] [<ffffffff816764d8>] __schedule_bug+0x68/0x76
>[79885.099699] [<ffffffff8168358e>] __schedule+0x99e/0xa00
>[79885.105634] [<ffffffff81683929>] schedule+0x29/0x70
>[79885.111186] [<ffffffff81680def>] schedule_timeout+0x16f/0x350
>[79885.117724] [<ffffffff811afb7a>] ? init_object+0x4a/0x90
>[79885.123770] [<ffffffff8107c190>] ? __internal_add_timer+0x140/0x140
>[79885.130873] [<ffffffff81680fee>] schedule_timeout_uninterruptible+0x1e/0x20
>[79885.138773] [<ffffffff8107e830>] msleep+0x20/0x30
>[79885.144159] [<ffffffffa04c7fbf>] qlcnic_issue_cmd+0xef/0x290 [qlcnic]
>[79885.151478] [<ffffffffa04c8265>] qlcnic_fw_cmd_destroy_rx_ctx+0x55/0x90 [qlcnic]
>[79885.159868] [<ffffffffa04c92fd>] qlcnic_fw_destroy_ctx+0x2d/0xa0 [qlcnic]
>[79885.167576] [<ffffffffa04bf2ed>] __qlcnic_down+0x11d/0x180 [qlcnic]
>[79885.174708] [<ffffffffa04bf6f8>] qlcnic_close+0x18/0x20 [qlcnic]
>[79885.181547] [<ffffffff8153b4c5>] __dev_close_many+0x95/0xe0
>[79885.187899] [<ffffffff8153b548>] __dev_close+0x38/0x50
>[79885.193761] [<ffffffff81545101>] __dev_change_flags+0xa1/0x180
>[79885.200419] [<ffffffff81545298>] dev_change_flags+0x28/0x70
>[79885.206779] [<ffffffff815531b8>] do_setlink+0x378/0xa00
>[79885.212731] [<ffffffff81354fe1>] ? nla_parse+0x31/0xe0
>[79885.218612] [<ffffffff815558ee>] rtnl_newlink+0x37e/0x560
>[79885.224768] [<ffffffff812cfa19>] ? selinux_capable+0x39/0x50
>[79885.231217] [<ffffffff812cbf98>] ? security_capable+0x18/0x20
>[79885.237765] [<ffffffff81555114>] rtnetlink_rcv_msg+0x114/0x2f0
>[79885.244412] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20
>[79885.250280] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20
>[79885.256148] [<ffffffff81555000>] ? __rtnl_unlock+0x20/0x20
>[79885.262413] [<ffffffff81570fc1>] netlink_rcv_skb+0xa1/0xb0
>[79885.268661] [<ffffffff81551fb5>] rtnetlink_rcv+0x25/0x40
>[79885.274727] [<ffffffff815708bd>] netlink_unicast+0x19d/0x220
>[79885.281146] [<ffffffff81570c45>] netlink_sendmsg+0x305/0x3f0
>[79885.287595] [<ffffffff8152b188>] ? sock_update_classid+0x148/0x2e0
>[79885.294650] [<ffffffff81525c2c>] sock_sendmsg+0xbc/0xf0
>[79885.300600] [<ffffffff8152600c>] __sys_sendmsg+0x3ac/0x3c0
>[79885.306853] [<ffffffff8109be23>] ? up_read+0x23/0x40
>[79885.312510] [<ffffffff816896cc>] ? do_page_fault+0x2bc/0x570
>[79885.318968] [<ffffffff81191854>] ? sys_brk+0x44/0x150
>[79885.324715] [<ffffffff811c458c>] ? fget_light+0x24c/0x520
>[79885.330875] [<ffffffff815286f9>] sys_sendmsg+0x49/0x90
>[79885.336707] [<ffffffff8168e429>] system_call_fastpath+0x16/0x1b
>
>Signed-off-by: Narendra K <narendra_k@dell.com>
>---
>The patch applies to latest mainline kernel version 3.6-rc7
>
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
>index b8ead69..2a179d0 100644
>--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
>+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c
>@@ -15,7 +15,7 @@ qlcnic_poll_rsp(struct qlcnic_adapter *adapter)
>
> do {
> /* give atleast 1ms for firmware to respond */
>- msleep(1);
>+ mdelay(1);
>
> if (++timeout > QLCNIC_OS_CRB_RETRY_COUNT)
> return QLCNIC_CDRP_RSP_TIMEOUT;
>@@ -601,7 +601,7 @@ void qlcnic_fw_destroy_ctx(struct qlcnic_adapter *adapter)
> qlcnic_fw_cmd_destroy_tx_ctx(adapter);
>
> /* Allow dma queues to drain after context reset */
>- msleep(20);
>+ mdelay(20);
> }
> }
Looks good to me.
--
1.7.10.2
--
With regards,
Narendra K
^ permalink raw reply
* [PATCH] pkt_sched: Fix warning false positives.
From: David Miller @ 2012-09-27 22:36 UTC (permalink / raw)
To: netdev
GCC refuses to recognize that all error control flows do in fact
set err to something.
Add an explicit initialization to shut it up.
net/sched/sch_drr.c: In function ‘drr_enqueue’:
net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/sched/sch_qfq.c: In function ‘qfq_enqueue’:
net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/sched/sch_drr.c | 2 +-
net/sched/sch_qfq.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 9ce0b4f..71e50c8 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -352,7 +352,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
struct drr_sched *q = qdisc_priv(sch);
struct drr_class *cl;
- int err;
+ int err = 0;
cl = drr_classify(skb, sch, &err);
if (cl == NULL) {
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index e4723d3..2556620 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -878,7 +878,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
struct qfq_sched *q = qdisc_priv(sch);
struct qfq_class *cl;
- int err;
+ int err = 0;
cl = qfq_classify(skb, sch, &err);
if (cl == NULL) {
--
1.7.11.4
^ permalink raw reply related
* You have to fix this
From: David Miller @ 2012-09-27 22:34 UTC (permalink / raw)
To: vipul; +Cc: netdev
You cannot put such monster sized local data objects on the stack:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function ‘t4_memory_rw.constprop.6’:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:484:1: warning: the frame size of 2056 bytes is larger than 2048 bytes [-Wframe-larger-than=]
That's because of this thing:
__be32 data[MEMWIN0_APERTURE/sizeof(__be32)];
I'm really surprised this didn't show up in any of your test builds.
Or are you ignoring warnings that your changes add?
^ permalink raw reply
* [PATCH] bna: Fix warning false positive.
From: David Miller @ 2012-09-27 22:32 UTC (permalink / raw)
To: netdev
GCC can't see that in all non-error-return paths we do in fact
set *using_dac to something.
Add an explicit initialization to remove this warning:
drivers/net/ethernet/brocade/bna/bnad.c: In function ‘bnad_pci_probe’:
drivers/net/ethernet/brocade/bna/bnad.c:3079:5: warning: ‘using_dac’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/brocade/bna/bnad.c:3233:7: note: ‘using_dac’ was declared here
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/brocade/bna/bnad.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index b441f33..ce1eac5 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -3268,6 +3268,7 @@ bnad_pci_probe(struct pci_dev *pdev,
* Output : using_dac = 1 for 64 bit DMA
* = 0 for 32 bit DMA
*/
+ using_dac = false;
err = bnad_pci_init(bnad, pdev, &using_dac);
if (err)
goto unlock_mutex;
--
1.7.11.4
^ permalink raw reply related
* Re: [PATCH] net: phy: smsc: Implement PHY config_init for LAN87xx
From: David Miller @ 2012-09-27 22:28 UTC (permalink / raw)
To: otavio; +Cc: marex, netdev, chohnstaedt, fabio.estevam, peppe.cavallaro
In-Reply-To: <CAP9ODKqC5PhkWxuMsFgA019-wZuygr_hhEme_xyP9dy4C6bLmw@mail.gmail.com>
From: Otavio Salvador <otavio@ossystems.com.br>
Date: Thu, 27 Sep 2012 15:21:37 -0300
> On Tue, Sep 25, 2012 at 5:17 PM, Marek Vasut <marex@denx.de> wrote:
>> The LAN8710/LAN8720 chips do have broken the "FlexPWR" smart power-saving
>> capability. Enabling it leads to the PHY not being able to detect Link when
>> cold-started without cable connected. Thus, make sure this is disabled.
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
>
> Acked-by: Otavio Salvador <otavio@ossystems.com.br>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] rtlwifi: use %*ph[C] to dump small buffers
From: Larry Finger @ 2012-09-27 22:28 UTC (permalink / raw)
To: Joe Perches
Cc: Andy Shevchenko, Chaoming Li, David S. Miller,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1348677946.15175.8.camel@joe-AO722>
On 09/26/2012 11:45 AM, Joe Perches wrote:
> On Wed, 2012-09-26 at 16:57 +0300, Andy Shevchenko wrote:
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
>
> Hi Andy.
>
> (adding Larry and Chaoming to cc's)
>
> Please use scripts/get_maintainer.pl on your patches to
> see who maintains these files so you can cc them.
>
> One comment below, it looks like a possible endian bug.
> (not in your patch)
>
>> ---
>> drivers/net/wireless/rtlwifi/cam.c | 7 ++-----
>> drivers/net/wireless/rtlwifi/rtl8192ce/hw.c | 6 ++----
>> drivers/net/wireless/rtlwifi/rtl8192cu/hw.c | 6 ++----
>> 3 files changed, 6 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/wireless/rtlwifi/cam.c b/drivers/net/wireless/rtlwifi/cam.c
>> index 5b4b4d4..6ba9f80 100644
>> --- a/drivers/net/wireless/rtlwifi/cam.c
>> +++ b/drivers/net/wireless/rtlwifi/cam.c
>> @@ -52,11 +52,8 @@ static void rtl_cam_program_entry(struct ieee80211_hw *hw, u32 entry_no,
>> u32 target_content = 0;
>> u8 entry_i;
>>
>> - RT_TRACE(rtlpriv, COMP_SEC, DBG_LOUD,
>> - "key_cont_128:\n %x:%x:%x:%x:%x:%x\n",
>> - key_cont_128[0], key_cont_128[1],
>> - key_cont_128[2], key_cont_128[3],
>> - key_cont_128[4], key_cont_128[5]);
>> + RT_TRACE(rtlpriv, COMP_SEC, DBG_LOUD, "key_cont_128: %*ph\n",
>> + 6, key_cont_128);
>>
>> for (entry_i = 0; entry_i < CAM_CONTENT_COUNT; entry_i++) {
>> target_command = entry_i + CAM_CONTENT_COUNT * entry_no;
>> diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
>> index 86d73b3..932780d 100644
>> --- a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
>> +++ b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
>> @@ -1918,10 +1918,8 @@ static void rtl92ce_update_hal_rate_mask(struct ieee80211_hw *hw,
>> (ratr_index << 28);
>> rate_mask[4] = macid | (shortgi ? 0x20 : 0x00) | 0x80;
>> RT_TRACE(rtlpriv, COMP_RATR, DBG_DMESG,
>> - "Rate_index:%x, ratr_val:%x, %x:%x:%x:%x:%x\n",
>> - ratr_index, ratr_bitmap,
>> - rate_mask[0], rate_mask[1], rate_mask[2], rate_mask[3],
>> - rate_mask[4]);
>> + "Rate_index:%x, ratr_val:%x, %*phC\n",
>> + ratr_index, ratr_bitmap, 5, rate_mask);
>> rtl92c_fill_h2c_cmd(hw, H2C_RA_MASK, 5, rate_mask);
>>
>> if (macid != 0)
>> diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
>> index 4bbb711..7554501 100644
>> --- a/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
>> +++ b/drivers/net/wireless/rtlwifi/rtl8192cu/hw.c
>> @@ -2169,10 +2169,8 @@ void rtl92cu_update_hal_rate_mask(struct ieee80211_hw *hw, u8 rssi_level)
>> ratr_index << 28);
>> rate_mask[4] = macid | (shortgi ? 0x20 : 0x00) | 0x80;
>> RT_TRACE(rtlpriv, COMP_RATR, DBG_DMESG,
>> - "Rate_index:%x, ratr_val:%x, %x:%x:%x:%x:%x\n",
>> - ratr_index, ratr_bitmap,
>> - rate_mask[0], rate_mask[1], rate_mask[2], rate_mask[3],
>> - rate_mask[4]);
>> + "Rate_index:%x, ratr_val:%x, %*phC\n",
>> + ratr_index, ratr_bitmap, 5, rate_mask);
>> rtl92c_fill_h2c_cmd(hw, H2C_RA_MASK, 5, rate_mask);
>> }
>>
>
> rate_mask uses:
>
> u32 ratr_bitmap = (u32) mac->basic_rates;
> ...
> u8 rate_mask[5];
> ...
> [sets ratr_bitmap as u32]
> ...
> *(u32 *)&rate_mask = ((ratr_bitmap & 0x0fffffff) |
> ratr_index << 28);
> ...
> rtl92c_fill_h2c_cmd(hw, H2C_RA_MASK, 5, rate_mask);
>
> Looks like a possible endian misuse to me.
After a second look at the code, I don't think this is an endian issue; however,
I am proposing the following changes to remove any doubt:
Index: wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
===================================================================
--- wireless-testing-new.orig/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
+++ wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
@@ -1964,8 +1965,9 @@ static void rtl92ce_update_hal_rate_mask
RT_TRACE(rtlpriv, COMP_RATR, DBG_DMESG,
"ratr_bitmap :%x\n", ratr_bitmap);
- *(u32 *)&rate_mask = (ratr_bitmap & 0x0fffffff) |
- (ratr_index << 28);
+ for (i = 0; i < 3; i++)
+ rate_mask[i] = ratr_bitmap & (0xff << (i * 4));
+ rate_mask[3] = (ratr_bitmap & 0x0f000000) | (ratr_index << 28);
rate_mask[4] = macid | (shortgi ? 0x20 : 0x00) | 0x80;
RT_TRACE(rtlpriv, COMP_RATR, DBG_DMESG,
"Rate_index:%x, ratr_val:%x, %*phC\n",
The downstream routine uses this info as an array of u8, thus it is OK. A proper
patch will be sent later. At least we avoid that ugly cast.
Larry
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Netfilter lacks ability to filter packets via Application-origin
From: richard -rw- weinberger @ 2012-09-27 22:25 UTC (permalink / raw)
To: Chad Gray; +Cc: netdev@vger.kernel.org
In-Reply-To: <COL002-W881A56C786C9972B7654E3F3830@phx.gbl>
On Thu, Sep 27, 2012 at 11:04 PM, Chad Gray <chad938@hotmail.com> wrote:
> Users need the ability for Linux firewall to filter packets based on what
> Application they are originating from. This ability is present in Mac and
> Windows firewalls, but not Linux.
>
> For example, users would like ability to open Port 80 for Firefox, but keep
> Port 80 closed for other applications.
You can implement such filters using LSM like SELinux.
> This ability enhances Privacy & Security of the user but also helps to better
> inform the user about the comings and goings of internet traffic and what
> application/s are causing the traffic.
I seriously doubt that.
--
Thanks,
//richard
^ permalink raw reply
* Re: [PATCH] qlcnic - Fix scheduling while atomic bug
From: David Miller @ 2012-09-27 22:21 UTC (permalink / raw)
To: Narendra_K; +Cc: netdev, sony.chacko, jitendra.kalsaria, john.r.fastabend
In-Reply-To: <20120925175212.GA1324@fedora-17-guest.blr.amer.dell.com>
Is it really the case that not one Qlogic person feels like ACK'ing
this patch that fixes an obvious bug in their driver for 2 full days?
For real?
^ permalink raw reply
* Re: [PATCH linux-next] nf_defrag_ipv6: fix oops on module unloading
From: David Miller @ 2012-09-27 22:18 UTC (permalink / raw)
To: khlebnikov; +Cc: netdev, amwang
In-Reply-To: <20120925160750.30475.77562.stgit@zurg>
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Date: Tue, 25 Sep 2012 20:07:50 +0400
> fix copy-paste error introduced in linux-next commit
> "ipv6: add a new namespace for nf_conntrack_reasm"
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH net-next] be2net: fix vfs enumeration
From: David Miller @ 2012-09-27 22:14 UTC (permalink / raw)
To: ivecera; +Cc: netdev, sathya.perla, ajit.khaparde
In-Reply-To: <1348577442-7720-1-git-send-email-ivecera@redhat.com>
From: Ivan Vecera <ivecera@redhat.com>
Date: Tue, 25 Sep 2012 14:50:42 +0200
> Current VFs enumeration algorithm used in be_find_vfs does not take domain
> number into the match. The match found in igb/ixgbe is more elegant and
> safe.
>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Applied, thanks Ivan.
^ permalink raw reply
* Re: [PATCH net-next 0/4] Tunnel related patches
From: David Miller @ 2012-09-27 22:13 UTC (permalink / raw)
To: shemminger; +Cc: netdev
In-Reply-To: <20120925041222.056704869@vyatta.com>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 24 Sep 2012 21:12:22 -0700
> These are a set of small tunnel related patches.
Series applied, with the updated version of patch #4.
Thanks.
^ permalink raw reply
* Re: [PATCH 1/2] Fix build error caused by broken PCH_PTP module dependency.
From: David Miller @ 2012-09-27 22:09 UTC (permalink / raw)
To: haicheng.li; +Cc: netdev, tshimizu818, linux-kernel, haicheng.lee
In-Reply-To: <5060F97F.3040809@linux.intel.com>
From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Tue, 25 Sep 2012 08:23:27 +0800
> From 898e3214b3406c620571cedf704719784b0df049 Mon Sep 17 00:00:00 2001
> From: Haicheng Li <haicheng.li@linux.intel.com>
> Date: Mon, 24 Sep 2012 15:52:30 +0800
> Subject: [PATCH 1/2] Fix build error caused by broken PCH_PTP module
> dependency.
>
> The .config is:
> CONFIG_PCH_GBE=y
> CONFIG_PCH_PTP=y
> CONFIG_PTP_1588_CLOCK=m
>
> The build error:
Your patch submissions are of a very low quality.
And the main reason is that you microscopically look at problems and
do not investigate how the same thing might be handled elsewhere.
Therefore you will never become accustomed to the proper way problems
are fixed, and the proper way to submit patches.
Look at how other people submit patches, do any other patch submissions
look like your's having all of this metadata in the message body:
> From 898e3214b3406c620571cedf704719784b0df049 Mon Sep 17 00:00:00 2001
> From: Haicheng Li <haicheng.li@linux.intel.com>
> Date: Mon, 24 Sep 2012 15:52:30 +0800
> Subject: [PATCH 1/2] Fix build error caused by broken PCH_PTP module
> dependency.
No, nobody else does this.
As for this specific patch:
> - depends on PTP_1588_CLOCK_PCH
> + depends on PTP_1588_CLOCK_PCH = PCH_GBE
This is not the correct way to ensure that the module'ness of one
config option meets the module'ness requirements of another.
The correct way is to say something like "&& (PCH_GBE || PCH_GBE=n)"
^ permalink raw reply
* Re: [PATCH 2/2] Fix a typo in PTP_1588_CLOCK_PCH Kconfig help info.
From: David Miller @ 2012-09-27 22:06 UTC (permalink / raw)
To: haicheng.li; +Cc: netdev, tshimizu818, linux-kernel, haicheng.lee
In-Reply-To: <5060F9C4.4050303@linux.intel.com>
From: Haicheng Li <haicheng.li@linux.intel.com>
Date: Tue, 25 Sep 2012 08:24:36 +0800
> From 5911413366d37aafcc19ddfc9c0f2db31855431e Mon Sep 17 00:00:00 2001
> From: Haicheng Li <haicheng.li@linux.intel.com>
> Date: Mon, 24 Sep 2012 15:55:27 +0800
> Subject: [PATCH 2/2] Fix a typo in PTP_1588_CLOCK_PCH Kconfig help
> info.
>
> Signed-off-by: Haicheng Li <haicheng.lee@gmail.com>
> ---
> drivers/ptp/Kconfig | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index ffdf712..82c4a26 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -87,6 +87,6 @@ config PTP_1588_CLOCK_PCH
> SO_TIMESTAMPING API.
>
> To compile this driver as a module, choose M here: the module
> - will be called ptp_pch.
> + will be called by pch_ptp.
The original sentence is correct, it is stating the name of the module
that will be built not the module that will call it.
Rather, the "pch_ptp" is what might need to be adjusted.
^ permalink raw reply
* [PATCH net-next REPOST] IB/ipoib: Add more rtnl_link_ops callbacks
From: Or Gerlitz @ 2012-09-27 22:06 UTC (permalink / raw)
To: davem; +Cc: roland, netdev, kaber, Or Gerlitz
Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode, etc policies.
Maintain the proprietary sysfs entries only for legacy childs.
For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
removed white space damage that went in by mistake, sorry for that
drivers/infiniband/ulp/ipoib/ipoib.h | 3 +
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 34 ++++++++++----
drivers/infiniband/ulp/ipoib/ipoib_main.c | 16 +++++--
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 60 +++++++++++++++++++++++++-
drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 24 ++++++----
include/linux/if_link.h | 7 +++
6 files changed, 118 insertions(+), 26 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 381f51b..8956f6f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -520,6 +520,9 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
int __init ipoib_netlink_init(void);
void __exit ipoib_netlink_fini(void);
+void ipoib_set_umcast(struct net_device *ndev, int umcast_val);
+int ipoib_set_mode(struct net_device *dev, const char *buf);
+
void ipoib_setup(struct net_device *dev);
void ipoib_pkey_poll(struct work_struct *work);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 24683fd..ef6d5a3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1448,15 +1448,10 @@ static ssize_t show_mode(struct device *d, struct device_attribute *attr,
return sprintf(buf, "datagram\n");
}
-static ssize_t set_mode(struct device *d, struct device_attribute *attr,
- const char *buf, size_t count)
+int ipoib_set_mode(struct net_device *dev, const char *buf)
{
- struct net_device *dev = to_net_dev(d);
struct ipoib_dev_priv *priv = netdev_priv(dev);
- if (!rtnl_trylock())
- return restart_syscall();
-
/* flush paths if we switch modes so that connections are restarted */
if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) {
set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
@@ -1467,7 +1462,8 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;
ipoib_flush_paths(dev);
- return count;
+ rtnl_lock();
+ return 0;
}
if (!strcmp(buf, "datagram\n")) {
@@ -1476,14 +1472,32 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
dev_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
rtnl_unlock();
ipoib_flush_paths(dev);
-
- return count;
+ rtnl_lock();
+ return 0;
}
- rtnl_unlock();
return -EINVAL;
}
+static ssize_t set_mode(struct device *d, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct net_device *dev = to_net_dev(d);
+ int ret;
+
+ if (!rtnl_trylock())
+ return restart_syscall();
+
+ ret = ipoib_set_mode(dev, buf);
+
+ rtnl_unlock();
+
+ if (!ret)
+ return count;
+
+ return ret;
+}
+
static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
int ipoib_cm_add_mode_attr(struct net_device *dev)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b3e9709..6c5c771 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1386,12 +1386,9 @@ static ssize_t show_umcast(struct device *dev,
return sprintf(buf, "%d\n", test_bit(IPOIB_FLAG_UMCAST, &priv->flags));
}
-static ssize_t set_umcast(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
+void ipoib_set_umcast(struct net_device *ndev, int umcast_val)
{
- struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
- unsigned long umcast_val = simple_strtoul(buf, NULL, 0);
+ struct ipoib_dev_priv *priv = netdev_priv(ndev);
if (umcast_val > 0) {
set_bit(IPOIB_FLAG_UMCAST, &priv->flags);
@@ -1399,6 +1396,15 @@ static ssize_t set_umcast(struct device *dev,
"by userspace\n");
} else
clear_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+}
+
+static ssize_t set_umcast(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long umcast_val = simple_strtoul(buf, NULL, 0);
+
+ ipoib_set_umcast(to_net_dev(dev), umcast_val);
return count;
}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index a7dc5ea..42f6756 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -37,8 +37,60 @@
static const struct nla_policy ipoib_policy[IFLA_IPOIB_MAX + 1] = {
[IFLA_IPOIB_PKEY] = { .type = NLA_U16 },
+ [IFLA_IPOIB_MODE] = { .type = NLA_U16 },
+ [IFLA_IPOIB_UMCAST] = { .type = NLA_U16 },
};
+static int ipoib_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+ struct ipoib_dev_priv *priv = netdev_priv(dev);
+ u16 val;
+
+ if (nla_put_u16(skb, IFLA_IPOIB_PKEY, priv->pkey))
+ goto nla_put_failure;
+
+ val = test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+ if (nla_put_u16(skb, IFLA_IPOIB_MODE, val))
+ goto nla_put_failure;
+
+ val = test_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+ if (nla_put_u16(skb, IFLA_IPOIB_UMCAST, val))
+ goto nla_put_failure;
+
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+
+static int ipoib_changelink(struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+ u16 mode, umcast;
+ int ret = 0;
+
+ if (data[IFLA_IPOIB_MODE]) {
+ mode = nla_get_u16(data[IFLA_IPOIB_MODE]);
+ if (mode == IPOIB_MODE_DATAGRAM)
+ ret = ipoib_set_mode(dev, "datagram\n");
+ else if (mode == IPOIB_MODE_CONNECTED)
+ ret = ipoib_set_mode(dev, "connected\n");
+ else
+ ret = -EINVAL;
+
+ if (ret < 0)
+ goto out_err;
+ }
+
+ if (data[IFLA_IPOIB_UMCAST]) {
+ umcast = nla_get_u16(data[IFLA_IPOIB_UMCAST]);
+ ipoib_set_umcast(dev, umcast);
+ }
+
+out_err:
+ return ret;
+}
+
static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[])
{
@@ -69,6 +121,8 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
+ if (!err && data)
+ err = ipoib_changelink(dev, tb, data);
return err;
}
@@ -87,7 +141,9 @@ static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head
static size_t ipoib_get_size(const struct net_device *dev)
{
- return nla_total_size(2); /* IFLA_IPOIB_PKEY */
+ return nla_total_size(2) + /* IFLA_IPOIB_PKEY */
+ nla_total_size(2) + /* IFLA_IPOIB_MODE */
+ nla_total_size(2); /* IFLA_IPOIB_UMCAST */
}
static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
@@ -97,8 +153,10 @@ static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
.priv_size = sizeof(struct ipoib_dev_priv),
.setup = ipoib_setup,
.newlink = ipoib_new_child_link,
+ .changelink = ipoib_changelink,
.dellink = ipoib_unregister_child_dev,
.get_size = ipoib_get_size,
+ .fill_info = ipoib_fill_info,
};
int __init ipoib_netlink_init(void)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 238bbf9..8292554 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -88,17 +88,21 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
ipoib_create_debug_files(priv->dev);
- if (ipoib_cm_add_mode_attr(priv->dev))
- goto sysfs_failed;
- if (ipoib_add_pkey_attr(priv->dev))
- goto sysfs_failed;
- if (ipoib_add_umcast_attr(priv->dev))
- goto sysfs_failed;
-
- if (device_create_file(&priv->dev->dev, &dev_attr_parent))
- goto sysfs_failed;
+ /* RTNL childs don't need proprietary sysfs entries */
+ if (type == IPOIB_LEGACY_CHILD) {
+ if (ipoib_cm_add_mode_attr(priv->dev))
+ goto sysfs_failed;
+ if (ipoib_add_pkey_attr(priv->dev))
+ goto sysfs_failed;
+ if (ipoib_add_umcast_attr(priv->dev))
+ goto sysfs_failed;
+
+ if (device_create_file(&priv->dev->dev, &dev_attr_parent))
+ goto sysfs_failed;
+ }
- priv->child_type = type;
+ priv->child_type = type;
+ priv->dev->iflink = ppriv->dev->ifindex;
list_add_tail(&priv->list, &ppriv->child_intfs);
return 0;
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 24c0dd0..4491177 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -404,9 +404,16 @@ struct ifla_port_vsi {
enum {
IFLA_IPOIB_UNSPEC,
IFLA_IPOIB_PKEY,
+ IFLA_IPOIB_MODE,
+ IFLA_IPOIB_UMCAST,
__IFLA_IPOIB_MAX
};
+enum {
+ IPOIB_MODE_DATAGRAM = 0, /* using unreliable datagram QPs */
+ IPOIB_MODE_CONNECTED = 1, /* using connected QPs */
+};
+
#define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
#endif /* _LINUX_IF_LINK_H */
--
1.7.1
^ permalink raw reply related
* Re: [PATCH net-next] sparc: bpf_jit_comp: add XOR instruction for BPF JIT JIT
From: David Miller @ 2012-09-27 22:05 UTC (permalink / raw)
To: dxchgb; +Cc: netdev
In-Reply-To: <20120924215753.GA31312@thinkbox>
From: Daniel Borkmann <dxchgb@gmail.com>
Date: Mon, 24 Sep 2012 23:57:54 +0200
> This patch is a follow-up for patch "filter: add XOR instruction for use
> with X/K" that implements BPF SPARC JIT parts for the BPF XOR operation.
>
> Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Looks good, applied, thanks Daniel.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox