Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
From: Dmitry Eremin-Solenikov @ 2012-07-14  6:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, David S. Miller, Dmitry Eremin-Solenikov,
	Alexander Smirnov

As the life flows, developers priorities shifts a bit. Reflect actual
changes in the maintainership of IEEE 802.15.4 code: Sergey mostly
stopped cared about this piece of code. Most of the work recently was
done by Alexander, so put him to the MAINTAINERS file to reflect his
status and to ease the life of respective patches.

Also add new net/mac802154/ directory to the list of maintained files.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 MAINTAINERS |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 150a29f..f03c703 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3403,13 +3403,14 @@ S:	Supported
 F:	drivers/idle/i7300_idle.c
 
 IEEE 802.15.4 SUBSYSTEM
+M:	Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
 M:	Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
-M:	Sergey Lapin <slapin@ossfans.org>
 L:	linux-zigbee-devel@lists.sourceforge.net (moderated for non-subscribers)
 W:	http://apps.sourceforge.net/trac/linux-zigbee
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan.git
 S:	Maintained
 F:	net/ieee802154/
+F:	net/mac802154/
 F:	drivers/ieee802154/
 
 IIO SUBSYSTEM AND DRIVERS
-- 
1.7.10.4

^ permalink raw reply related

* Re: pull request: wireless-next 2012-07-12
From: David Miller @ 2012-07-14  6:05 UTC (permalink / raw)
  To: linville-2XuSBdqkA4R54TAoqtyWWQ
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120712181539.GB25494-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Thu, 12 Jul 2012 14:15:40 -0400

> Several drivers see updates: mwifiex, ath9k, iwlwifi, brcmsmac,
> wlcore/wl12xx/wl18xx, and a handful of others.  The bcma bus got a
> lot of attention from Hauke Mehrtens.  The cfg80211 component gets
> a flurry of patches for multi-channel support, and the mac80211
> component gets the first few VHT (11ac) and 60GHz (11ad) patches.
> This also includes the removal of the iwmc3200 drivers, since the
> hardware never became available to normal people.
> 
> Additionally, the NFC subsystem gets a series of updates.  According to
> Samuel, "Here are the interesting bits:
> 
> - A better error management for the HCI stack.
> - An LLCP "late" binding implementation for a better NFC SAP usage. SAPs are
>   now reserved only when there's a client for it.
> - Support for Sony RC-S360 (a.k.a. PaSoRi) pn533 based dongle. We can read and
>   write NFC tags and also establish a p2p link with this dongle now.
> - A few LLCP fixes."
> 
> Finally, this includes another pull of the fixes from the wireless
> tree in order to resolve some merge issues.

Pulled, thanks John.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Jon Mason @ 2012-07-14  5:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120713170826.09210b80@nehalam.linuxnetplumber.net>

On Fri, Jul 13, 2012 at 05:08:26PM -0700, Stephen Hemminger wrote:
> On Fri, 13 Jul 2012 14:45:00 -0700
> Jon Mason <jon.mason@intel.com> wrote:
> 
> > A virtual ethernet device that uses the NTB transport API to send/receive data.
> > 
> > Signed-off-by: Jon Mason <jon.mason@intel.com>
> > ---
> >  drivers/net/Kconfig      |    4 +
> >  drivers/net/Makefile     |    1 +
> >  drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 416 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/net/ntb_netdev.c
> 
> 
> > +static void ntb_get_drvinfo(__attribute__((unused)) struct net_device *dev,
> > +			    struct ethtool_drvinfo *info)
> > +{
> > +	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
> > +	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
> > +}
> > +
> > +static const char ntb_nic_stats[][ETH_GSTRING_LEN] = {
> > +	"rx_packets", "rx_bytes", "rx_errors", "rx_dropped", "rx_length_errors",
> > +	"rx_frame_errors", "rx_fifo_errors",
> > +	"tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
> > +};
> > +
> > +static int ntb_get_stats_count(__attribute__((unused)) struct net_device *dev)
> > +{
> > +	return ARRAY_SIZE(ntb_nic_stats);
> > +}
> > +
> > +static int ntb_get_sset_count(struct net_device *dev, int sset)
> > +{
> > +	switch (sset) {
> > +	case ETH_SS_STATS:
> > +		return ntb_get_stats_count(dev);
> > +	default:
> > +		return -EOPNOTSUPP;
> > +	}
> > +}
> > +
> > +static void ntb_get_strings(__attribute__((unused)) struct net_device *dev,
> > +			    u32 sset, u8 *data)
> > +{
> > +	switch (sset) {
> > +	case ETH_SS_STATS:
> > +		memcpy(data, *ntb_nic_stats, sizeof(ntb_nic_stats));
> > +	}
> > +}
> > +
> > +static void
> > +ntb_get_ethtool_stats(struct net_device *dev,
> > +		      __attribute__((unused)) struct ethtool_stats *stats,
> > +		      u64 *data)
> > +{
> > +	int i = 0;
> > +
> > +	data[i++] = dev->stats.rx_packets;
> > +	data[i++] = dev->stats.rx_bytes;
> > +	data[i++] = dev->stats.rx_errors;
> > +	data[i++] = dev->stats.rx_dropped;
> > +	data[i++] = dev->stats.rx_length_errors;
> > +	data[i++] = dev->stats.rx_frame_errors;
> > +	data[i++] = dev->stats.rx_fifo_errors;
> > +	data[i++] = dev->stats.tx_packets;
> > +	data[i++] = dev->stats.tx_bytes;
> > +	data[i++] = dev->stats.tx_errors;
> > +	data[i++] = dev->stats.tx_dropped;
> > +}
> 
> These statistics add no value over existing network stats.
> Don't implement ethtool stats unless device has something more
> interesting to say.

Fair enough

> 
> > +static const struct ethtool_ops ntb_ethtool_ops = {
> > +	.get_drvinfo = ntb_get_drvinfo,
> > +	.get_sset_count = ntb_get_sset_count,
> > +	.get_strings = ntb_get_strings,
> > +	.get_ethtool_stats = ntb_get_ethtool_stats,
> > +	.get_link = ethtool_op_get_link,
> > +};
> 
> If you want to implement bonding or bridging then implementing
> get_settings would help.

Will do.

> > +static int __init ntb_netdev_init_module(void)
> > +{
> > +	struct ntb_netdev *dev;
> > +	int rc;
> > +
> > +	pr_info("%s: Probe\n", KBUILD_MODNAME);
> 
> Useless message

True, will remove.

Thanks for the comments!
 
> > +	netdev = alloc_etherdev(sizeof(struct ntb_netdev));
> > +	if (!netdev)
> > +		return -ENOMEM;
> > +
> > +	dev = netdev_priv(netdev);
> > +	dev->ndev = netdev;
> > +	netdev->features = NETIF_F_HIGHDMA;
> > +
> > +	netdev->hw_features = netdev->features;
> > +	netdev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
> > +
> > +	random_ether_addr(netdev->perm_addr);
> > +	memcpy(netdev->dev_addr, netdev->perm_addr, netdev->addr_len);
> > +
> > +	netdev->netdev_ops = &ntb_netdev_ops;
> > +	SET_ETHTOOL_OPS(netdev, &ntb_ethtool_ops);
> > +
> > +	dev->qp = ntb_transport_create_queue(ntb_netdev_rx_handler,
> > +					     ntb_netdev_tx_handler,
> > +					     ntb_netdev_event_handler);
> > +	if (!dev->qp) {
> > +		rc = -EIO;
> > +		goto err;
> > +	}
> > +
> > +	netdev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
> > +
> > +	rc = register_netdev(netdev);
> > +	if (rc)
> > +		goto err1;
> > +
> > +	pr_info("%s: %s created\n", KBUILD_MODNAME, netdev->name);
> > +	return 0;
> > +
> > +err1:
> > +	ntb_transport_free_queue(dev->qp);
> > +err:
> > +	free_netdev(netdev);
> > +	return rc;
> > +}
> > +module_init(ntb_netdev_init_module);
> > +
> > +static void __exit ntb_netdev_exit_module(void)
> > +{
> > +	struct ntb_netdev *dev = netdev_priv(netdev);
> > +
> > +	unregister_netdev(netdev);
> > +	ntb_transport_free_queue(dev->qp);
> > +	free_netdev(netdev);
> > +
> > +	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
> > +}
> > +module_exit(ntb_netdev_exit_module);
> 

^ permalink raw reply

* Re: [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Jon Mason @ 2012-07-14  5:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <20120713231403.GA1712@minipsycho.orion>

On Sat, Jul 14, 2012 at 01:14:03AM +0200, Jiri Pirko wrote:
> Fri, Jul 13, 2012 at 11:45:00PM CEST, jon.mason@intel.com wrote:
> >A virtual ethernet device that uses the NTB transport API to send/receive data.
> >
> >Signed-off-by: Jon Mason <jon.mason@intel.com>
> >---
> > drivers/net/Kconfig      |    4 +
> > drivers/net/Makefile     |    1 +
> > drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 416 insertions(+), 0 deletions(-)
> > create mode 100644 drivers/net/ntb_netdev.c
> >
> >diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> >index 0c2bd80..9bf8a71 100644
> >--- a/drivers/net/Kconfig
> >+++ b/drivers/net/Kconfig
> >@@ -178,6 +178,10 @@ config NETPOLL_TRAP
> > config NET_POLL_CONTROLLER
> > 	def_bool NETPOLL
> > 
> >+config NTB_NETDEV
> >+	tristate "Virtual Ethernet over NTB"
> >+	depends on NTB
> >+
> > config RIONET
> > 	tristate "RapidIO Ethernet over messaging driver support"
> > 	depends on RAPIDIO
> >diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> >index 3d375ca..9890148 100644
> >--- a/drivers/net/Makefile
> >+++ b/drivers/net/Makefile
> >@@ -69,3 +69,4 @@ obj-$(CONFIG_USB_IPHETH)        += usb/
> > obj-$(CONFIG_USB_CDC_PHONET)   += usb/
> > 
> > obj-$(CONFIG_HYPERV_NET) += hyperv/
> >+obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
> >diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
> >new file mode 100644
> >index 0000000..bcbd9d4
> >--- /dev/null
> >+++ b/drivers/net/ntb_netdev.c
> >@@ -0,0 +1,411 @@
> >+/*
> >+ * This file is provided under a dual BSD/GPLv2 license.  When using or
> >+ *   redistributing this file, you may do so under either license.
> >+ *
> >+ *   GPL LICENSE SUMMARY
> >+ *
> >+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
> >+ *
> >+ *   This program is free software; you can redistribute it and/or modify
> >+ *   it under the terms of version 2 of the GNU General Public License as
> >+ *   published by the Free Software Foundation.
> >+ *
> >+ *   This program is distributed in the hope that it will be useful, but
> >+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
> >+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >+ *   General Public License for more details.
> >+ *
> >+ *   You should have received a copy of the GNU General Public License
> >+ *   along with this program; if not, write to the Free Software
> >+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> >+ *   The full GNU General Public License is included in this distribution
> >+ *   in the file called LICENSE.GPL.
> >+ *
> >+ *   BSD LICENSE
> >+ *
> >+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
> >+ *
> >+ *   Redistribution and use in source and binary forms, with or without
> >+ *   modification, are permitted provided that the following conditions
> >+ *   are met:
> >+ *
> >+ *     * Redistributions of source code must retain the above copyright
> >+ *       notice, this list of conditions and the following disclaimer.
> >+ *     * Redistributions in binary form must reproduce the above copy
> >+ *       notice, this list of conditions and the following disclaimer in
> >+ *       the documentation and/or other materials provided with the
> >+ *       distribution.
> >+ *     * Neither the name of Intel Corporation nor the names of its
> >+ *       contributors may be used to endorse or promote products derived
> >+ *       from this software without specific prior written permission.
> >+ *
> >+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> >+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> >+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> >+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> >+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> >+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> >+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> >+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> >+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> >+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> >+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> >+ *
> >+ * Intel PCIe NTB Network Linux driver
> >+ *
> >+ * Contact Information:
> >+ * Jon Mason <jon.mason@intel.com>
> >+ */
> >+#include <linux/etherdevice.h>
> >+#include <linux/ethtool.h>
> >+#include <linux/module.h>
> >+#include <linux/ntb.h>
> >+
> >+#define NTB_NETDEV_VER	"0.4"
> 
> Is it really necessary to provide this in-file versioning? Doesn't
> kernel version itself do the trick?

Not necessarily.  This may be distributed as a package outside of the kernel and the version is useful for debug.

> 
> >+
> >+MODULE_DESCRIPTION(KBUILD_MODNAME);
> >+MODULE_VERSION(NTB_NETDEV_VER);
> >+MODULE_LICENSE("Dual BSD/GPL");
> >+MODULE_AUTHOR("Intel Corporation");
> >+
> >+struct ntb_netdev {
> >+	struct net_device *ndev;
> >+	struct ntb_transport_qp *qp;
> >+};
> >+
> >+#define	NTB_TX_TIMEOUT_MS	1000
> >+#define	NTB_RXQ_SIZE		100
> >+
> >+static struct net_device *netdev;
> >+
> >+static void ntb_netdev_event_handler(int status)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(netdev);
> >+
> >+	pr_debug("%s: Event %x, Link %x\n", KBUILD_MODNAME, status,
> >+		 ntb_transport_link_query(dev->qp));
> >+
> >+	/* Currently, only link status event is supported */
> >+	if (status)
> >+		netif_carrier_on(netdev);
> >+	else
> >+		netif_carrier_off(netdev);
> >+}
> >+
> >+static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp)
> >+{
> >+	struct net_device *ndev = netdev;
> >+	struct sk_buff *skb;
> >+	int len, rc;
> >+
> >+	while ((skb = ntb_transport_rx_dequeue(qp, &len))) {
> >+		pr_debug("%s: %d byte payload received\n", __func__, len);
> >+
> >+		skb_put(skb, len);
> >+		skb->protocol = eth_type_trans(skb, ndev);
> >+		skb->ip_summed = CHECKSUM_NONE;
> >+
> >+		if (netif_rx(skb) == NET_RX_DROP) {
> >+			ndev->stats.rx_errors++;
> >+			ndev->stats.rx_dropped++;
> >+		} else {
> >+			ndev->stats.rx_packets++;
> >+			ndev->stats.rx_bytes += len;
> >+		}
> >+
> >+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
> >+		if (!skb) {
> >+			ndev->stats.rx_errors++;
> >+			ndev->stats.rx_frame_errors++;
> >+			pr_err("%s: No skb\n", __func__);
> >+			break;
> >+		}
> >+
> >+		rc = ntb_transport_rx_enqueue(qp, skb, skb->data,
> >+					      ndev->mtu + ETH_HLEN);
> >+		if (rc) {
> >+			ndev->stats.rx_errors++;
> >+			ndev->stats.rx_fifo_errors++;
> >+			pr_err("%s: error re-enqueuing\n", __func__);
> >+			break;
> >+		}
> >+	}
> >+}
> >+
> >+static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp)
> >+{
> >+	struct net_device *ndev = netdev;
> >+	struct sk_buff *skb;
> >+	int len;
> >+
> >+	while ((skb = ntb_transport_tx_dequeue(qp, &len))) {
> >+		ndev->stats.tx_packets++;
> >+		ndev->stats.tx_bytes += skb->len;
> >+		dev_kfree_skb(skb);
> >+	}
> >+
> >+	if (netif_queue_stopped(ndev))
> >+		netif_wake_queue(ndev);
> >+}
> >+
> >+static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
> >+					 struct net_device *ndev)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(ndev);
> >+	int rc;
> >+
> >+	pr_debug("%s: ntb_transport_tx_enqueue\n", KBUILD_MODNAME);
> >+
> >+	rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
> >+	if (rc)
> >+		goto err;
> >+
> >+	return NETDEV_TX_OK;
> >+
> >+err:
> >+	ndev->stats.tx_dropped++;
> >+	ndev->stats.tx_errors++;
> >+	netif_stop_queue(ndev);
> >+	return NETDEV_TX_BUSY;
> >+}
> >+
> >+static int ntb_netdev_open(struct net_device *ndev)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(ndev);
> >+	struct sk_buff *skb;
> >+	int rc, i, len;
> >+
> >+	/* Add some empty rx bufs */
> >+	for (i = 0; i < NTB_RXQ_SIZE; i++) {
> >+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
> >+		if (!skb) {
> >+			rc = -ENOMEM;
> >+			goto err;
> >+		}
> >+
> >+		rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
> >+					      ndev->mtu + ETH_HLEN);
> >+		if (rc == -EINVAL)
> >+			goto err;
> >+	}
> >+
> >+	netif_carrier_off(ndev);
> >+	ntb_transport_link_up(dev->qp);
> >+
> >+	return 0;
> >+
> >+err:
> >+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
> >+		kfree(skb);
> >+	return rc;
> >+}
> >+
> >+static int ntb_netdev_close(struct net_device *ndev)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(ndev);
> >+	struct sk_buff *skb;
> >+	int len;
> >+
> >+	ntb_transport_link_down(dev->qp);
> >+
> >+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
> >+		kfree(skb);
> >+
> >+	return 0;
> >+}
> >+
> >+static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(ndev);
> >+	struct sk_buff *skb;
> >+	int len, rc;
> >+
> >+	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
> >+		return -EINVAL;
> >+
> >+	if (!netif_running(ndev)) {
> >+		ndev->mtu = new_mtu;
> >+		return 0;
> >+	}
> >+
> >+	/* Bring down the link and dispose of posted rx entries */
> >+	ntb_transport_link_down(dev->qp);
> >+
> >+	if (ndev->mtu < new_mtu) {
> >+		int i;
> >+
> >+		for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
> >+			kfree(skb);
> >+
> >+		for (; i; i--) {
> >+			skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
> >+			if (!skb) {
> >+				rc = -ENOMEM;
> >+				goto err;
> >+			}
> >+
> >+			rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
> >+						      new_mtu + ETH_HLEN);
> >+			if (rc) {
> >+				kfree(skb);
> >+				goto err;
> >+			}
> >+		}
> >+	}
> >+
> >+	ndev->mtu = new_mtu;
> >+
> >+	ntb_transport_link_up(dev->qp);
> >+
> >+	return 0;
> >+
> >+err:
> >+	ntb_transport_link_down(dev->qp);
> >+
> >+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
> >+		kfree(skb);
> >+
> >+	pr_err("Error changing MTU, device inoperable\n");
> 
> Would be maybe better to use netdev_err here (and on similar other
> places)

Good point

> 
> Also, it might be good to provide rollback in case any of
> netdev_alloc_skb() fails.
> 
> >+	return rc;
> >+}
> >+
> >+static void ntb_netdev_tx_timeout(struct net_device *ndev)
> >+{
> >+	if (netif_running(ndev))
> >+		netif_wake_queue(ndev);
> >+}
> >+
> >+static const struct net_device_ops ntb_netdev_ops = {
> >+	.ndo_open = ntb_netdev_open,
> >+	.ndo_stop = ntb_netdev_close,
> >+	.ndo_start_xmit = ntb_netdev_start_xmit,
> >+	.ndo_change_mtu = ntb_netdev_change_mtu,
> >+	.ndo_tx_timeout = ntb_netdev_tx_timeout,
> >+	.ndo_set_mac_address = eth_mac_addr,
> 
> Does your device support mac change while it's up and running?

It's virtual ethernet, so there is no hardware limitation, only what is acceptable for the remote side to receive.

> 
> >+};
> >+
> >+static void ntb_get_drvinfo(__attribute__((unused)) struct net_device *dev,
> >+			    struct ethtool_drvinfo *info)
> >+{
> >+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
> >+	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
> >+}
> >+
> >+static const char ntb_nic_stats[][ETH_GSTRING_LEN] = {
> >+	"rx_packets", "rx_bytes", "rx_errors", "rx_dropped", "rx_length_errors",
> >+	"rx_frame_errors", "rx_fifo_errors",
> >+	"tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
> >+};
> >+
> >+static int ntb_get_stats_count(__attribute__((unused)) struct net_device *dev)
> >+{
> >+	return ARRAY_SIZE(ntb_nic_stats);
> >+}
> >+
> >+static int ntb_get_sset_count(struct net_device *dev, int sset)
> >+{
> >+	switch (sset) {
> >+	case ETH_SS_STATS:
> >+		return ntb_get_stats_count(dev);
> >+	default:
> >+		return -EOPNOTSUPP;
> >+	}
> >+}
> >+
> >+static void ntb_get_strings(__attribute__((unused)) struct net_device *dev,
> >+			    u32 sset, u8 *data)
> >+{
> >+	switch (sset) {
> >+	case ETH_SS_STATS:
> >+		memcpy(data, *ntb_nic_stats, sizeof(ntb_nic_stats));
> >+	}
> >+}
> >+
> >+static void
> >+ntb_get_ethtool_stats(struct net_device *dev,
> >+		      __attribute__((unused)) struct ethtool_stats *stats,
> >+		      u64 *data)
> >+{
> >+	int i = 0;
> >+
> >+	data[i++] = dev->stats.rx_packets;
> >+	data[i++] = dev->stats.rx_bytes;
> >+	data[i++] = dev->stats.rx_errors;
> >+	data[i++] = dev->stats.rx_dropped;
> >+	data[i++] = dev->stats.rx_length_errors;
> >+	data[i++] = dev->stats.rx_frame_errors;
> >+	data[i++] = dev->stats.rx_fifo_errors;
> >+	data[i++] = dev->stats.tx_packets;
> >+	data[i++] = dev->stats.tx_bytes;
> >+	data[i++] = dev->stats.tx_errors;
> >+	data[i++] = dev->stats.tx_dropped;
> >+}
> >+
> >+static const struct ethtool_ops ntb_ethtool_ops = {
> >+	.get_drvinfo = ntb_get_drvinfo,
> >+	.get_sset_count = ntb_get_sset_count,
> >+	.get_strings = ntb_get_strings,
> >+	.get_ethtool_stats = ntb_get_ethtool_stats,
> >+	.get_link = ethtool_op_get_link,
> >+};
> >+
> >+static int __init ntb_netdev_init_module(void)
> >+{
> >+	struct ntb_netdev *dev;
> >+	int rc;
> >+
> >+	pr_info("%s: Probe\n", KBUILD_MODNAME);
> >+
> >+	netdev = alloc_etherdev(sizeof(struct ntb_netdev));
> 
> I might be missing something but this place (module init) does not seems
> like a good place to do alloc_etherdev(). Do you want to support only
> one netdevice instance?
> 
> Anyway, I think that using "static netdev" should be avoided in any case.
> 

It would fail the probe if there is no underlying ntb hardware, but it would make sense to check for that before allocing the etherdev.

Thanks for the comments!

 
> >+	if (!netdev)
> >+		return -ENOMEM;
> >+
> >+	dev = netdev_priv(netdev);
> >+	dev->ndev = netdev;
> >+	netdev->features = NETIF_F_HIGHDMA;
> >+
> >+	netdev->hw_features = netdev->features;
> >+	netdev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
> >+
> >+	random_ether_addr(netdev->perm_addr);
> >+	memcpy(netdev->dev_addr, netdev->perm_addr, netdev->addr_len);
> >+
> >+	netdev->netdev_ops = &ntb_netdev_ops;
> >+	SET_ETHTOOL_OPS(netdev, &ntb_ethtool_ops);
> >+
> >+	dev->qp = ntb_transport_create_queue(ntb_netdev_rx_handler,
> >+					     ntb_netdev_tx_handler,
> >+					     ntb_netdev_event_handler);
> >+	if (!dev->qp) {
> >+		rc = -EIO;
> >+		goto err;
> >+	}
> >+
> >+	netdev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
> >+
> >+	rc = register_netdev(netdev);
> >+	if (rc)
> >+		goto err1;
> >+
> >+	pr_info("%s: %s created\n", KBUILD_MODNAME, netdev->name);
> >+	return 0;
> >+
> >+err1:
> >+	ntb_transport_free_queue(dev->qp);
> >+err:
> >+	free_netdev(netdev);
> >+	return rc;
> >+}
> >+module_init(ntb_netdev_init_module);
> >+
> >+static void __exit ntb_netdev_exit_module(void)
> >+{
> >+	struct ntb_netdev *dev = netdev_priv(netdev);
> >+
> >+	unregister_netdev(netdev);
> >+	ntb_transport_free_queue(dev->qp);
> >+	free_netdev(netdev);
> >+
> >+	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
> >+}
> >+module_exit(ntb_netdev_exit_module);
> >-- 
> >1.7.5.4
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] mac802154: fix sparse warning for mac802154_slave_get_priv
From: Alexander Smirnov @ 2012-07-14  3:42 UTC (permalink / raw)
  To: Silviu-Mihai Popescu
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net, Silviu-Mihai Popescu
In-Reply-To: <1342211770-4219-1-git-send-email-silviupopescu1990@gmail.com>


> Make sparse happy by fixing the following error:
>    * symbol 'mac802154_slave_get_priv' was not declared. Should it be static?
> 
> Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
> ---
> net/mac802154/mib.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 

Should be already fixed, please try the latest net-next tree.

^ permalink raw reply

* Re: resurrecting tcphealth
From: valdis.kletnieks @ 2012-07-14  1:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Piotr Sawuk, netdev, linux-kernel
In-Reply-To: <20120713165544.6767ea8f@nehalam.linuxnetplumber.net>

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On Fri, 13 Jul 2012 16:55:44 -0700, Stephen Hemminger said:

> >+			/* Course retransmit inefficiency- this packet has been received twice. */
> >+			tp->dup_pkts_recv++;
>
> I don't understand that comment, could you use a better sentence please?

I think what was intended was:

/* Curse you, retransmit inefficiency! This packet has been received at least twice */

[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Stephen Hemminger @ 2012-07-14  0:13 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-1-git-send-email-jon.mason@intel.com>

On Fri, 13 Jul 2012 14:44:59 -0700
Jon Mason <jon.mason@intel.com> wrote:

> A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
> connecting 2 systems, providing electrical isolation between the two subsystems.
> A non-transparent bridge is functionally similar to a transparent bridge except
> that both sides of the bridge have their own independent address domains.  The
> host on one side of the bridge will not have the visibility of the complete
> memory or I/O space on the other side of the bridge.  To communicate across the
> non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
> the local system.  Writes to these apertures are mirrored to memory on the
> remote system.  Communications can also occur through the use of doorbell
> registers that initiate interrupts to the alternate domain, and scratch-pad
> registers accessible from both sides.
> 
> The NTB device driver is needed to configure these memory windows, doorbell, and
> scratch-pad registers as well as use them in such a way as they can be turned
> into a viable communication channel to the remote system.  ntb_hw.[ch]
> determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> the underlying hardware to provide access and a common interface to the doorbell
> registers, scratch pads, and memory windows.  These hardware interfaces are
> exported so that other, non-mainlined kernel drivers can access these.
> ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
> communication channel(s) and provide a reliable way of transferring data from
> one side to the other, which it then exports so that "client" drivers can access
> them.  These client drivers are used to provide a standard kernel interface
> (i.e., Ethernet device) to NTB, such that Linux can transfer data from one
> system to the other in a standard way.
> 
> Signed-off-by: Jon Mason <jon.mason@intel.com>

This driver does some reimplementing of standard type operations is this
because you are trying to use the same code on multiple platforms?

Example:
+
+static void ntb_list_add_head(spinlock_t *lock, struct list_head *entry,
+			      struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	list_add(entry, list);
+	spin_unlock_irqrestore(lock, flags);
+}
+
+static void ntb_list_add_tail(spinlock_t *lock, struct list_head *entry,
+			      struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	list_add_tail(entry, list);
+	spin_unlock_irqrestore(lock, flags);
+}

Which are used on skb's and yet we already have sk_buff_head with locking?

I know you probably are committed to this API, but is there some way to
reuse existing shared memory used by virtio-net between two ports?

^ permalink raw reply

* Re: [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Stephen Hemminger @ 2012-07-14  0:08 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-2-git-send-email-jon.mason@intel.com>

On Fri, 13 Jul 2012 14:45:00 -0700
Jon Mason <jon.mason@intel.com> wrote:

> A virtual ethernet device that uses the NTB transport API to send/receive data.
> 
> Signed-off-by: Jon Mason <jon.mason@intel.com>
> ---
>  drivers/net/Kconfig      |    4 +
>  drivers/net/Makefile     |    1 +
>  drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 416 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/ntb_netdev.c


> +static void ntb_get_drvinfo(__attribute__((unused)) struct net_device *dev,
> +			    struct ethtool_drvinfo *info)
> +{
> +	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
> +	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
> +}
> +
> +static const char ntb_nic_stats[][ETH_GSTRING_LEN] = {
> +	"rx_packets", "rx_bytes", "rx_errors", "rx_dropped", "rx_length_errors",
> +	"rx_frame_errors", "rx_fifo_errors",
> +	"tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
> +};
> +
> +static int ntb_get_stats_count(__attribute__((unused)) struct net_device *dev)
> +{
> +	return ARRAY_SIZE(ntb_nic_stats);
> +}
> +
> +static int ntb_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return ntb_get_stats_count(dev);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void ntb_get_strings(__attribute__((unused)) struct net_device *dev,
> +			    u32 sset, u8 *data)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		memcpy(data, *ntb_nic_stats, sizeof(ntb_nic_stats));
> +	}
> +}
> +
> +static void
> +ntb_get_ethtool_stats(struct net_device *dev,
> +		      __attribute__((unused)) struct ethtool_stats *stats,
> +		      u64 *data)
> +{
> +	int i = 0;
> +
> +	data[i++] = dev->stats.rx_packets;
> +	data[i++] = dev->stats.rx_bytes;
> +	data[i++] = dev->stats.rx_errors;
> +	data[i++] = dev->stats.rx_dropped;
> +	data[i++] = dev->stats.rx_length_errors;
> +	data[i++] = dev->stats.rx_frame_errors;
> +	data[i++] = dev->stats.rx_fifo_errors;
> +	data[i++] = dev->stats.tx_packets;
> +	data[i++] = dev->stats.tx_bytes;
> +	data[i++] = dev->stats.tx_errors;
> +	data[i++] = dev->stats.tx_dropped;
> +}

These statistics add no value over existing network stats.
Don't implement ethtool stats unless device has something more
interesting to say.

> +static const struct ethtool_ops ntb_ethtool_ops = {
> +	.get_drvinfo = ntb_get_drvinfo,
> +	.get_sset_count = ntb_get_sset_count,
> +	.get_strings = ntb_get_strings,
> +	.get_ethtool_stats = ntb_get_ethtool_stats,
> +	.get_link = ethtool_op_get_link,
> +};

If you want to implement bonding or bridging then implementing
get_settings would help.

> +static int __init ntb_netdev_init_module(void)
> +{
> +	struct ntb_netdev *dev;
> +	int rc;
> +
> +	pr_info("%s: Probe\n", KBUILD_MODNAME);

Useless message

> +	netdev = alloc_etherdev(sizeof(struct ntb_netdev));
> +	if (!netdev)
> +		return -ENOMEM;
> +
> +	dev = netdev_priv(netdev);
> +	dev->ndev = netdev;
> +	netdev->features = NETIF_F_HIGHDMA;
> +
> +	netdev->hw_features = netdev->features;
> +	netdev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
> +
> +	random_ether_addr(netdev->perm_addr);
> +	memcpy(netdev->dev_addr, netdev->perm_addr, netdev->addr_len);
> +
> +	netdev->netdev_ops = &ntb_netdev_ops;
> +	SET_ETHTOOL_OPS(netdev, &ntb_ethtool_ops);
> +
> +	dev->qp = ntb_transport_create_queue(ntb_netdev_rx_handler,
> +					     ntb_netdev_tx_handler,
> +					     ntb_netdev_event_handler);
> +	if (!dev->qp) {
> +		rc = -EIO;
> +		goto err;
> +	}
> +
> +	netdev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
> +
> +	rc = register_netdev(netdev);
> +	if (rc)
> +		goto err1;
> +
> +	pr_info("%s: %s created\n", KBUILD_MODNAME, netdev->name);
> +	return 0;
> +
> +err1:
> +	ntb_transport_free_queue(dev->qp);
> +err:
> +	free_netdev(netdev);
> +	return rc;
> +}
> +module_init(ntb_netdev_init_module);
> +
> +static void __exit ntb_netdev_exit_module(void)
> +{
> +	struct ntb_netdev *dev = netdev_priv(netdev);
> +
> +	unregister_netdev(netdev);
> +	ntb_transport_free_queue(dev->qp);
> +	free_netdev(netdev);
> +
> +	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
> +}
> +module_exit(ntb_netdev_exit_module);

^ permalink raw reply

* [PATCH net-next v2 3/8] tipc: use standard printk shortcut macros (pr_err etc.)
From: Paul Gortmaker @ 2012-07-13 23:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, joe, Erik Hugne, Jon Maloy, Paul Gortmaker
In-Reply-To: <1342111201-9426-4-git-send-email-paul.gortmaker@windriver.com>

From: Erik Hugne <erik.hugne@ericsson.com>

All messages should go directly to the kernel log.  The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.

Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.

Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG).  The latter will be
phased out completely

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[just resending the 3/8 patch, since the others are unchanged, aside
 from minimal trivial context refresh.]

 net/tipc/bcast.c       |    2 +-
 net/tipc/bearer.c      |   52 ++++++++++++----------
 net/tipc/config.c      |    6 +--
 net/tipc/core.c        |   13 +++---
 net/tipc/core.h        |   12 +----
 net/tipc/discover.c    |    4 +-
 net/tipc/handler.c     |    4 +-
 net/tipc/link.c        |  116 +++++++++++++++++++++++++-----------------------
 net/tipc/name_distr.c  |   25 ++++++-----
 net/tipc/name_table.c  |   40 ++++++++---------
 net/tipc/net.c         |    8 ++--
 net/tipc/netlink.c     |    2 +-
 net/tipc/node.c        |   22 ++++-----
 net/tipc/node_subscr.c |    3 +-
 net/tipc/port.c        |    8 ++--
 net/tipc/ref.c         |   10 ++---
 net/tipc/socket.c      |   10 ++---
 net/tipc/subscr.c      |   14 +++---
 18 files changed, 177 insertions(+), 174 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index d9df34f..fef3689 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -880,7 +880,7 @@ void tipc_port_list_add(struct tipc_port_list *pl_ptr, u32 port)
 		if (!item->next) {
 			item->next = kmalloc(sizeof(*item), GFP_ATOMIC);
 			if (!item->next) {
-				warn("Incomplete multicast delivery, no memory\n");
+				pr_warn("Incomplete multicast delivery, no memory\n");
 				return;
 			}
 			item->next->next = NULL;
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 86b703f..1840e1f 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -123,7 +123,7 @@ int tipc_register_media(struct tipc_media *m_ptr)
 exit:
 	write_unlock_bh(&tipc_net_lock);
 	if (res)
-		warn("Media <%s> registration error\n", m_ptr->name);
+		pr_warn("Media <%s> registration error\n", m_ptr->name);
 	return res;
 }
 
@@ -418,12 +418,12 @@ int tipc_enable_bearer(const char *name, u32 disc_domain, u32 priority)
 	int res = -EINVAL;
 
 	if (!tipc_own_addr) {
-		warn("Bearer <%s> rejected, not supported in standalone mode\n",
-		     name);
+		pr_warn("Bearer <%s> rejected, not supported in standalone mode\n",
+			name);
 		return -ENOPROTOOPT;
 	}
 	if (!bearer_name_validate(name, &b_names)) {
-		warn("Bearer <%s> rejected, illegal name\n", name);
+		pr_warn("Bearer <%s> rejected, illegal name\n", name);
 		return -EINVAL;
 	}
 	if (tipc_addr_domain_valid(disc_domain) &&
@@ -435,12 +435,13 @@ int tipc_enable_bearer(const char *name, u32 disc_domain, u32 priority)
 			res = 0;   /* accept specified node in own cluster */
 	}
 	if (res) {
-		warn("Bearer <%s> rejected, illegal discovery domain\n", name);
+		pr_warn("Bearer <%s> rejected, illegal discovery domain\n",
+			name);
 		return -EINVAL;
 	}
 	if ((priority > TIPC_MAX_LINK_PRI) &&
 	    (priority != TIPC_MEDIA_LINK_PRI)) {
-		warn("Bearer <%s> rejected, illegal priority\n", name);
+		pr_warn("Bearer <%s> rejected, illegal priority\n", name);
 		return -EINVAL;
 	}
 
@@ -448,8 +449,8 @@ int tipc_enable_bearer(const char *name, u32 disc_domain, u32 priority)
 
 	m_ptr = tipc_media_find(b_names.media_name);
 	if (!m_ptr) {
-		warn("Bearer <%s> rejected, media <%s> not registered\n", name,
-		     b_names.media_name);
+		pr_warn("Bearer <%s> rejected, media <%s> not registered\n",
+			name, b_names.media_name);
 		goto exit;
 	}
 
@@ -465,24 +466,25 @@ restart:
 			continue;
 		}
 		if (!strcmp(name, tipc_bearers[i].name)) {
-			warn("Bearer <%s> rejected, already enabled\n", name);
+			pr_warn("Bearer <%s> rejected, already enabled\n",
+				name);
 			goto exit;
 		}
 		if ((tipc_bearers[i].priority == priority) &&
 		    (++with_this_prio > 2)) {
 			if (priority-- == 0) {
-				warn("Bearer <%s> rejected, duplicate priority\n",
-				     name);
+				pr_warn("Bearer <%s> rejected, duplicate priority\n",
+					name);
 				goto exit;
 			}
-			warn("Bearer <%s> priority adjustment required %u->%u\n",
-			     name, priority + 1, priority);
+			pr_warn("Bearer <%s> priority adjustment required %u->%u\n",
+				name, priority + 1, priority);
 			goto restart;
 		}
 	}
 	if (bearer_id >= MAX_BEARERS) {
-		warn("Bearer <%s> rejected, bearer limit reached (%u)\n",
-		     name, MAX_BEARERS);
+		pr_warn("Bearer <%s> rejected, bearer limit reached (%u)\n",
+			name, MAX_BEARERS);
 		goto exit;
 	}
 
@@ -490,7 +492,8 @@ restart:
 	strcpy(b_ptr->name, name);
 	res = m_ptr->enable_bearer(b_ptr);
 	if (res) {
-		warn("Bearer <%s> rejected, enable failure (%d)\n", name, -res);
+		pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
+			name, -res);
 		goto exit;
 	}
 
@@ -508,12 +511,13 @@ restart:
 	res = tipc_disc_create(b_ptr, &m_ptr->bcast_addr, disc_domain);
 	if (res) {
 		bearer_disable(b_ptr);
-		warn("Bearer <%s> rejected, discovery object creation failed\n",
-		     name);
+		pr_warn("Bearer <%s> rejected, discovery object creation failed\n",
+			name);
 		goto exit;
 	}
-	info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
-	     name, tipc_addr_string_fill(addr_string, disc_domain), priority);
+	pr_info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
+		name,
+		tipc_addr_string_fill(addr_string, disc_domain), priority);
 exit:
 	write_unlock_bh(&tipc_net_lock);
 	return res;
@@ -531,12 +535,12 @@ int tipc_block_bearer(const char *name)
 	read_lock_bh(&tipc_net_lock);
 	b_ptr = tipc_bearer_find(name);
 	if (!b_ptr) {
-		warn("Attempt to block unknown bearer <%s>\n", name);
+		pr_warn("Attempt to block unknown bearer <%s>\n", name);
 		read_unlock_bh(&tipc_net_lock);
 		return -EINVAL;
 	}
 
-	info("Blocking bearer <%s>\n", name);
+	pr_info("Blocking bearer <%s>\n", name);
 	spin_lock_bh(&b_ptr->lock);
 	b_ptr->blocked = 1;
 	list_splice_init(&b_ptr->cong_links, &b_ptr->links);
@@ -562,7 +566,7 @@ static void bearer_disable(struct tipc_bearer *b_ptr)
 	struct tipc_link *l_ptr;
 	struct tipc_link *temp_l_ptr;
 
-	info("Disabling bearer <%s>\n", b_ptr->name);
+	pr_info("Disabling bearer <%s>\n", b_ptr->name);
 	spin_lock_bh(&b_ptr->lock);
 	b_ptr->blocked = 1;
 	b_ptr->media->disable_bearer(b_ptr);
@@ -584,7 +588,7 @@ int tipc_disable_bearer(const char *name)
 	write_lock_bh(&tipc_net_lock);
 	b_ptr = tipc_bearer_find(name);
 	if (b_ptr == NULL) {
-		warn("Attempt to disable unknown bearer <%s>\n", name);
+		pr_warn("Attempt to disable unknown bearer <%s>\n", name);
 		res = -EINVAL;
 	} else {
 		bearer_disable(b_ptr);
diff --git a/net/tipc/config.c b/net/tipc/config.c
index c5712a3..7978fdd 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -432,7 +432,7 @@ static void cfg_named_msg_event(void *userdata,
 	if ((size < sizeof(*req_hdr)) ||
 	    (size != TCM_ALIGN(ntohl(req_hdr->tcm_len))) ||
 	    (ntohs(req_hdr->tcm_flags) != TCM_F_REQUEST)) {
-		warn("Invalid configuration message discarded\n");
+		pr_warn("Invalid configuration message discarded\n");
 		return;
 	}
 
@@ -478,7 +478,7 @@ int tipc_cfg_init(void)
 	return 0;
 
 failed:
-	err("Unable to create configuration service\n");
+	pr_err("Unable to create configuration service\n");
 	return res;
 }
 
@@ -494,7 +494,7 @@ void tipc_cfg_reinit(void)
 	seq.lower = seq.upper = tipc_own_addr;
 	res = tipc_publish(config_port_ref, TIPC_ZONE_SCOPE, &seq);
 	if (res)
-		err("Unable to reinitialize configuration service\n");
+		pr_err("Unable to reinitialize configuration service\n");
 }
 
 void tipc_cfg_stop(void)
diff --git a/net/tipc/core.c b/net/tipc/core.c
index f7b9523..3689cb4 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -34,14 +34,13 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <linux/module.h>
-
 #include "core.h"
 #include "ref.h"
 #include "name_table.h"
 #include "subscr.h"
 #include "config.h"
 
+#include <linux/module.h>
 
 #ifndef CONFIG_TIPC_PORTS
 #define CONFIG_TIPC_PORTS 8191
@@ -162,9 +161,9 @@ static int __init tipc_init(void)
 	int res;
 
 	if (tipc_log_resize(CONFIG_TIPC_LOG) != 0)
-		warn("Unable to create log buffer\n");
+		pr_warn("Unable to create log buffer\n");
 
-	info("Activated (version " TIPC_MOD_VER ")\n");
+	pr_info("Activated (version " TIPC_MOD_VER ")\n");
 
 	tipc_own_addr = 0;
 	tipc_remote_management = 1;
@@ -175,9 +174,9 @@ static int __init tipc_init(void)
 
 	res = tipc_core_start();
 	if (res)
-		err("Unable to start in single node mode\n");
+		pr_err("Unable to start in single node mode\n");
 	else
-		info("Started in single node mode\n");
+		pr_info("Started in single node mode\n");
 	return res;
 }
 
@@ -185,7 +184,7 @@ static void __exit tipc_exit(void)
 {
 	tipc_core_stop_net();
 	tipc_core_stop();
-	info("Deactivated\n");
+	pr_info("Deactivated\n");
 }
 
 module_init(tipc_init);
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 2a9bb99..c376ec0 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -37,6 +37,8 @@
 #ifndef _TIPC_CORE_H
 #define _TIPC_CORE_H
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/tipc.h>
 #include <linux/tipc_config.h>
 #include <linux/types.h>
@@ -89,13 +91,6 @@ void tipc_printf(struct print_buf *, const char *fmt, ...);
 #define TIPC_OUTPUT TIPC_LOG
 #endif
 
-#define err(fmt, arg...)  tipc_printf(TIPC_OUTPUT, \
-				      KERN_ERR "TIPC: " fmt, ## arg)
-#define warn(fmt, arg...) tipc_printf(TIPC_OUTPUT, \
-				      KERN_WARNING "TIPC: " fmt, ## arg)
-#define info(fmt, arg...) tipc_printf(TIPC_OUTPUT, \
-				      KERN_NOTICE "TIPC: " fmt, ## arg)
-
 #ifdef CONFIG_TIPC_DEBUG
 
 /*
@@ -105,15 +100,12 @@ void tipc_printf(struct print_buf *, const char *fmt, ...);
 #define DBG_OUTPUT TIPC_LOG
 #endif
 
-#define dbg(fmt, arg...)  tipc_printf(DBG_OUTPUT, KERN_DEBUG fmt, ## arg);
-
 #define msg_dbg(msg, txt) tipc_msg_dbg(DBG_OUTPUT, msg, txt);
 
 void tipc_msg_dbg(struct print_buf *, struct tipc_msg *, const char *);
 
 #else
 
-#define dbg(fmt, arg...)	do {} while (0)
 #define msg_dbg(msg, txt)	do {} while (0)
 
 #define tipc_msg_dbg(buf, msg, txt) do {} while (0)
diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index ae054cf..2f91f37 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -106,8 +106,8 @@ static void disc_dupl_alert(struct tipc_bearer *b_ptr, u32 node_addr,
 	tipc_printbuf_init(&pb, media_addr_str, sizeof(media_addr_str));
 	tipc_media_addr_printf(&pb, media_addr);
 	tipc_printbuf_validate(&pb);
-	warn("Duplicate %s using %s seen on <%s>\n",
-	     node_addr_str, media_addr_str, b_ptr->name);
+	pr_warn("Duplicate %s using %s seen on <%s>\n", node_addr_str,
+		media_addr_str, b_ptr->name);
 }
 
 /**
diff --git a/net/tipc/handler.c b/net/tipc/handler.c
index 9c6f22f..7a52d39 100644
--- a/net/tipc/handler.c
+++ b/net/tipc/handler.c
@@ -57,14 +57,14 @@ unsigned int tipc_k_signal(Handler routine, unsigned long argument)
 	struct queue_item *item;
 
 	if (!handler_enabled) {
-		err("Signal request ignored by handler\n");
+		pr_err("Signal request ignored by handler\n");
 		return -ENOPROTOOPT;
 	}
 
 	spin_lock_bh(&qitem_lock);
 	item = kmem_cache_alloc(tipc_queue_item_cache, GFP_ATOMIC);
 	if (!item) {
-		err("Signal queue out of memory\n");
+		pr_err("Signal queue out of memory\n");
 		spin_unlock_bh(&qitem_lock);
 		return -ENOMEM;
 	}
diff --git a/net/tipc/link.c b/net/tipc/link.c
index f6bf483..e543b9f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -41,6 +41,12 @@
 #include "discover.h"
 #include "config.h"
 
+/*
+ * Error message prefixes
+ */
+static const char *link_co_err = "Link changeover error, ";
+static const char *link_rst_msg = "Resetting link ";
+static const char *link_unk_evt = "Unknown link event ";
 
 /*
  * Out-of-range value for link session numbers
@@ -300,20 +306,20 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 
 	if (n_ptr->link_cnt >= 2) {
 		tipc_addr_string_fill(addr_string, n_ptr->addr);
-		err("Attempt to establish third link to %s\n", addr_string);
+		pr_err("Attempt to establish third link to %s\n", addr_string);
 		return NULL;
 	}
 
 	if (n_ptr->links[b_ptr->identity]) {
 		tipc_addr_string_fill(addr_string, n_ptr->addr);
-		err("Attempt to establish second link on <%s> to %s\n",
-		    b_ptr->name, addr_string);
+		pr_err("Attempt to establish second link on <%s> to %s\n",
+		       b_ptr->name, addr_string);
 		return NULL;
 	}
 
 	l_ptr = kzalloc(sizeof(*l_ptr), GFP_ATOMIC);
 	if (!l_ptr) {
-		warn("Link creation failed, no memory\n");
+		pr_warn("Link creation failed, no memory\n");
 		return NULL;
 	}
 
@@ -371,7 +377,7 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 void tipc_link_delete(struct tipc_link *l_ptr)
 {
 	if (!l_ptr) {
-		err("Attempt to delete non-existent link\n");
+		pr_err("Attempt to delete non-existent link\n");
 		return;
 	}
 
@@ -632,8 +638,8 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			link_set_timer(l_ptr, cont_intv / 4);
 			break;
 		case RESET_MSG:
-			info("Resetting link <%s>, requested by peer\n",
-			     l_ptr->name);
+			pr_info("%s<%s>, requested by peer\n", link_rst_msg,
+				l_ptr->name);
 			tipc_link_reset(l_ptr);
 			l_ptr->state = RESET_RESET;
 			l_ptr->fsm_msg_cnt = 0;
@@ -642,7 +648,7 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			link_set_timer(l_ptr, cont_intv);
 			break;
 		default:
-			err("Unknown link event %u in WW state\n", event);
+			pr_err("%s%u in WW state\n", link_unk_evt, event);
 		}
 		break;
 	case WORKING_UNKNOWN:
@@ -654,8 +660,8 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			link_set_timer(l_ptr, cont_intv);
 			break;
 		case RESET_MSG:
-			info("Resetting link <%s>, requested by peer "
-			     "while probing\n", l_ptr->name);
+			pr_info("%s<%s>, requested by peer while probing\n",
+				link_rst_msg, l_ptr->name);
 			tipc_link_reset(l_ptr);
 			l_ptr->state = RESET_RESET;
 			l_ptr->fsm_msg_cnt = 0;
@@ -680,8 +686,8 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 				l_ptr->fsm_msg_cnt++;
 				link_set_timer(l_ptr, cont_intv / 4);
 			} else {	/* Link has failed */
-				warn("Resetting link <%s>, peer not responding\n",
-				     l_ptr->name);
+				pr_warn("%s<%s>, peer not responding\n",
+					link_rst_msg, l_ptr->name);
 				tipc_link_reset(l_ptr);
 				l_ptr->state = RESET_UNKNOWN;
 				l_ptr->fsm_msg_cnt = 0;
@@ -692,7 +698,7 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			}
 			break;
 		default:
-			err("Unknown link event %u in WU state\n", event);
+			pr_err("%s%u in WU state\n", link_unk_evt, event);
 		}
 		break;
 	case RESET_UNKNOWN:
@@ -726,7 +732,7 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			link_set_timer(l_ptr, cont_intv);
 			break;
 		default:
-			err("Unknown link event %u in RU state\n", event);
+			pr_err("%s%u in RU state\n", link_unk_evt, event);
 		}
 		break;
 	case RESET_RESET:
@@ -751,11 +757,11 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			link_set_timer(l_ptr, cont_intv);
 			break;
 		default:
-			err("Unknown link event %u in RR state\n", event);
+			pr_err("%s%u in RR state\n", link_unk_evt, event);
 		}
 		break;
 	default:
-		err("Unknown link state %u/%u\n", l_ptr->state, event);
+		pr_err("Unknown link state %u/%u\n", l_ptr->state, event);
 	}
 }
 
@@ -856,7 +862,8 @@ int tipc_link_send_buf(struct tipc_link *l_ptr, struct sk_buff *buf)
 		}
 		kfree_skb(buf);
 		if (imp > CONN_MANAGER) {
-			warn("Resetting link <%s>, send queue full", l_ptr->name);
+			pr_warn("%s<%s>, send queue full", link_rst_msg,
+				l_ptr->name);
 			tipc_link_reset(l_ptr);
 		}
 		return dsz;
@@ -1409,8 +1416,8 @@ static void link_reset_all(unsigned long addr)
 
 	tipc_node_lock(n_ptr);
 
-	warn("Resetting all links to %s\n",
-	     tipc_addr_string_fill(addr_string, n_ptr->addr));
+	pr_warn("Resetting all links to %s\n",
+		tipc_addr_string_fill(addr_string, n_ptr->addr));
 
 	for (i = 0; i < MAX_BEARERS; i++) {
 		if (n_ptr->links[i]) {
@@ -1428,7 +1435,7 @@ static void link_retransmit_failure(struct tipc_link *l_ptr,
 {
 	struct tipc_msg *msg = buf_msg(buf);
 
-	warn("Retransmission failure on link <%s>\n", l_ptr->name);
+	pr_warn("Retransmission failure on link <%s>\n", l_ptr->name);
 
 	if (l_ptr->addr) {
 		/* Handle failure on standard link */
@@ -1440,21 +1447,23 @@ static void link_retransmit_failure(struct tipc_link *l_ptr,
 		struct tipc_node *n_ptr;
 		char addr_string[16];
 
-		info("Msg seq number: %u,  ", msg_seqno(msg));
-		info("Outstanding acks: %lu\n",
-		     (unsigned long) TIPC_SKB_CB(buf)->handle);
+		pr_info("Msg seq number: %u,  ", msg_seqno(msg));
+		pr_cont("Outstanding acks: %lu\n",
+			(unsigned long) TIPC_SKB_CB(buf)->handle);
 
 		n_ptr = tipc_bclink_retransmit_to();
 		tipc_node_lock(n_ptr);
 
 		tipc_addr_string_fill(addr_string, n_ptr->addr);
-		info("Broadcast link info for %s\n", addr_string);
-		info("Supportable: %d,  ", n_ptr->bclink.supportable);
-		info("Supported: %d,  ", n_ptr->bclink.supported);
-		info("Acked: %u\n", n_ptr->bclink.acked);
-		info("Last in: %u,  ", n_ptr->bclink.last_in);
-		info("Oos state: %u,  ", n_ptr->bclink.oos_state);
-		info("Last sent: %u\n", n_ptr->bclink.last_sent);
+		pr_info("Broadcast link info for %s\n", addr_string);
+		pr_info("Supportable: %d,  Supported: %d,  Acked: %u\n",
+			n_ptr->bclink.supportable,
+			n_ptr->bclink.supported,
+			n_ptr->bclink.acked);
+		pr_info("Last in: %u,  Oos state: %u,  Last sent: %u\n",
+			n_ptr->bclink.last_in,
+			n_ptr->bclink.oos_state,
+			n_ptr->bclink.last_sent);
 
 		tipc_k_signal((Handler)link_reset_all, (unsigned long)n_ptr->addr);
 
@@ -1479,8 +1488,8 @@ void tipc_link_retransmit(struct tipc_link *l_ptr, struct sk_buff *buf,
 			l_ptr->retransm_queue_head = msg_seqno(msg);
 			l_ptr->retransm_queue_size = retransmits;
 		} else {
-			err("Unexpected retransmit on link %s (qsize=%d)\n",
-			    l_ptr->name, l_ptr->retransm_queue_size);
+			pr_err("Unexpected retransmit on link %s (qsize=%d)\n",
+			       l_ptr->name, l_ptr->retransm_queue_size);
 		}
 		return;
 	} else {
@@ -2074,8 +2083,9 @@ static void link_recv_proto_msg(struct tipc_link *l_ptr, struct sk_buff *buf)
 
 		if (msg_linkprio(msg) &&
 		    (msg_linkprio(msg) != l_ptr->priority)) {
-			warn("Resetting link <%s>, priority change %u->%u\n",
-			     l_ptr->name, l_ptr->priority, msg_linkprio(msg));
+			pr_warn("%s<%s>, priority change %u->%u\n",
+				link_rst_msg, l_ptr->name, l_ptr->priority,
+				msg_linkprio(msg));
 			l_ptr->priority = msg_linkprio(msg);
 			tipc_link_reset(l_ptr); /* Enforce change to take effect */
 			break;
@@ -2139,15 +2149,13 @@ static void tipc_link_tunnel(struct tipc_link *l_ptr,
 
 	tunnel = l_ptr->owner->active_links[selector & 1];
 	if (!tipc_link_is_up(tunnel)) {
-		warn("Link changeover error, "
-		     "tunnel link no longer available\n");
+		pr_warn("%stunnel link no longer available\n", link_co_err);
 		return;
 	}
 	msg_set_size(tunnel_hdr, length + INT_H_SIZE);
 	buf = tipc_buf_acquire(length + INT_H_SIZE);
 	if (!buf) {
-		warn("Link changeover error, "
-		     "unable to send tunnel msg\n");
+		pr_warn("%sunable to send tunnel msg\n", link_co_err);
 		return;
 	}
 	skb_copy_to_linear_data(buf, tunnel_hdr, INT_H_SIZE);
@@ -2173,8 +2181,7 @@ void tipc_link_changeover(struct tipc_link *l_ptr)
 		return;
 
 	if (!l_ptr->owner->permit_changeover) {
-		warn("Link changeover error, "
-		     "peer did not permit changeover\n");
+		pr_warn("%speer did not permit changeover\n", link_co_err);
 		return;
 	}
 
@@ -2192,8 +2199,8 @@ void tipc_link_changeover(struct tipc_link *l_ptr)
 			msg_set_size(&tunnel_hdr, INT_H_SIZE);
 			tipc_link_send_buf(tunnel, buf);
 		} else {
-			warn("Link changeover error, "
-			     "unable to send changeover msg\n");
+			pr_warn("%sunable to send changeover msg\n",
+				link_co_err);
 		}
 		return;
 	}
@@ -2246,8 +2253,8 @@ void tipc_link_send_duplicate(struct tipc_link *l_ptr, struct tipc_link *tunnel)
 		msg_set_size(&tunnel_hdr, length + INT_H_SIZE);
 		outbuf = tipc_buf_acquire(length + INT_H_SIZE);
 		if (outbuf == NULL) {
-			warn("Link changeover error, "
-			     "unable to send duplicate msg\n");
+			pr_warn("%sunable to send duplicate msg\n",
+				link_co_err);
 			return;
 		}
 		skb_copy_to_linear_data(outbuf, &tunnel_hdr, INT_H_SIZE);
@@ -2298,8 +2305,8 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr,
 	if (!dest_link)
 		goto exit;
 	if (dest_link == *l_ptr) {
-		err("Unexpected changeover message on link <%s>\n",
-		    (*l_ptr)->name);
+		pr_err("Unexpected changeover message on link <%s>\n",
+		       (*l_ptr)->name);
 		goto exit;
 	}
 	*l_ptr = dest_link;
@@ -2310,7 +2317,7 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr,
 			goto exit;
 		*buf = buf_extract(tunnel_buf, INT_H_SIZE);
 		if (*buf == NULL) {
-			warn("Link changeover error, duplicate msg dropped\n");
+			pr_warn("%sduplicate msg dropped\n", link_co_err);
 			goto exit;
 		}
 		kfree_skb(tunnel_buf);
@@ -2319,8 +2326,8 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr,
 
 	/* First original message ?: */
 	if (tipc_link_is_up(dest_link)) {
-		info("Resetting link <%s>, changeover initiated by peer\n",
-		     dest_link->name);
+		pr_info("%s<%s>, changeover initiated by peer\n", link_rst_msg,
+			dest_link->name);
 		tipc_link_reset(dest_link);
 		dest_link->exp_msg_count = msg_count;
 		if (!msg_count)
@@ -2333,8 +2340,7 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr,
 
 	/* Receive original message */
 	if (dest_link->exp_msg_count == 0) {
-		warn("Link switchover error, "
-		     "got too many tunnelled messages\n");
+		pr_warn("%sgot too many tunnelled messages\n", link_co_err);
 		goto exit;
 	}
 	dest_link->exp_msg_count--;
@@ -2346,7 +2352,7 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr,
 			kfree_skb(tunnel_buf);
 			return 1;
 		} else {
-			warn("Link changeover error, original msg dropped\n");
+			pr_warn("%soriginal msg dropped\n", link_co_err);
 		}
 	}
 exit:
@@ -2367,7 +2373,7 @@ void tipc_link_recv_bundle(struct sk_buff *buf)
 	while (msgcount--) {
 		obuf = buf_extract(buf, pos);
 		if (obuf == NULL) {
-			warn("Link unable to unbundle message(s)\n");
+			pr_warn("Link unable to unbundle message(s)\n");
 			break;
 		}
 		pos += align(msg_size(buf_msg(obuf)));
@@ -2538,7 +2544,7 @@ int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb,
 			set_fragm_size(pbuf, fragm_sz);
 			set_expected_frags(pbuf, exp_fragm_cnt - 1);
 		} else {
-			dbg("Link unable to reassemble fragmented message\n");
+			pr_debug("Link unable to reassemble fragmented message\n");
 			kfree_skb(fbuf);
 			return -1;
 		}
@@ -3060,5 +3066,5 @@ print_state:
 	tipc_printf(buf, "\n");
 
 	tipc_printbuf_validate(buf);
-	info("%s", print_area);
+	pr_info("%s", print_area);
 }
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 158318e..55d3928 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -161,7 +161,7 @@ void tipc_named_publish(struct publication *publ)
 
 	buf = named_prepare_buf(PUBLICATION, ITEM_SIZE, 0);
 	if (!buf) {
-		warn("Publication distribution failure\n");
+		pr_warn("Publication distribution failure\n");
 		return;
 	}
 
@@ -186,7 +186,7 @@ void tipc_named_withdraw(struct publication *publ)
 
 	buf = named_prepare_buf(WITHDRAWAL, ITEM_SIZE, 0);
 	if (!buf) {
-		warn("Withdrawal distribution failure\n");
+		pr_warn("Withdrawal distribution failure\n");
 		return;
 	}
 
@@ -213,7 +213,7 @@ static void named_distribute(struct list_head *message_list, u32 node,
 			rest -= left;
 			buf = named_prepare_buf(PUBLICATION, left, node);
 			if (!buf) {
-				warn("Bulk publication failure\n");
+				pr_warn("Bulk publication failure\n");
 				return;
 			}
 			item = (struct distr_item *)msg_data(buf_msg(buf));
@@ -283,9 +283,10 @@ static void named_purge_publ(struct publication *publ)
 	write_unlock_bh(&tipc_nametbl_lock);
 
 	if (p != publ) {
-		err("Unable to remove publication from failed node\n"
-		    "(type=%u, lower=%u, node=0x%x, ref=%u, key=%u)\n",
-		    publ->type, publ->lower, publ->node, publ->ref, publ->key);
+		pr_err("Unable to remove publication from failed node\n"
+		       " (type=%u, lower=%u, node=0x%x, ref=%u, key=%u)\n",
+		       publ->type, publ->lower, publ->node, publ->ref,
+		       publ->key);
 	}
 
 	kfree(p);
@@ -329,14 +330,14 @@ void tipc_named_recv(struct sk_buff *buf)
 				tipc_nodesub_unsubscribe(&publ->subscr);
 				kfree(publ);
 			} else {
-				err("Unable to remove publication by node 0x%x\n"
-				    "(type=%u, lower=%u, ref=%u, key=%u)\n",
-				    msg_orignode(msg),
-				    ntohl(item->type), ntohl(item->lower),
-				    ntohl(item->ref), ntohl(item->key));
+				pr_err("Unable to remove publication by node 0x%x\n"
+				       " (type=%u, lower=%u, ref=%u, key=%u)\n",
+				       msg_orignode(msg), ntohl(item->type),
+				       ntohl(item->lower), ntohl(item->ref),
+				       ntohl(item->key));
 			}
 		} else {
-			warn("Unrecognized name table message received\n");
+			pr_warn("Unrecognized name table message received\n");
 		}
 		item++;
 	}
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index cade0ac..c8b0b5c 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -126,7 +126,7 @@ static struct publication *publ_create(u32 type, u32 lower, u32 upper,
 {
 	struct publication *publ = kzalloc(sizeof(*publ), GFP_ATOMIC);
 	if (publ == NULL) {
-		warn("Publication creation failure, no memory\n");
+		pr_warn("Publication creation failure, no memory\n");
 		return NULL;
 	}
 
@@ -163,7 +163,7 @@ static struct name_seq *tipc_nameseq_create(u32 type, struct hlist_head *seq_hea
 	struct sub_seq *sseq = tipc_subseq_alloc(1);
 
 	if (!nseq || !sseq) {
-		warn("Name sequence creation failed, no memory\n");
+		pr_warn("Name sequence creation failed, no memory\n");
 		kfree(nseq);
 		kfree(sseq);
 		return NULL;
@@ -263,8 +263,8 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 
 		/* Lower end overlaps existing entry => need an exact match */
 		if ((sseq->lower != lower) || (sseq->upper != upper)) {
-			warn("Cannot publish {%u,%u,%u}, overlap error\n",
-			     type, lower, upper);
+			pr_warn("Cannot publish {%u,%u,%u}, overlap error\n",
+				type, lower, upper);
 			return NULL;
 		}
 
@@ -286,8 +286,8 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 		/* Fail if upper end overlaps into an existing entry */
 		if ((inspos < nseq->first_free) &&
 		    (upper >= nseq->sseqs[inspos].lower)) {
-			warn("Cannot publish {%u,%u,%u}, overlap error\n",
-			     type, lower, upper);
+			pr_warn("Cannot publish {%u,%u,%u}, overlap error\n",
+				type, lower, upper);
 			return NULL;
 		}
 
@@ -296,8 +296,8 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 			struct sub_seq *sseqs = tipc_subseq_alloc(nseq->alloc * 2);
 
 			if (!sseqs) {
-				warn("Cannot publish {%u,%u,%u}, no memory\n",
-				     type, lower, upper);
+				pr_warn("Cannot publish {%u,%u,%u}, no memory\n",
+					type, lower, upper);
 				return NULL;
 			}
 			memcpy(sseqs, nseq->sseqs,
@@ -309,8 +309,8 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
 
 		info = kzalloc(sizeof(*info), GFP_ATOMIC);
 		if (!info) {
-			warn("Cannot publish {%u,%u,%u}, no memory\n",
-			     type, lower, upper);
+			pr_warn("Cannot publish {%u,%u,%u}, no memory\n",
+				type, lower, upper);
 			return NULL;
 		}
 
@@ -492,8 +492,8 @@ struct publication *tipc_nametbl_insert_publ(u32 type, u32 lower, u32 upper,
 
 	if ((scope < TIPC_ZONE_SCOPE) || (scope > TIPC_NODE_SCOPE) ||
 	    (lower > upper)) {
-		dbg("Failed to publish illegal {%u,%u,%u} with scope %u\n",
-		     type, lower, upper, scope);
+		pr_debug("Failed to publish illegal {%u,%u,%u} with scope %u\n",
+			 type, lower, upper, scope);
 		return NULL;
 	}
 
@@ -668,8 +668,8 @@ struct publication *tipc_nametbl_publish(u32 type, u32 lower, u32 upper,
 	struct publication *publ;
 
 	if (table.local_publ_count >= tipc_max_publications) {
-		warn("Publication failed, local publication limit reached (%u)\n",
-		     tipc_max_publications);
+		pr_warn("Publication failed, local publication limit reached (%u)\n",
+			tipc_max_publications);
 		return NULL;
 	}
 
@@ -702,9 +702,9 @@ int tipc_nametbl_withdraw(u32 type, u32 lower, u32 ref, u32 key)
 		return 1;
 	}
 	write_unlock_bh(&tipc_nametbl_lock);
-	err("Unable to remove local publication\n"
-	    "(type=%u, lower=%u, ref=%u, key=%u)\n",
-	    type, lower, ref, key);
+	pr_err("Unable to remove local publication\n"
+	       "(type=%u, lower=%u, ref=%u, key=%u)\n",
+	       type, lower, ref, key);
 	return 0;
 }
 
@@ -725,8 +725,8 @@ void tipc_nametbl_subscribe(struct tipc_subscription *s)
 		tipc_nameseq_subscribe(seq, s);
 		spin_unlock_bh(&seq->lock);
 	} else {
-		warn("Failed to create subscription for {%u,%u,%u}\n",
-		     s->seq.type, s->seq.lower, s->seq.upper);
+		pr_warn("Failed to create subscription for {%u,%u,%u}\n",
+			s->seq.type, s->seq.lower, s->seq.upper);
 	}
 	write_unlock_bh(&tipc_nametbl_lock);
 }
@@ -942,7 +942,7 @@ void tipc_nametbl_stop(void)
 	for (i = 0; i < tipc_nametbl_size; i++) {
 		if (hlist_empty(&table.types[i]))
 			continue;
-		err("tipc_nametbl_stop(): orphaned hash chain detected\n");
+		pr_err("nametbl_stop(): orphaned hash chain detected\n");
 		break;
 	}
 	kfree(table.types);
diff --git a/net/tipc/net.c b/net/tipc/net.c
index 7c236c8..5b5cea2 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -184,9 +184,9 @@ int tipc_net_start(u32 addr)
 
 	tipc_cfg_reinit();
 
-	info("Started in network mode\n");
-	info("Own node address %s, network identity %u\n",
-	     tipc_addr_string_fill(addr_string, tipc_own_addr), tipc_net_id);
+	pr_info("Started in network mode\n");
+	pr_info("Own node address %s, network identity %u\n",
+		tipc_addr_string_fill(addr_string, tipc_own_addr), tipc_net_id);
 	return 0;
 }
 
@@ -202,5 +202,5 @@ void tipc_net_stop(void)
 	list_for_each_entry_safe(node, t_node, &tipc_node_list, list)
 		tipc_node_delete(node);
 	write_unlock_bh(&tipc_net_lock);
-	info("Left network mode\n");
+	pr_info("Left network mode\n");
 }
diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c
index 7bda8e3..47a839d 100644
--- a/net/tipc/netlink.c
+++ b/net/tipc/netlink.c
@@ -90,7 +90,7 @@ int tipc_netlink_start(void)
 	res = genl_register_family_with_ops(&tipc_genl_family,
 		&tipc_genl_ops, 1);
 	if (res) {
-		err("Failed to register netlink interface\n");
+		pr_err("Failed to register netlink interface\n");
 		return res;
 	}
 
diff --git a/net/tipc/node.c b/net/tipc/node.c
index d4fd341..d21db20 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -105,7 +105,7 @@ struct tipc_node *tipc_node_create(u32 addr)
 	n_ptr = kzalloc(sizeof(*n_ptr), GFP_ATOMIC);
 	if (!n_ptr) {
 		spin_unlock_bh(&node_create_lock);
-		warn("Node creation failed, no memory\n");
+		pr_warn("Node creation failed, no memory\n");
 		return NULL;
 	}
 
@@ -151,8 +151,8 @@ void tipc_node_link_up(struct tipc_node *n_ptr, struct tipc_link *l_ptr)
 
 	n_ptr->working_links++;
 
-	info("Established link <%s> on network plane %c\n",
-	     l_ptr->name, l_ptr->b_ptr->net_plane);
+	pr_info("Established link <%s> on network plane %c\n",
+		l_ptr->name, l_ptr->b_ptr->net_plane);
 
 	if (!active[0]) {
 		active[0] = active[1] = l_ptr;
@@ -160,7 +160,7 @@ void tipc_node_link_up(struct tipc_node *n_ptr, struct tipc_link *l_ptr)
 		return;
 	}
 	if (l_ptr->priority < active[0]->priority) {
-		info("New link <%s> becomes standby\n", l_ptr->name);
+		pr_info("New link <%s> becomes standby\n", l_ptr->name);
 		return;
 	}
 	tipc_link_send_duplicate(active[0], l_ptr);
@@ -168,9 +168,9 @@ void tipc_node_link_up(struct tipc_node *n_ptr, struct tipc_link *l_ptr)
 		active[0] = l_ptr;
 		return;
 	}
-	info("Old link <%s> becomes standby\n", active[0]->name);
+	pr_info("Old link <%s> becomes standby\n", active[0]->name);
 	if (active[1] != active[0])
-		info("Old link <%s> becomes standby\n", active[1]->name);
+		pr_info("Old link <%s> becomes standby\n", active[1]->name);
 	active[0] = active[1] = l_ptr;
 }
 
@@ -211,11 +211,11 @@ void tipc_node_link_down(struct tipc_node *n_ptr, struct tipc_link *l_ptr)
 	n_ptr->working_links--;
 
 	if (!tipc_link_is_active(l_ptr)) {
-		info("Lost standby link <%s> on network plane %c\n",
-		     l_ptr->name, l_ptr->b_ptr->net_plane);
+		pr_info("Lost standby link <%s> on network plane %c\n",
+			l_ptr->name, l_ptr->b_ptr->net_plane);
 		return;
 	}
-	info("Lost link <%s> on network plane %c\n",
+	pr_info("Lost link <%s> on network plane %c\n",
 		l_ptr->name, l_ptr->b_ptr->net_plane);
 
 	active = &n_ptr->active_links[0];
@@ -290,8 +290,8 @@ static void node_lost_contact(struct tipc_node *n_ptr)
 	char addr_string[16];
 	u32 i;
 
-	info("Lost contact with %s\n",
-	     tipc_addr_string_fill(addr_string, n_ptr->addr));
+	pr_info("Lost contact with %s\n",
+		tipc_addr_string_fill(addr_string, n_ptr->addr));
 
 	/* Flush broadcast link info associated with lost node */
 	if (n_ptr->bclink.supported) {
diff --git a/net/tipc/node_subscr.c b/net/tipc/node_subscr.c
index 7a27344..5e34b01 100644
--- a/net/tipc/node_subscr.c
+++ b/net/tipc/node_subscr.c
@@ -51,7 +51,8 @@ void tipc_nodesub_subscribe(struct tipc_node_subscr *node_sub, u32 addr,
 
 	node_sub->node = tipc_node_find(addr);
 	if (!node_sub->node) {
-		warn("Node subscription rejected, unknown node 0x%x\n", addr);
+		pr_warn("Node subscription rejected, unknown node 0x%x\n",
+			addr);
 		return;
 	}
 	node_sub->handle_node_down = handle_down;
diff --git a/net/tipc/port.c b/net/tipc/port.c
index 70bf78b..2cbac39 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -191,7 +191,7 @@ void tipc_port_recv_mcast(struct sk_buff *buf, struct tipc_port_list *dp)
 			struct sk_buff *b = skb_clone(buf, GFP_ATOMIC);
 
 			if (b == NULL) {
-				warn("Unable to deliver multicast message(s)\n");
+				pr_warn("Unable to deliver multicast message(s)\n");
 				goto exit;
 			}
 			if ((index == 0) && (cnt != 0))
@@ -221,12 +221,12 @@ struct tipc_port *tipc_createport_raw(void *usr_handle,
 
 	p_ptr = kzalloc(sizeof(*p_ptr), GFP_ATOMIC);
 	if (!p_ptr) {
-		warn("Port creation failed, no memory\n");
+		pr_warn("Port creation failed, no memory\n");
 		return NULL;
 	}
 	ref = tipc_ref_acquire(p_ptr, &p_ptr->lock);
 	if (!ref) {
-		warn("Port creation failed, reference table exhausted\n");
+		pr_warn("Port creation failed, ref. table exhausted\n");
 		kfree(p_ptr);
 		return NULL;
 	}
@@ -906,7 +906,7 @@ int tipc_createport(void *usr_handle,
 
 	up_ptr = kmalloc(sizeof(*up_ptr), GFP_ATOMIC);
 	if (!up_ptr) {
-		warn("Port creation failed, no memory\n");
+		pr_warn("Port creation failed, no memory\n");
 		return -ENOMEM;
 	}
 	p_ptr = tipc_createport_raw(NULL, port_dispatcher, port_wakeup,
diff --git a/net/tipc/ref.c b/net/tipc/ref.c
index 5cada0e..2a2a938 100644
--- a/net/tipc/ref.c
+++ b/net/tipc/ref.c
@@ -153,11 +153,11 @@ u32 tipc_ref_acquire(void *object, spinlock_t **lock)
 	struct reference *entry = NULL;
 
 	if (!object) {
-		err("Attempt to acquire reference to non-existent object\n");
+		pr_err("Attempt to acquire ref. to non-existent obj\n");
 		return 0;
 	}
 	if (!tipc_ref_table.entries) {
-		err("Reference table not found during acquisition attempt\n");
+		pr_err("Ref. table not found in acquisition attempt\n");
 		return 0;
 	}
 
@@ -211,7 +211,7 @@ void tipc_ref_discard(u32 ref)
 	u32 index_mask;
 
 	if (!tipc_ref_table.entries) {
-		err("Reference table not found during discard attempt\n");
+		pr_err("Ref. table not found during discard attempt\n");
 		return;
 	}
 
@@ -222,11 +222,11 @@ void tipc_ref_discard(u32 ref)
 	write_lock_bh(&ref_table_lock);
 
 	if (!entry->object) {
-		err("Attempt to discard reference to non-existent object\n");
+		pr_err("Attempt to discard ref. to non-existent obj\n");
 		goto exit;
 	}
 	if (entry->ref != ref) {
-		err("Attempt to discard non-existent reference\n");
+		pr_err("Attempt to discard non-existent reference\n");
 		goto exit;
 	}
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1ebb49f..09dc5b9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -34,12 +34,12 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <linux/export.h>
-#include <net/sock.h>
-
 #include "core.h"
 #include "port.h"
 
+#include <linux/export.h>
+#include <net/sock.h>
+
 #define SS_LISTENING	-1	/* socket is listening */
 #define SS_READY	-2	/* socket is connectionless */
 
@@ -1787,13 +1787,13 @@ int tipc_socket_init(void)
 
 	res = proto_register(&tipc_proto, 1);
 	if (res) {
-		err("Failed to register TIPC protocol type\n");
+		pr_err("Failed to register TIPC protocol type\n");
 		goto out;
 	}
 
 	res = sock_register(&tipc_family_ops);
 	if (res) {
-		err("Failed to register TIPC socket type\n");
+		pr_err("Failed to register TIPC socket type\n");
 		proto_unregister(&tipc_proto);
 		goto out;
 	}
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index f976e9c..5ed5965 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -305,8 +305,8 @@ static struct tipc_subscription *subscr_subscribe(struct tipc_subscr *s,
 
 	/* Refuse subscription if global limit exceeded */
 	if (atomic_read(&topsrv.subscription_count) >= tipc_max_subscriptions) {
-		warn("Subscription rejected, subscription limit reached (%u)\n",
-		     tipc_max_subscriptions);
+		pr_warn("Subscription rejected, limit reached (%u)\n",
+			tipc_max_subscriptions);
 		subscr_terminate(subscriber);
 		return NULL;
 	}
@@ -314,7 +314,7 @@ static struct tipc_subscription *subscr_subscribe(struct tipc_subscr *s,
 	/* Allocate subscription object */
 	sub = kmalloc(sizeof(*sub), GFP_ATOMIC);
 	if (!sub) {
-		warn("Subscription rejected, no memory\n");
+		pr_warn("Subscription rejected, no memory\n");
 		subscr_terminate(subscriber);
 		return NULL;
 	}
@@ -328,7 +328,7 @@ static struct tipc_subscription *subscr_subscribe(struct tipc_subscr *s,
 	if ((!(sub->filter & TIPC_SUB_PORTS) ==
 	     !(sub->filter & TIPC_SUB_SERVICE)) ||
 	    (sub->seq.lower > sub->seq.upper)) {
-		warn("Subscription rejected, illegal request\n");
+		pr_warn("Subscription rejected, illegal request\n");
 		kfree(sub);
 		subscr_terminate(subscriber);
 		return NULL;
@@ -440,7 +440,7 @@ static void subscr_named_msg_event(void *usr_handle,
 	/* Create subscriber object */
 	subscriber = kzalloc(sizeof(struct tipc_subscriber), GFP_ATOMIC);
 	if (subscriber == NULL) {
-		warn("Subscriber rejected, no memory\n");
+		pr_warn("Subscriber rejected, no memory\n");
 		return;
 	}
 	INIT_LIST_HEAD(&subscriber->subscription_list);
@@ -458,7 +458,7 @@ static void subscr_named_msg_event(void *usr_handle,
 			NULL,
 			&subscriber->port_ref);
 	if (subscriber->port_ref == 0) {
-		warn("Subscriber rejected, unable to create port\n");
+		pr_warn("Subscriber rejected, unable to create port\n");
 		kfree(subscriber);
 		return;
 	}
@@ -517,7 +517,7 @@ int tipc_subscr_start(void)
 	return 0;
 
 failed:
-	err("Failed to create subscription service\n");
+	pr_err("Failed to create subscription service\n");
 	return res;
 }
 
-- 
1.7.9.7

^ permalink raw reply related

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Stephen Hemminger @ 2012-07-14  0:00 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-1-git-send-email-jon.mason@intel.com>

On Fri, 13 Jul 2012 14:44:59 -0700
Jon Mason <jon.mason@intel.com> wrote:

> A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
> connecting 2 systems, providing electrical isolation between the two subsystems.
> A non-transparent bridge is functionally similar to a transparent bridge except
> that both sides of the bridge have their own independent address domains.  The
> host on one side of the bridge will not have the visibility of the complete
> memory or I/O space on the other side of the bridge.  To communicate across the
> non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
> the local system.  Writes to these apertures are mirrored to memory on the
> remote system.  Communications can also occur through the use of doorbell
> registers that initiate interrupts to the alternate domain, and scratch-pad
> registers accessible from both sides.
> 
> The NTB device driver is needed to configure these memory windows, doorbell, and
> scratch-pad registers as well as use them in such a way as they can be turned
> into a viable communication channel to the remote system.  ntb_hw.[ch]
> determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
> the underlying hardware to provide access and a common interface to the doorbell
> registers, scratch pads, and memory windows.  These hardware interfaces are
> exported so that other, non-mainlined kernel drivers can access these.
> ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
> communication channel(s) and provide a reliable way of transferring data from
> one side to the other, which it then exports so that "client" drivers can access
> them.  These client drivers are used to provide a standard kernel interface
> (i.e., Ethernet device) to NTB, such that Linux can transfer data from one
> system to the other in a standard way.
> 
> Signed-off-by: Jon Mason <jon.mason@intel.com>

> +
> +static int max_num_cbs = 2;
> +module_param(max_num_cbs, uint, 0644);
> +MODULE_PARM_DESC(max_num_cbs, "Maximum number of NTB transport connections");

Rather than making it a fixed size, could you dynamically set these up
with rtnl_link_ops?

> +static struct ntb_device *ntbdev;

What about multiple boards in system?

> +/**
> + * ntb_hw_link_status() - return the hardware link status
> + * @ndev: pointer to ntb_device instance
> + *
> + * Returns true if the hardware is connected to the remote system
> + *
> + * RETURNS: true or false based on the hardware link state
> + */
> +bool ntb_hw_link_status(struct ntb_device *ndev)
> +{
> +	return ndev->link_status == NTB_LINK_UP;
> +}
> +EXPORT_SYMBOL(ntb_hw_link_status);

Why isn't this inline in some header?

> +/**
> + * ntb_query_pdev() - return the pci_dev pointer
> + * @ndev: pointer to ntb_device instance
> + *
> + * Given the ntb pointer return the pci_dev pointerfor the NTB hardware device
> + *
> + * RETURNS: a pointer to the ntb pci_dev
> + */
> +struct pci_dev *ntb_query_pdev(struct ntb_device *ndev)
> +{
> +	return ndev->pdev;
> +}
> +EXPORT_SYMBOL(ntb_query_pdev);
> +
> +/**
> + * ntb_query_max_cbs() - return the maximum number of callback tuples
> + * @ndev: pointer to ntb_device instance
> + *
> + * The number of callbacks can vary depending on the platform and MSI-X/MSI
> + * enablement
> + *
> + * RETURNS: the maximum number of callback tuples (3, 15, or 33)
> + */
> +unsigned int ntb_query_max_cbs(struct ntb_device *ndev)
> +{
> +	return ndev->max_cbs > max_num_cbs ? max_num_cbs : ndev->max_cbs;
> +}
> +EXPORT_SYMBOL(ntb_query_max_cbs);
> +
> +/**
> + * ntb_register_event_callback() - register event callback
> + * @ndev: pointer to ntb_device instance
> + * @func: callback function to register
> + *
> + * This function registers a callback for any HW driver events such as link
> + * up/down, power management notices and etc.
> + *
> + * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
> + */
> +int ntb_register_event_callback(struct ntb_device *ndev,
> +				void (*func)(void *handle, unsigned int event))
> +{
> +	if (ndev->event_cb)
> +		return -EINVAL;
> +
> +	ndev->event_cb = func;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ntb_register_event_callback);
> +
> +/**
> + * ntb_unregister_event_callback() - unregisters the event callback
> + * @ndev: pointer to ntb_device instance
> + *
> + * This function unregisters the existing callback from transport
> + */
> +void ntb_unregister_event_callback(struct ntb_device *ndev)
> +{
> +	ndev->event_cb = NULL;
> +}
> +EXPORT_SYMBOL(ntb_unregister_event_callback);
> +

^ permalink raw reply

* Re: resurrecting tcphealth
From: Stephen Hemminger @ 2012-07-13 23:55 UTC (permalink / raw)
  To: Piotr Sawuk; +Cc: netdev, linux-kernel
In-Reply-To: <e9caf38359467bfa8a1e2ac86f6ef2cc.squirrel@webmail.univie.ac.at>

I am not sure if the is really necessary since the most
of the stats are available elsewhere.

Here are some comments on getting the simplified to match
the kernel style.

>
> static inline struct tcp_sock *tcp_sk(const struct sock *sk)
>diff -rub A/net/ipv4/tcp_input.c B/net/ipv4/tcp_input.c
>--- A/net/ipv4/tcp_input.c	2012-06-22 20:37:50.000000000 +0200
>+++ B/net/ipv4/tcp_input.c	2012-07-06 10:12:12.000000000 +0200
>@@ -4414,6 +4415,8 @@
> 		}
>
> 		if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
>+			/* Course retransmit inefficiency- this packet has been received twice. */
>+			tp->dup_pkts_recv++;

I don't understand that comment, could you use a better sentence please?

>
> 	tp->rx_opt.saw_tstamp = 0;
>
>+	/*
>+	 *	Tcp health monitoring is interested in
>+	 *	total per-connection packet arrivals.
>+	 *	This is in the fast path, but is quick.
>+	 */
>+	tp->pkts_recv++;
>+

Comment seems bigger justification than necessary for simple
operation.

>diff -rub A/net/ipv4/tcp_ipv4.c B/net/ipv4/tcp_ipv4.c
>--- A/net/ipv4/tcp_ipv4.c	2012-06-22 20:37:50.000000000 +0200
>+++ B/net/ipv4/tcp_ipv4.c	2012-07-11 09:34:22.000000000 +0200
>@@ -2533,6 +2533,82 @@
> 	return 0;
> }
>
>+
>+/*
>+ *	Output /proc/net/tcphealth
>+ */
>+#define LINESZ 128
>+
>+int tcp_health_seq_show(struct seq_file *seq, void *v)
>+{
>+	int len, num;
>+	char srcIP[32], destIP[32];
Unnecessary see below

>+
>+	unsigned long  SmoothedRttEstimate,
>+		AcksSent, DupAcksSent, PktsRecv, DupPktsRecv;

Do not use CamelCase in kernel code.

>+	struct tcp_iter_state *st;
>+
>+	if (v == SEQ_START_TOKEN) {
>+		seq_printf(seq,
>+		"TCP Health Monitoring (established connections only)\n"
>+		" -Duplicate ACKs indicate lost or reordered packets on the
>connection.\n"
>+		" -Duplicate Packets Received signal a slow and badly inefficient
>connection.\n"
>+		" -RttEst estimates how long future packets will take on a round trip
>over the connection.\n"
>+		"id   Local Address        Remote Address       RttEst(ms) AcksSent "

Header seems excessive, just put one line of header please.


>+		"DupAcksSent PktsRecv DupPktsRecv\n");
>+		goto out;
>+	}
>+
>+	/* Loop through established TCP connections */
>+	st = seq->private;
>+
>+
>+	if (st->state == TCP_SEQ_STATE_ESTABLISHED)
>+	{
>+/*	; //insert read-lock here */

Don't think you need read-lock

>+		const struct tcp_sock *tp = tcp_sk(v);
>+		const struct inet_sock *inet = inet_sk(v);
>+		__be32 dest = inet->inet_daddr;
>+		__be32 src = inet->inet_rcv_saddr;
>+		__u16 destp = ntohs(inet->inet_dport);
>+		__u16 srcp = ntohs(inet->inet_sport);
>+

These temp variables aren't redundant.

>+		num = st->num;
>+		SmoothedRttEstimate = (tp->srtt >> 3);
>+		AcksSent = tp->acks_sent;
>+		DupAcksSent = tp->dup_acks_sent;
>+		PktsRecv = tp->pkts_recv;
>+		DupPktsRecv = tp->dup_pkts_recv;
>+
>+		sprintf(srcIP, "%lu.%lu.%lu.%lu:%u",
>+			((src >> 24) & 0xFF), ((src >> 16) & 0xFF), ((src >> 8) & 0xFF), (src &
>0xFF),
>+			srcp);
>+		sprintf(destIP, "%3d.%3d.%3d.%3d:%u",
>+			((dest >> 24) & 0xFF), ((dest >> 16) & 0xFF), ((dest >> 8) & 0xFF),
>(dest & 0xFF),
>+			destp);
>+
>+		seq_printf(seq, "%d: %-21s %-21s "
>+				"%8lu %8lu %8lu %8lu %8lu%n",
>+				num,
>+				srcIP,
>+				destIP,
>+				SmoothedRttEstimate,
>+				AcksSent,
>+				DupAcksSent,
>+				PktsRecv,
>+				DupPktsRecv,
>+
>+				&len
>+			);
>+

Kernel has %pI4 to print IP addresses. 

	seq_printf(seq, "%d: %-21pI4 %-21pI4 "
			"%8lu %8lu %8lu %8lu %8lu\n",
			num,
			&inet->inet_rcv_saddr,
			&inet->inet_daddr,
			tp->srtt >> 3,
			tp->acks_sent,
			tp->dup_acks_sent,
			tp->pkts_recv,
			tp->dup_pkts_recv);
    
>+		seq_printf(seq, "%*s\n", LINESZ - 1 - len, "");

This padding of line is bogus, just print variable length line.
Are you trying to make it fixed length record file?

^ permalink raw reply

* Re: [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Jiri Pirko @ 2012-07-13 23:14 UTC (permalink / raw)
  To: Jon Mason; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-2-git-send-email-jon.mason@intel.com>

Fri, Jul 13, 2012 at 11:45:00PM CEST, jon.mason@intel.com wrote:
>A virtual ethernet device that uses the NTB transport API to send/receive data.
>
>Signed-off-by: Jon Mason <jon.mason@intel.com>
>---
> drivers/net/Kconfig      |    4 +
> drivers/net/Makefile     |    1 +
> drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 416 insertions(+), 0 deletions(-)
> create mode 100644 drivers/net/ntb_netdev.c
>
>diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>index 0c2bd80..9bf8a71 100644
>--- a/drivers/net/Kconfig
>+++ b/drivers/net/Kconfig
>@@ -178,6 +178,10 @@ config NETPOLL_TRAP
> config NET_POLL_CONTROLLER
> 	def_bool NETPOLL
> 
>+config NTB_NETDEV
>+	tristate "Virtual Ethernet over NTB"
>+	depends on NTB
>+
> config RIONET
> 	tristate "RapidIO Ethernet over messaging driver support"
> 	depends on RAPIDIO
>diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>index 3d375ca..9890148 100644
>--- a/drivers/net/Makefile
>+++ b/drivers/net/Makefile
>@@ -69,3 +69,4 @@ obj-$(CONFIG_USB_IPHETH)        += usb/
> obj-$(CONFIG_USB_CDC_PHONET)   += usb/
> 
> obj-$(CONFIG_HYPERV_NET) += hyperv/
>+obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
>diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
>new file mode 100644
>index 0000000..bcbd9d4
>--- /dev/null
>+++ b/drivers/net/ntb_netdev.c
>@@ -0,0 +1,411 @@
>+/*
>+ * This file is provided under a dual BSD/GPLv2 license.  When using or
>+ *   redistributing this file, you may do so under either license.
>+ *
>+ *   GPL LICENSE SUMMARY
>+ *
>+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
>+ *
>+ *   This program is free software; you can redistribute it and/or modify
>+ *   it under the terms of version 2 of the GNU General Public License as
>+ *   published by the Free Software Foundation.
>+ *
>+ *   This program is distributed in the hope that it will be useful, but
>+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
>+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ *   General Public License for more details.
>+ *
>+ *   You should have received a copy of the GNU General Public License
>+ *   along with this program; if not, write to the Free Software
>+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
>+ *   The full GNU General Public License is included in this distribution
>+ *   in the file called LICENSE.GPL.
>+ *
>+ *   BSD LICENSE
>+ *
>+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
>+ *
>+ *   Redistribution and use in source and binary forms, with or without
>+ *   modification, are permitted provided that the following conditions
>+ *   are met:
>+ *
>+ *     * Redistributions of source code must retain the above copyright
>+ *       notice, this list of conditions and the following disclaimer.
>+ *     * Redistributions in binary form must reproduce the above copy
>+ *       notice, this list of conditions and the following disclaimer in
>+ *       the documentation and/or other materials provided with the
>+ *       distribution.
>+ *     * Neither the name of Intel Corporation nor the names of its
>+ *       contributors may be used to endorse or promote products derived
>+ *       from this software without specific prior written permission.
>+ *
>+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>+ *
>+ * Intel PCIe NTB Network Linux driver
>+ *
>+ * Contact Information:
>+ * Jon Mason <jon.mason@intel.com>
>+ */
>+#include <linux/etherdevice.h>
>+#include <linux/ethtool.h>
>+#include <linux/module.h>
>+#include <linux/ntb.h>
>+
>+#define NTB_NETDEV_VER	"0.4"

Is it really necessary to provide this in-file versioning? Doesn't
kernel version itself do the trick?

>+
>+MODULE_DESCRIPTION(KBUILD_MODNAME);
>+MODULE_VERSION(NTB_NETDEV_VER);
>+MODULE_LICENSE("Dual BSD/GPL");
>+MODULE_AUTHOR("Intel Corporation");
>+
>+struct ntb_netdev {
>+	struct net_device *ndev;
>+	struct ntb_transport_qp *qp;
>+};
>+
>+#define	NTB_TX_TIMEOUT_MS	1000
>+#define	NTB_RXQ_SIZE		100
>+
>+static struct net_device *netdev;
>+
>+static void ntb_netdev_event_handler(int status)
>+{
>+	struct ntb_netdev *dev = netdev_priv(netdev);
>+
>+	pr_debug("%s: Event %x, Link %x\n", KBUILD_MODNAME, status,
>+		 ntb_transport_link_query(dev->qp));
>+
>+	/* Currently, only link status event is supported */
>+	if (status)
>+		netif_carrier_on(netdev);
>+	else
>+		netif_carrier_off(netdev);
>+}
>+
>+static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp)
>+{
>+	struct net_device *ndev = netdev;
>+	struct sk_buff *skb;
>+	int len, rc;
>+
>+	while ((skb = ntb_transport_rx_dequeue(qp, &len))) {
>+		pr_debug("%s: %d byte payload received\n", __func__, len);
>+
>+		skb_put(skb, len);
>+		skb->protocol = eth_type_trans(skb, ndev);
>+		skb->ip_summed = CHECKSUM_NONE;
>+
>+		if (netif_rx(skb) == NET_RX_DROP) {
>+			ndev->stats.rx_errors++;
>+			ndev->stats.rx_dropped++;
>+		} else {
>+			ndev->stats.rx_packets++;
>+			ndev->stats.rx_bytes += len;
>+		}
>+
>+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
>+		if (!skb) {
>+			ndev->stats.rx_errors++;
>+			ndev->stats.rx_frame_errors++;
>+			pr_err("%s: No skb\n", __func__);
>+			break;
>+		}
>+
>+		rc = ntb_transport_rx_enqueue(qp, skb, skb->data,
>+					      ndev->mtu + ETH_HLEN);
>+		if (rc) {
>+			ndev->stats.rx_errors++;
>+			ndev->stats.rx_fifo_errors++;
>+			pr_err("%s: error re-enqueuing\n", __func__);
>+			break;
>+		}
>+	}
>+}
>+
>+static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp)
>+{
>+	struct net_device *ndev = netdev;
>+	struct sk_buff *skb;
>+	int len;
>+
>+	while ((skb = ntb_transport_tx_dequeue(qp, &len))) {
>+		ndev->stats.tx_packets++;
>+		ndev->stats.tx_bytes += skb->len;
>+		dev_kfree_skb(skb);
>+	}
>+
>+	if (netif_queue_stopped(ndev))
>+		netif_wake_queue(ndev);
>+}
>+
>+static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
>+					 struct net_device *ndev)
>+{
>+	struct ntb_netdev *dev = netdev_priv(ndev);
>+	int rc;
>+
>+	pr_debug("%s: ntb_transport_tx_enqueue\n", KBUILD_MODNAME);
>+
>+	rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
>+	if (rc)
>+		goto err;
>+
>+	return NETDEV_TX_OK;
>+
>+err:
>+	ndev->stats.tx_dropped++;
>+	ndev->stats.tx_errors++;
>+	netif_stop_queue(ndev);
>+	return NETDEV_TX_BUSY;
>+}
>+
>+static int ntb_netdev_open(struct net_device *ndev)
>+{
>+	struct ntb_netdev *dev = netdev_priv(ndev);
>+	struct sk_buff *skb;
>+	int rc, i, len;
>+
>+	/* Add some empty rx bufs */
>+	for (i = 0; i < NTB_RXQ_SIZE; i++) {
>+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
>+		if (!skb) {
>+			rc = -ENOMEM;
>+			goto err;
>+		}
>+
>+		rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
>+					      ndev->mtu + ETH_HLEN);
>+		if (rc == -EINVAL)
>+			goto err;
>+	}
>+
>+	netif_carrier_off(ndev);
>+	ntb_transport_link_up(dev->qp);
>+
>+	return 0;
>+
>+err:
>+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
>+		kfree(skb);
>+	return rc;
>+}
>+
>+static int ntb_netdev_close(struct net_device *ndev)
>+{
>+	struct ntb_netdev *dev = netdev_priv(ndev);
>+	struct sk_buff *skb;
>+	int len;
>+
>+	ntb_transport_link_down(dev->qp);
>+
>+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
>+		kfree(skb);
>+
>+	return 0;
>+}
>+
>+static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
>+{
>+	struct ntb_netdev *dev = netdev_priv(ndev);
>+	struct sk_buff *skb;
>+	int len, rc;
>+
>+	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
>+		return -EINVAL;
>+
>+	if (!netif_running(ndev)) {
>+		ndev->mtu = new_mtu;
>+		return 0;
>+	}
>+
>+	/* Bring down the link and dispose of posted rx entries */
>+	ntb_transport_link_down(dev->qp);
>+
>+	if (ndev->mtu < new_mtu) {
>+		int i;
>+
>+		for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
>+			kfree(skb);
>+
>+		for (; i; i--) {
>+			skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
>+			if (!skb) {
>+				rc = -ENOMEM;
>+				goto err;
>+			}
>+
>+			rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
>+						      new_mtu + ETH_HLEN);
>+			if (rc) {
>+				kfree(skb);
>+				goto err;
>+			}
>+		}
>+	}
>+
>+	ndev->mtu = new_mtu;
>+
>+	ntb_transport_link_up(dev->qp);
>+
>+	return 0;
>+
>+err:
>+	ntb_transport_link_down(dev->qp);
>+
>+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
>+		kfree(skb);
>+
>+	pr_err("Error changing MTU, device inoperable\n");

Would be maybe better to use netdev_err here (and on similar other
places)

Also, it might be good to provide rollback in case any of
netdev_alloc_skb() fails.

>+	return rc;
>+}
>+
>+static void ntb_netdev_tx_timeout(struct net_device *ndev)
>+{
>+	if (netif_running(ndev))
>+		netif_wake_queue(ndev);
>+}
>+
>+static const struct net_device_ops ntb_netdev_ops = {
>+	.ndo_open = ntb_netdev_open,
>+	.ndo_stop = ntb_netdev_close,
>+	.ndo_start_xmit = ntb_netdev_start_xmit,
>+	.ndo_change_mtu = ntb_netdev_change_mtu,
>+	.ndo_tx_timeout = ntb_netdev_tx_timeout,
>+	.ndo_set_mac_address = eth_mac_addr,

Does your device support mac change while it's up and running?

>+};
>+
>+static void ntb_get_drvinfo(__attribute__((unused)) struct net_device *dev,
>+			    struct ethtool_drvinfo *info)
>+{
>+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
>+	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
>+}
>+
>+static const char ntb_nic_stats[][ETH_GSTRING_LEN] = {
>+	"rx_packets", "rx_bytes", "rx_errors", "rx_dropped", "rx_length_errors",
>+	"rx_frame_errors", "rx_fifo_errors",
>+	"tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
>+};
>+
>+static int ntb_get_stats_count(__attribute__((unused)) struct net_device *dev)
>+{
>+	return ARRAY_SIZE(ntb_nic_stats);
>+}
>+
>+static int ntb_get_sset_count(struct net_device *dev, int sset)
>+{
>+	switch (sset) {
>+	case ETH_SS_STATS:
>+		return ntb_get_stats_count(dev);
>+	default:
>+		return -EOPNOTSUPP;
>+	}
>+}
>+
>+static void ntb_get_strings(__attribute__((unused)) struct net_device *dev,
>+			    u32 sset, u8 *data)
>+{
>+	switch (sset) {
>+	case ETH_SS_STATS:
>+		memcpy(data, *ntb_nic_stats, sizeof(ntb_nic_stats));
>+	}
>+}
>+
>+static void
>+ntb_get_ethtool_stats(struct net_device *dev,
>+		      __attribute__((unused)) struct ethtool_stats *stats,
>+		      u64 *data)
>+{
>+	int i = 0;
>+
>+	data[i++] = dev->stats.rx_packets;
>+	data[i++] = dev->stats.rx_bytes;
>+	data[i++] = dev->stats.rx_errors;
>+	data[i++] = dev->stats.rx_dropped;
>+	data[i++] = dev->stats.rx_length_errors;
>+	data[i++] = dev->stats.rx_frame_errors;
>+	data[i++] = dev->stats.rx_fifo_errors;
>+	data[i++] = dev->stats.tx_packets;
>+	data[i++] = dev->stats.tx_bytes;
>+	data[i++] = dev->stats.tx_errors;
>+	data[i++] = dev->stats.tx_dropped;
>+}
>+
>+static const struct ethtool_ops ntb_ethtool_ops = {
>+	.get_drvinfo = ntb_get_drvinfo,
>+	.get_sset_count = ntb_get_sset_count,
>+	.get_strings = ntb_get_strings,
>+	.get_ethtool_stats = ntb_get_ethtool_stats,
>+	.get_link = ethtool_op_get_link,
>+};
>+
>+static int __init ntb_netdev_init_module(void)
>+{
>+	struct ntb_netdev *dev;
>+	int rc;
>+
>+	pr_info("%s: Probe\n", KBUILD_MODNAME);
>+
>+	netdev = alloc_etherdev(sizeof(struct ntb_netdev));

I might be missing something but this place (module init) does not seems
like a good place to do alloc_etherdev(). Do you want to support only
one netdevice instance?

Anyway, I think that using "static netdev" should be avoided in any case.


>+	if (!netdev)
>+		return -ENOMEM;
>+
>+	dev = netdev_priv(netdev);
>+	dev->ndev = netdev;
>+	netdev->features = NETIF_F_HIGHDMA;
>+
>+	netdev->hw_features = netdev->features;
>+	netdev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
>+
>+	random_ether_addr(netdev->perm_addr);
>+	memcpy(netdev->dev_addr, netdev->perm_addr, netdev->addr_len);
>+
>+	netdev->netdev_ops = &ntb_netdev_ops;
>+	SET_ETHTOOL_OPS(netdev, &ntb_ethtool_ops);
>+
>+	dev->qp = ntb_transport_create_queue(ntb_netdev_rx_handler,
>+					     ntb_netdev_tx_handler,
>+					     ntb_netdev_event_handler);
>+	if (!dev->qp) {
>+		rc = -EIO;
>+		goto err;
>+	}
>+
>+	netdev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
>+
>+	rc = register_netdev(netdev);
>+	if (rc)
>+		goto err1;
>+
>+	pr_info("%s: %s created\n", KBUILD_MODNAME, netdev->name);
>+	return 0;
>+
>+err1:
>+	ntb_transport_free_queue(dev->qp);
>+err:
>+	free_netdev(netdev);
>+	return rc;
>+}
>+module_init(ntb_netdev_init_module);
>+
>+static void __exit ntb_netdev_exit_module(void)
>+{
>+	struct ntb_netdev *dev = netdev_priv(netdev);
>+
>+	unregister_netdev(netdev);
>+	ntb_transport_free_queue(dev->qp);
>+	free_netdev(netdev);
>+
>+	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
>+}
>+module_exit(ntb_netdev_exit_module);
>-- 
>1.7.5.4
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Ksummit-2012-discuss] Organising Mini Summits within the Kernel Summit
From: Ben Hutchings @ 2012-07-13 22:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Bottomley, ksummit-2012-discuss, netdev
In-Reply-To: <20120711084417.4a8132ff@nehalam.linuxnetplumber.net>

On Wed, 2012-07-11 at 08:44 -0700, Stephen Hemminger wrote:
> On Wed, 11 Jul 2012 09:09:15 +0100
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > Hi All,
> > 
> > We have set aside the second day of the kernel summit (Tuesday 28
> > August) as mini-summit day.  So far we have only the PCI mini summit on
> > this day, so if you can think of other topics, please send them to the
> > kernel summit discuss list:
> > 
> > ksummit-2012-discuss@lists.linux-foundation.org
> > 
> > Looking at the available rooms, we think we can run about four or five
> > mini summits.
> > 
> > As an added incentive, mini summit organisers get to pick who they
> > invite and all the people they pick will get an automatic invitation to
> > the third day of the kernel summit (but not the core first day) and the
> > evening events.
> > 
> > James
> 
> Is there enough interest to have a networking mini-summit?

I would be interested.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [RFC 2/2] net: Add support for NTB virtual ethernet device
From: Jon Mason @ 2012-07-13 21:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, linux-pci, Dave Jiang
In-Reply-To: <1342215900-3358-1-git-send-email-jon.mason@intel.com>

A virtual ethernet device that uses the NTB transport API to send/receive data.

Signed-off-by: Jon Mason <jon.mason@intel.com>
---
 drivers/net/Kconfig      |    4 +
 drivers/net/Makefile     |    1 +
 drivers/net/ntb_netdev.c |  411 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 416 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/ntb_netdev.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c2bd80..9bf8a71 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -178,6 +178,10 @@ config NETPOLL_TRAP
 config NET_POLL_CONTROLLER
 	def_bool NETPOLL
 
+config NTB_NETDEV
+	tristate "Virtual Ethernet over NTB"
+	depends on NTB
+
 config RIONET
 	tristate "RapidIO Ethernet over messaging driver support"
 	depends on RAPIDIO
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3d375ca..9890148 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -69,3 +69,4 @@ obj-$(CONFIG_USB_IPHETH)        += usb/
 obj-$(CONFIG_USB_CDC_PHONET)   += usb/
 
 obj-$(CONFIG_HYPERV_NET) += hyperv/
+obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
new file mode 100644
index 0000000..bcbd9d4
--- /dev/null
+++ b/drivers/net/ntb_netdev.c
@@ -0,0 +1,411 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Network Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/module.h>
+#include <linux/ntb.h>
+
+#define NTB_NETDEV_VER	"0.4"
+
+MODULE_DESCRIPTION(KBUILD_MODNAME);
+MODULE_VERSION(NTB_NETDEV_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+struct ntb_netdev {
+	struct net_device *ndev;
+	struct ntb_transport_qp *qp;
+};
+
+#define	NTB_TX_TIMEOUT_MS	1000
+#define	NTB_RXQ_SIZE		100
+
+static struct net_device *netdev;
+
+static void ntb_netdev_event_handler(int status)
+{
+	struct ntb_netdev *dev = netdev_priv(netdev);
+
+	pr_debug("%s: Event %x, Link %x\n", KBUILD_MODNAME, status,
+		 ntb_transport_link_query(dev->qp));
+
+	/* Currently, only link status event is supported */
+	if (status)
+		netif_carrier_on(netdev);
+	else
+		netif_carrier_off(netdev);
+}
+
+static void ntb_netdev_rx_handler(struct ntb_transport_qp *qp)
+{
+	struct net_device *ndev = netdev;
+	struct sk_buff *skb;
+	int len, rc;
+
+	while ((skb = ntb_transport_rx_dequeue(qp, &len))) {
+		pr_debug("%s: %d byte payload received\n", __func__, len);
+
+		skb_put(skb, len);
+		skb->protocol = eth_type_trans(skb, ndev);
+		skb->ip_summed = CHECKSUM_NONE;
+
+		if (netif_rx(skb) == NET_RX_DROP) {
+			ndev->stats.rx_errors++;
+			ndev->stats.rx_dropped++;
+		} else {
+			ndev->stats.rx_packets++;
+			ndev->stats.rx_bytes += len;
+		}
+
+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+		if (!skb) {
+			ndev->stats.rx_errors++;
+			ndev->stats.rx_frame_errors++;
+			pr_err("%s: No skb\n", __func__);
+			break;
+		}
+
+		rc = ntb_transport_rx_enqueue(qp, skb, skb->data,
+					      ndev->mtu + ETH_HLEN);
+		if (rc) {
+			ndev->stats.rx_errors++;
+			ndev->stats.rx_fifo_errors++;
+			pr_err("%s: error re-enqueuing\n", __func__);
+			break;
+		}
+	}
+}
+
+static void ntb_netdev_tx_handler(struct ntb_transport_qp *qp)
+{
+	struct net_device *ndev = netdev;
+	struct sk_buff *skb;
+	int len;
+
+	while ((skb = ntb_transport_tx_dequeue(qp, &len))) {
+		ndev->stats.tx_packets++;
+		ndev->stats.tx_bytes += skb->len;
+		dev_kfree_skb(skb);
+	}
+
+	if (netif_queue_stopped(ndev))
+		netif_wake_queue(ndev);
+}
+
+static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff *skb,
+					 struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	int rc;
+
+	pr_debug("%s: ntb_transport_tx_enqueue\n", KBUILD_MODNAME);
+
+	rc = ntb_transport_tx_enqueue(dev->qp, skb, skb->data, skb->len);
+	if (rc)
+		goto err;
+
+	return NETDEV_TX_OK;
+
+err:
+	ndev->stats.tx_dropped++;
+	ndev->stats.tx_errors++;
+	netif_stop_queue(ndev);
+	return NETDEV_TX_BUSY;
+}
+
+static int ntb_netdev_open(struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int rc, i, len;
+
+	/* Add some empty rx bufs */
+	for (i = 0; i < NTB_RXQ_SIZE; i++) {
+		skb = netdev_alloc_skb(ndev, ndev->mtu + ETH_HLEN);
+		if (!skb) {
+			rc = -ENOMEM;
+			goto err;
+		}
+
+		rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+					      ndev->mtu + ETH_HLEN);
+		if (rc == -EINVAL)
+			goto err;
+	}
+
+	netif_carrier_off(ndev);
+	ntb_transport_link_up(dev->qp);
+
+	return 0;
+
+err:
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		kfree(skb);
+	return rc;
+}
+
+static int ntb_netdev_close(struct net_device *ndev)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int len;
+
+	ntb_transport_link_down(dev->qp);
+
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		kfree(skb);
+
+	return 0;
+}
+
+static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
+{
+	struct ntb_netdev *dev = netdev_priv(ndev);
+	struct sk_buff *skb;
+	int len, rc;
+
+	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+		return -EINVAL;
+
+	if (!netif_running(ndev)) {
+		ndev->mtu = new_mtu;
+		return 0;
+	}
+
+	/* Bring down the link and dispose of posted rx entries */
+	ntb_transport_link_down(dev->qp);
+
+	if (ndev->mtu < new_mtu) {
+		int i;
+
+		for (i = 0; (skb = ntb_transport_rx_remove(dev->qp, &len)); i++)
+			kfree(skb);
+
+		for (; i; i--) {
+			skb = netdev_alloc_skb(ndev, new_mtu + ETH_HLEN);
+			if (!skb) {
+				rc = -ENOMEM;
+				goto err;
+			}
+
+			rc = ntb_transport_rx_enqueue(dev->qp, skb, skb->data,
+						      new_mtu + ETH_HLEN);
+			if (rc) {
+				kfree(skb);
+				goto err;
+			}
+		}
+	}
+
+	ndev->mtu = new_mtu;
+
+	ntb_transport_link_up(dev->qp);
+
+	return 0;
+
+err:
+	ntb_transport_link_down(dev->qp);
+
+	while ((skb = ntb_transport_rx_remove(dev->qp, &len)))
+		kfree(skb);
+
+	pr_err("Error changing MTU, device inoperable\n");
+	return rc;
+}
+
+static void ntb_netdev_tx_timeout(struct net_device *ndev)
+{
+	if (netif_running(ndev))
+		netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops ntb_netdev_ops = {
+	.ndo_open = ntb_netdev_open,
+	.ndo_stop = ntb_netdev_close,
+	.ndo_start_xmit = ntb_netdev_start_xmit,
+	.ndo_change_mtu = ntb_netdev_change_mtu,
+	.ndo_tx_timeout = ntb_netdev_tx_timeout,
+	.ndo_set_mac_address = eth_mac_addr,
+};
+
+static void ntb_get_drvinfo(__attribute__((unused)) struct net_device *dev,
+			    struct ethtool_drvinfo *info)
+{
+	strlcpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
+	strlcpy(info->version, NTB_NETDEV_VER, sizeof(info->version));
+}
+
+static const char ntb_nic_stats[][ETH_GSTRING_LEN] = {
+	"rx_packets", "rx_bytes", "rx_errors", "rx_dropped", "rx_length_errors",
+	"rx_frame_errors", "rx_fifo_errors",
+	"tx_packets", "tx_bytes", "tx_errors", "tx_dropped",
+};
+
+static int ntb_get_stats_count(__attribute__((unused)) struct net_device *dev)
+{
+	return ARRAY_SIZE(ntb_nic_stats);
+}
+
+static int ntb_get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return ntb_get_stats_count(dev);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void ntb_get_strings(__attribute__((unused)) struct net_device *dev,
+			    u32 sset, u8 *data)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		memcpy(data, *ntb_nic_stats, sizeof(ntb_nic_stats));
+	}
+}
+
+static void
+ntb_get_ethtool_stats(struct net_device *dev,
+		      __attribute__((unused)) struct ethtool_stats *stats,
+		      u64 *data)
+{
+	int i = 0;
+
+	data[i++] = dev->stats.rx_packets;
+	data[i++] = dev->stats.rx_bytes;
+	data[i++] = dev->stats.rx_errors;
+	data[i++] = dev->stats.rx_dropped;
+	data[i++] = dev->stats.rx_length_errors;
+	data[i++] = dev->stats.rx_frame_errors;
+	data[i++] = dev->stats.rx_fifo_errors;
+	data[i++] = dev->stats.tx_packets;
+	data[i++] = dev->stats.tx_bytes;
+	data[i++] = dev->stats.tx_errors;
+	data[i++] = dev->stats.tx_dropped;
+}
+
+static const struct ethtool_ops ntb_ethtool_ops = {
+	.get_drvinfo = ntb_get_drvinfo,
+	.get_sset_count = ntb_get_sset_count,
+	.get_strings = ntb_get_strings,
+	.get_ethtool_stats = ntb_get_ethtool_stats,
+	.get_link = ethtool_op_get_link,
+};
+
+static int __init ntb_netdev_init_module(void)
+{
+	struct ntb_netdev *dev;
+	int rc;
+
+	pr_info("%s: Probe\n", KBUILD_MODNAME);
+
+	netdev = alloc_etherdev(sizeof(struct ntb_netdev));
+	if (!netdev)
+		return -ENOMEM;
+
+	dev = netdev_priv(netdev);
+	dev->ndev = netdev;
+	netdev->features = NETIF_F_HIGHDMA;
+
+	netdev->hw_features = netdev->features;
+	netdev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
+
+	random_ether_addr(netdev->perm_addr);
+	memcpy(netdev->dev_addr, netdev->perm_addr, netdev->addr_len);
+
+	netdev->netdev_ops = &ntb_netdev_ops;
+	SET_ETHTOOL_OPS(netdev, &ntb_ethtool_ops);
+
+	dev->qp = ntb_transport_create_queue(ntb_netdev_rx_handler,
+					     ntb_netdev_tx_handler,
+					     ntb_netdev_event_handler);
+	if (!dev->qp) {
+		rc = -EIO;
+		goto err;
+	}
+
+	netdev->mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
+
+	rc = register_netdev(netdev);
+	if (rc)
+		goto err1;
+
+	pr_info("%s: %s created\n", KBUILD_MODNAME, netdev->name);
+	return 0;
+
+err1:
+	ntb_transport_free_queue(dev->qp);
+err:
+	free_netdev(netdev);
+	return rc;
+}
+module_init(ntb_netdev_init_module);
+
+static void __exit ntb_netdev_exit_module(void)
+{
+	struct ntb_netdev *dev = netdev_priv(netdev);
+
+	unregister_netdev(netdev);
+	ntb_transport_free_queue(dev->qp);
+	free_netdev(netdev);
+
+	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
+}
+module_exit(ntb_netdev_exit_module);
-- 
1.7.5.4

^ permalink raw reply related

* [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-07-13 21:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, linux-pci, Dave Jiang

A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
connecting 2 systems, providing electrical isolation between the two subsystems.
A non-transparent bridge is functionally similar to a transparent bridge except
that both sides of the bridge have their own independent address domains.  The
host on one side of the bridge will not have the visibility of the complete
memory or I/O space on the other side of the bridge.  To communicate across the
non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
the local system.  Writes to these apertures are mirrored to memory on the
remote system.  Communications can also occur through the use of doorbell
registers that initiate interrupts to the alternate domain, and scratch-pad
registers accessible from both sides.

The NTB device driver is needed to configure these memory windows, doorbell, and
scratch-pad registers as well as use them in such a way as they can be turned
into a viable communication channel to the remote system.  ntb_hw.[ch]
determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
the underlying hardware to provide access and a common interface to the doorbell
registers, scratch pads, and memory windows.  These hardware interfaces are
exported so that other, non-mainlined kernel drivers can access these.
ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
communication channel(s) and provide a reliable way of transferring data from
one side to the other, which it then exports so that "client" drivers can access
them.  These client drivers are used to provide a standard kernel interface
(i.e., Ethernet device) to NTB, such that Linux can transfer data from one
system to the other in a standard way.

Signed-off-by: Jon Mason <jon.mason@intel.com>
---
 MAINTAINERS                 |    6 +
 drivers/Kconfig             |    2 +
 drivers/Makefile            |    1 +
 drivers/ntb/Kconfig         |   13 +
 drivers/ntb/Makefile        |    3 +
 drivers/ntb/ntb_hw.c        | 1283 +++++++++++++++++++++++++++++++++++++++++++
 drivers/ntb/ntb_hw.h        |  115 ++++
 drivers/ntb/ntb_regs.h      |  150 +++++
 drivers/ntb/ntb_transport.c | 1283 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/ntb.h         |   78 +++
 10 files changed, 2934 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ntb/Kconfig
 create mode 100644 drivers/ntb/Makefile
 create mode 100644 drivers/ntb/ntb_hw.c
 create mode 100644 drivers/ntb/ntb_hw.h
 create mode 100644 drivers/ntb/ntb_regs.h
 create mode 100644 drivers/ntb/ntb_transport.c
 create mode 100644 include/linux/ntb.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d1d9ae6..70d7e0d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4818,6 +4818,12 @@ S:	Maintained
 F:	Documentation/scsi/NinjaSCSI.txt
 F:	drivers/scsi/nsp32*
 
+NTB DRIVER
+M:	Jon Mason <jon.mason@intel.com>
+S:	Supported
+F:	drivers/ntb/
+F:	include/linux/ntb.h
+
 NTFS FILESYSTEM
 M:	Anton Altaparmakov <anton@tuxera.com>
 L:	linux-ntfs-dev@lists.sourceforge.net
diff --git a/drivers/Kconfig b/drivers/Kconfig
index bfc9186..ebc16d3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -148,4 +148,6 @@ source "drivers/iio/Kconfig"
 
 source "drivers/vme/Kconfig"
 
+source "drivers/ntb/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 2ba29ff..39bba94 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -137,3 +137,4 @@ obj-$(CONFIG_EXTCON)		+= extcon/
 obj-$(CONFIG_MEMORY)		+= memory/
 obj-$(CONFIG_IIO)		+= iio/
 obj-$(CONFIG_VME_BUS)		+= vme/
+obj-$(CONFIG_NTB)		+= ntb/
diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
new file mode 100644
index 0000000..f69df793
--- /dev/null
+++ b/drivers/ntb/Kconfig
@@ -0,0 +1,13 @@
+config NTB
+       tristate "Intel Non-Transparent Bridge support"
+       depends on PCI
+       depends on X86
+       help
+        The PCI-E Non-transparent bridge hardware is a point-to-point PCI-E bus
+        connecting 2 systems.  When configured, writes to the device's PCI
+        mapped memory will be mirrored to a buffer on the remote system.  The
+        ntb Linux driver uses this point-to-point communication as a method to
+        transfer data from one system to the other.
+
+        If unsure, say N.
+
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
new file mode 100644
index 0000000..0b53393
--- /dev/null
+++ b/drivers/ntb/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_NTB) += ntb.o 
+
+ntb-objs := ntb_hw.o ntb_transport.o
diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
new file mode 100644
index 0000000..8f46317
--- /dev/null
+++ b/drivers/ntb/ntb_hw.c
@@ -0,0 +1,1283 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "ntb_hw.h"
+#include "ntb_regs.h"
+
+#define NTB_NAME	"Intel(R) PCI-E Non-Transparent Bridge Driver"
+#define NTB_VER		"0.20"
+
+MODULE_DESCRIPTION(NTB_NAME);
+MODULE_VERSION(NTB_VER);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+static int max_num_cbs = 2;
+module_param(max_num_cbs, uint, 0644);
+MODULE_PARM_DESC(max_num_cbs, "Maximum number of NTB transport connections");
+
+static bool no_msix;
+module_param(no_msix, bool, 0644);
+MODULE_PARM_DESC(no_msix, "Do not allow MSI-X interrupts to be selected");
+
+enum {
+	NTB_CONN_CLASSIC = 0,
+	NTB_CONN_B2B,
+	NTB_CONN_RP,
+};
+
+enum {
+	NTB_DEV_USD = 0,
+	NTB_DEV_DSD,
+};
+
+enum {
+	SNB_HW = 0,
+	BWD_HW,
+};
+
+struct ntb_mw {
+	dma_addr_t phys_addr;
+	void __iomem *vbase;
+	resource_size_t bar_sz;
+};
+
+struct ntb_db_cb {
+	void (*callback) (int db_num);
+	unsigned int db_num;
+	struct ntb_device *ndev;
+};
+
+struct ntb_device {
+	struct pci_dev *pdev;
+	struct msix_entry *msix_entries;
+	void __iomem *reg_base;
+	struct ntb_mw mw[NTB_NUM_MW];
+	struct {
+		unsigned int max_spads;
+		unsigned int max_db_bits;
+		unsigned int msix_cnt;
+	} limits;
+	struct {
+		void __iomem *pdb;
+		void __iomem *pdb_mask;
+		void __iomem *sdb;
+		void __iomem *sbar2_xlat;
+		void __iomem *sbar4_xlat;
+		void __iomem *spad_write;
+		void __iomem *spad_read;
+		void __iomem *lnk_cntl;
+		void __iomem *lnk_stat;
+		void __iomem *spci_cmd;
+	} reg_ofs;
+	void *ntb_transport;
+	void (*event_cb)(void *handle, unsigned int event);
+
+	struct ntb_db_cb *db_cb;
+	unsigned char hw_type;
+	unsigned char conn_type;
+	unsigned char dev_type;
+	unsigned char num_msix;
+	unsigned char bits_per_vector;
+	unsigned char max_cbs;
+	unsigned char link_status;
+	struct delayed_work hb_timer;
+	unsigned long last_ts;
+};
+
+/* Translate memory window 0,1 to BAR 2,4 */
+#define MW_TO_BAR(mw)	(mw * 2 + 2)
+
+static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
+	{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
+	{0}
+};
+MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
+
+static struct ntb_device *ntbdev;
+
+/**
+ * ntb_hw_link_status() - return the hardware link status
+ * @ndev: pointer to ntb_device instance
+ *
+ * Returns true if the hardware is connected to the remote system
+ *
+ * RETURNS: true or false based on the hardware link state
+ */
+bool ntb_hw_link_status(struct ntb_device *ndev)
+{
+	return ndev->link_status == NTB_LINK_UP;
+}
+EXPORT_SYMBOL(ntb_hw_link_status);
+
+/**
+ * ntb_query_pdev() - return the pci_dev pointer
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer return the pci_dev pointerfor the NTB hardware device
+ *
+ * RETURNS: a pointer to the ntb pci_dev
+ */
+struct pci_dev *ntb_query_pdev(struct ntb_device *ndev)
+{
+	return ndev->pdev;
+}
+EXPORT_SYMBOL(ntb_query_pdev);
+
+/**
+ * ntb_query_max_cbs() - return the maximum number of callback tuples
+ * @ndev: pointer to ntb_device instance
+ *
+ * The number of callbacks can vary depending on the platform and MSI-X/MSI
+ * enablement
+ *
+ * RETURNS: the maximum number of callback tuples (3, 15, or 33)
+ */
+unsigned int ntb_query_max_cbs(struct ntb_device *ndev)
+{
+	return ndev->max_cbs > max_num_cbs ? max_num_cbs : ndev->max_cbs;
+}
+EXPORT_SYMBOL(ntb_query_max_cbs);
+
+/**
+ * ntb_register_event_callback() - register event callback
+ * @ndev: pointer to ntb_device instance
+ * @func: callback function to register
+ *
+ * This function registers a callback for any HW driver events such as link
+ * up/down, power management notices and etc.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_event_callback(struct ntb_device *ndev,
+				void (*func)(void *handle, unsigned int event))
+{
+	if (ndev->event_cb)
+		return -EINVAL;
+
+	ndev->event_cb = func;
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_register_event_callback);
+
+/**
+ * ntb_unregister_event_callback() - unregisters the event callback
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function unregisters the existing callback from transport
+ */
+void ntb_unregister_event_callback(struct ntb_device *ndev)
+{
+	ndev->event_cb = NULL;
+}
+EXPORT_SYMBOL(ntb_unregister_event_callback);
+
+/**
+ * ntb_register_db_callback() - register a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ * @func: callback function to register
+ *
+ * This function registers a callback function for the doorbell interrupt
+ * on the primary side. The function will unmask the doorbell as well to
+ * allow interrupt.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+			     void (*func) (int db_num))
+{
+	unsigned long mask;
+
+	if (idx >= ndev->max_cbs || ndev->db_cb[idx].callback) {
+		dev_warn(&ndev->pdev->dev, "Invalid Index.\n");
+		return -EINVAL;
+	}
+
+	ndev->db_cb[idx].callback = func;
+
+	/* unmask interrupt */
+	mask = readw(ndev->reg_ofs.pdb_mask);
+	clear_bit(idx * ndev->bits_per_vector, &mask);
+	writew(mask, ndev->reg_ofs.pdb_mask);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_register_db_callback);
+
+/**
+ * ntb_unregister_db_callback() - unregister a callback for doorbell interrupt
+ * @ndev: pointer to ntb_device instance
+ * @idx: doorbell index to register callback, zero based
+ *
+ * This function unregisters a callback function for the doorbell interrupt
+ * on the primary side. The function will also mask the said doorbell.
+ */
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx)
+{
+	unsigned long mask;
+
+	if (idx >= ndev->max_cbs || !ndev->db_cb[idx].callback)
+		return;
+
+	mask = readw(ndev->reg_ofs.pdb_mask);
+	set_bit(idx * ndev->bits_per_vector, &mask);
+	writew(mask, ndev->reg_ofs.pdb_mask);
+
+	ndev->db_cb[idx].callback = NULL;
+}
+EXPORT_SYMBOL(ntb_unregister_db_callback);
+
+/**
+ * ntb_register_transport() - Register NTB transport with NTB HW driver
+ * @transport: transport identifier
+ *
+ * This function allows a transport to reserve the hardware driver for
+ * NTB usage.
+ *
+ * RETURNS: pointer to ntb_device, NULL on error.
+ */
+struct ntb_device *ntb_register_transport(void *transport)
+{
+	struct ntb_device *ndev = ntbdev;
+
+	if (ndev->ntb_transport)
+		return NULL;
+
+	ndev->ntb_transport = transport;
+	return ndev;
+}
+EXPORT_SYMBOL(ntb_register_transport);
+
+/**
+ * ntb_unregister_transport() - Unregister the transport with the NTB HW driver
+ * @ndev - ntb_device of the transport to be freed
+ *
+ * This function unregisters the transport from the HW driver and performs any
+ * necessary cleanups.
+ */
+void ntb_unregister_transport(struct ntb_device *ndev)
+{
+	int i;
+
+	if (!ndev->ntb_transport)
+		return;
+
+	for (i = 0; i < ndev->max_cbs; i++)
+		ntb_unregister_db_callback(ndev, i);
+
+	ntb_unregister_event_callback(ndev);
+	ndev->ntb_transport = NULL;
+}
+EXPORT_SYMBOL(ntb_unregister_transport);
+
+/**
+ * ntb_get_max_spads() - get the total scratch regs usable
+ * @ndev: pointer to ntb_device instance
+ *
+ * This function returns the max 32bit scratchpad registers usable by the
+ * upper layer.
+ *
+ * RETURNS: total number of scratch pad registers available
+ */
+int ntb_get_max_spads(struct ntb_device *ndev)
+{
+	return ndev->limits.max_spads;
+}
+EXPORT_SYMBOL(ntb_get_max_spads);
+
+/**
+ * ntb_write_local_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. The register resides on the secondary (external) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	dev_dbg(&ndev->pdev->dev, "Writing %x to local scratch pad index %d\n",
+		val, idx);
+	writel(val, ndev->reg_ofs.spad_read + idx * 4);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_write_local_spad);
+
+/**
+ * ntb_read_local_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	*val = readl(ndev->reg_ofs.spad_write + idx * 4);
+	dev_dbg(&ndev->pdev->dev,
+		"Reading %x from local scratch pad index %d\n", *val, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_read_local_spad);
+
+/**
+ * ntb_write_remote_spad() - write to the secondary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register. The register resides on the secondary (external) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	dev_dbg(&ndev->pdev->dev, "Writing %x to remote scratch pad index %d\n",
+		val, idx);
+	writel(val, ndev->reg_ofs.spad_write + idx * 4);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_write_remote_spad);
+
+/**
+ * ntb_read_remote_spad() - read from the primary scratchpad register
+ * @ndev: pointer to ntb_device instance
+ * @idx: index to scratchpad register, 0 based
+ * @val: pointer to 32bit integer for storing the register value
+ *
+ * This function allows reading of the 32bit scratchpad register on
+ * the primary (internal) side.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
+{
+	if (idx >= ndev->limits.max_spads)
+		return -EINVAL;
+
+	*val = readl(ndev->reg_ofs.spad_read + idx * 4);
+	dev_dbg(&ndev->pdev->dev,
+		"Reading %x from remote scratch pad index %d\n", *val, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_read_remote_spad);
+
+/**
+ * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the base virtual address of the memory window
+ * specified.
+ *
+ * RETURNS: pointer to virtual address, or NULL on error.
+ */
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
+{
+	if (mw > NTB_NUM_MW)
+		return NULL;
+
+	return ndev->mw[mw].vbase;
+}
+EXPORT_SYMBOL(ntb_get_mw_vbase);
+
+/**
+ * ntb_get_mw_size() - return size of NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the physical size of the memory window specified
+ *
+ * RETURNS: the size of the memory window or zero on error
+ */
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
+{
+	if (mw > NTB_NUM_MW)
+		return 0;
+
+	return ndev->mw[mw].bar_sz;
+}
+EXPORT_SYMBOL(ntb_get_mw_size);
+
+/**
+ * ntb_set_mw_addr - set the memory window address
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ * @addr: base address for data
+ *
+ * This function sets the base physical address of the memory window.  This
+ * memory address is where data from the remote system will be transfered into
+ * or out of depending on how the transport is configured.
+ */
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)
+{
+	if (mw > NTB_NUM_MW)
+		return;
+
+	dev_dbg(&ndev->pdev->dev, "Writing addr %Lx to BAR %d\n", addr,
+		MW_TO_BAR(mw));
+
+	ndev->mw[mw].phys_addr = addr;
+
+	switch (MW_TO_BAR(mw)) {
+	case NTB_BAR_23:
+		writeq(addr, ndev->reg_ofs.sbar2_xlat);
+		break;
+	case NTB_BAR_45:
+		writeq(addr, ndev->reg_ofs.sbar4_xlat);
+		break;
+	}
+}
+EXPORT_SYMBOL(ntb_set_mw_addr);
+
+/**
+ * ntb_ring_sdb() - Set the doorbell on the secondary/external side
+ * @ndev: pointer to ntb_device instance
+ * @db: doorbell to ring
+ *
+ * This function allows triggering of a doorbell on the secondary/external
+ * side that will initiate an interrupt on the remote host
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_ring_sdb(struct ntb_device *ndev, unsigned int db)
+{
+	dev_dbg(&ndev->pdev->dev, "%s: ringing doorbell %d\n", __func__, db);
+
+	if (db >= ndev->max_cbs)
+		return -EINVAL;
+
+	if (ndev->hw_type == BWD_HW)
+		writeq((u64) 1 << db, ndev->reg_ofs.sdb);
+	else
+		writew(((1 << ndev->bits_per_vector) - 1) <<
+		       (db * ndev->bits_per_vector), ndev->reg_ofs.sdb);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_ring_sdb);
+
+static void ntb_link_event(struct ntb_device *ndev, int link_state)
+{
+	unsigned int event;
+
+	if (ndev->link_status == link_state)
+		return;
+
+	if (link_state == NTB_LINK_UP) {
+		u16 status;
+
+		dev_info(&ndev->pdev->dev, "Link Up\n");
+		ndev->link_status = NTB_LINK_UP;
+		event = NTB_EVENT_HW_LINK_UP;
+
+		if (ndev->hw_type == BWD_HW)
+			status = readw(ndev->reg_ofs.lnk_stat);
+		else {
+			int rc = pci_read_config_word(ndev->pdev,
+						      SNB_LINK_STATUS_OFFSET,
+						      &status);
+			if (rc)
+				return;
+		}
+		dev_info(&ndev->pdev->dev, "Link Width %d, Link Speed %d\n",
+			 (status & NTB_LINK_WIDTH_MASK) >> 4,
+			 (status & NTB_LINK_SPEED_MASK));
+	} else {
+		dev_info(&ndev->pdev->dev, "Link Down\n");
+		ndev->link_status = NTB_LINK_DOWN;
+		event = NTB_EVENT_HW_LINK_DOWN;
+	}
+
+	/* notify the upper layer if we have an event change */
+	if (ndev->event_cb)
+		ndev->event_cb(ndev->ntb_transport, event);
+}
+
+static int ntb_link_status(struct ntb_device *ndev)
+{
+	int link_state;
+
+	if (ndev->hw_type == BWD_HW) {
+		u32 ntb_cntl;
+
+		ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+		if (ntb_cntl & BWD_CNTL_LINK_DOWN)
+			link_state = NTB_LINK_DOWN;
+		else
+			link_state = NTB_LINK_UP;
+	} else {
+		u16 status;
+		int rc;
+
+		rc = pci_read_config_word(ndev->pdev, SNB_LINK_STATUS_OFFSET,
+					  &status);
+		if (rc)
+			return rc;
+
+		if (status & NTB_LINK_STATUS_ACTIVE)
+			link_state = NTB_LINK_UP;
+		else
+			link_state = NTB_LINK_DOWN;
+	}
+
+	ntb_link_event(ndev, link_state);
+
+	return 0;
+}
+
+/* BWD doesn't have link status interrupt, poll on that platform */
+static void ntb_handle_heartbeat(struct work_struct *work)
+{
+	struct ntb_device *ndev = container_of(work, struct ntb_device,
+					       hb_timer.work);
+	unsigned long ts = jiffies;
+
+	/* If we haven't gotten an interrupt in a while, check the BWD link
+	 * status bit
+	 */
+	if (ts > ndev->last_ts + NTB_HB_TIMEOUT) {
+		int rc = ntb_link_status(ndev);
+		if (rc)
+			dev_err(&ndev->pdev->dev,
+				"Error determining link status\n");
+	}
+
+	schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+}
+
+static int ntb_xeon_setup(struct ntb_device *ndev)
+{
+	int rc;
+	u8 val;
+
+	ndev->hw_type = SNB_HW;
+
+	rc = pci_read_config_byte(ndev->pdev, NTB_PPD_OFFSET, &val);
+	if (rc)
+		return rc;
+
+	switch (val & SNB_PPD_CONN_TYPE) {
+	case NTB_CONN_B2B:
+		ndev->conn_type = NTB_CONN_B2B;
+		break;
+	case NTB_CONN_CLASSIC:
+	case NTB_CONN_RP:
+	default:
+		dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+		return -EINVAL;
+	}
+
+	if (val & SNB_PPD_DEV_TYPE)
+		ndev->dev_type = NTB_DEV_DSD;
+	else
+		ndev->dev_type = NTB_DEV_USD;
+
+	ndev->reg_ofs.pdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+	ndev->reg_ofs.pdb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+	ndev->reg_ofs.sbar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+	ndev->reg_ofs.sbar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+	ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
+	ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_LINK_STATUS_OFFSET;
+	ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
+	ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;
+
+	if (ndev->conn_type == NTB_CONN_B2B) {
+		ndev->reg_ofs.sdb = ndev->reg_base + SNB_B2B_DOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + SNB_B2B_SPAD_OFFSET;
+		ndev->limits.max_spads = SNB_MAX_SPADS;
+	} else {
+		ndev->reg_ofs.sdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
+		ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS;
+	}
+
+	ndev->limits.max_db_bits = SNB_MAX_DB_BITS;
+	ndev->limits.msix_cnt = SNB_MSIX_CNT;
+	ndev->bits_per_vector = SNB_DB_BITS_PER_VEC;
+
+	return 0;
+}
+
+static int ntb_bwd_setup(struct ntb_device *ndev)
+{
+	int rc;
+	u32 val;
+
+	ndev->hw_type = BWD_HW;
+
+	rc = pci_read_config_dword(ndev->pdev, NTB_PPD_OFFSET, &val);
+	if (rc)
+		return rc;
+
+	switch ((val & BWD_PPD_CONN_TYPE) >> 8) {
+	case NTB_CONN_B2B:
+		ndev->conn_type = NTB_CONN_B2B;
+		break;
+	case NTB_CONN_RP:
+	default:
+		dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+		return -EINVAL;
+	}
+
+	if (val & BWD_PPD_DEV_TYPE)
+		ndev->dev_type = NTB_DEV_DSD;
+	else
+		ndev->dev_type = NTB_DEV_USD;
+
+	/* Initiate PCI-E link training */
+	rc = pci_write_config_dword(ndev->pdev, NTB_PPD_OFFSET,
+				    val | BWD_PPD_INIT_LINK);
+	if (rc)
+		return rc;
+
+	ndev->reg_ofs.pdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+	ndev->reg_ofs.pdb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
+	ndev->reg_ofs.sbar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
+	ndev->reg_ofs.sbar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
+	ndev->reg_ofs.lnk_cntl = ndev->reg_base + BWD_NTBCNTL_OFFSET;
+	ndev->reg_ofs.lnk_stat = ndev->reg_base + BWD_LINK_STATUS_OFFSET;
+	ndev->reg_ofs.spad_read = ndev->reg_base + BWD_SPAD_OFFSET;
+	ndev->reg_ofs.spci_cmd = ndev->reg_base + BWD_PCICMD_OFFSET;
+
+	if (ndev->conn_type == NTB_CONN_B2B) {
+		ndev->reg_ofs.sdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
+		ndev->limits.max_spads = BWD_MAX_SPADS;
+	} else {
+		ndev->reg_ofs.sdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+		ndev->reg_ofs.spad_write = ndev->reg_base + BWD_SPAD_OFFSET;
+		ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
+	}
+
+	ndev->limits.max_db_bits = BWD_MAX_DB_BITS;
+	ndev->limits.msix_cnt = BWD_MSIX_CNT;
+	ndev->bits_per_vector = BWD_DB_BITS_PER_VEC;
+
+	/* Since bwd doesn't have a link interrupt, setup a heartbeat timer */
+	INIT_DELAYED_WORK(&ndev->hb_timer, ntb_handle_heartbeat);
+	schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+
+	return 0;
+}
+
+static int __devinit ntb_device_setup(struct ntb_device *ndev)
+{
+	int rc;
+
+	switch (ndev->pdev->device) {
+	case PCI_DEVICE_ID_INTEL_NTB_2ND_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_RP_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_RP_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB:
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_JSF:
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_SNB:
+		rc = ntb_xeon_setup(ndev);
+		break;
+	case PCI_DEVICE_ID_INTEL_NTB_B2B_BWD:
+		rc = ntb_bwd_setup(ndev);
+		break;
+	default:
+		rc = -ENODEV;
+	}
+
+	/* Enable Bus Master and Memory Space on the secondary side */
+	writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);
+
+	return rc;
+}
+
+static void ntb_device_free(struct ntb_device *ndev)
+{
+	if (ndev->hw_type == BWD_HW)
+		cancel_delayed_work_sync(&ndev->hb_timer);
+}
+
+static irqreturn_t bwd_callback_msix_irq(int irq, void *data)
+{
+	struct ntb_db_cb *db_cb = data;
+	struct ntb_device *ndev = db_cb->ndev;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+		db_cb->db_num);
+
+	if (db_cb->callback)
+		db_cb->callback(db_cb->db_num);
+
+	/* No need to check for the specific HB irq, any interrupt means
+	 * we're connected.
+	 */
+	ndev->last_ts = jiffies;
+
+	writeq((u64) 1 << db_cb->db_num, ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t xeon_callback_msix_irq(int irq, void *data)
+{
+	struct ntb_db_cb *db_cb = data;
+	struct ntb_device *ndev = db_cb->ndev;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for DB %d\n", irq,
+		db_cb->db_num);
+
+	if (db_cb->callback)
+		db_cb->callback(db_cb->db_num);
+
+	/* On Sandybridge, there are 16 bits in the interrupt register
+	 * but only 4 vectors.  So, 5 bits are assigned to the first 3
+	 * vectors, with the 4th having a single bit for link
+	 * interrupts.
+	 */
+	writew(((1 << ndev->bits_per_vector) - 1) <<
+	       (db_cb->db_num * ndev->bits_per_vector), ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+/* Since we do not have a HW doorbell in BWD, this is only used in JF/JT */
+static irqreturn_t xeon_event_msix_irq(int irq, void *dev)
+{
+	struct ntb_device *ndev = dev;
+	int rc;
+
+	dev_dbg(&ndev->pdev->dev, "MSI-X irq %d received for Events\n", irq);
+
+	rc = ntb_link_status(ndev);
+	if (rc)
+		dev_err(&ndev->pdev->dev, "Error determining link status\n");
+
+	/* bit 15 is always the link bit */
+	writew(1 << ndev->limits.max_db_bits, ndev->reg_ofs.pdb);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ntb_interrupt(int irq, void *dev)
+{
+	struct ntb_device *ndev = dev;
+	unsigned int i = 0;
+
+	if (ndev->hw_type == BWD_HW) {
+		u64 pdb = readq(ndev->reg_ofs.pdb);
+
+		dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %Lx\n", irq, pdb);
+
+		while (pdb) {
+			i = __ffs(pdb);
+			pdb &= pdb - 1;
+			bwd_callback_msix_irq(irq, &ndev->db_cb[i]);
+		}
+	} else {
+		u16 pdb = readw(ndev->reg_ofs.pdb);
+
+		dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %x sdb %x\n", irq,
+			pdb, readw(ndev->reg_ofs.sdb));
+
+		if (pdb & SNB_DB_HW_LINK) {
+			xeon_event_msix_irq(irq, dev);
+			pdb &= ~SNB_DB_HW_LINK;
+		}
+
+		while (pdb) {
+			i = __ffs(pdb);
+			pdb &= pdb - 1;
+			xeon_callback_msix_irq(irq, &ndev->db_cb[i]);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int ntb_setup_msix(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	struct msix_entry *msix;
+	int msix_entries;
+	int rc, i, pos;
+	u16 val;
+
+	if (no_msix) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
+	if (!pos) {
+		rc = -EIO;
+		goto err1;
+	}
+
+	rc = pci_read_config_word(pdev, pos + PCI_MSIX_FLAGS, &val);
+	if (rc)
+		goto err1;
+
+	msix_entries = msix_table_size(val);
+	if (msix_entries > ndev->limits.msix_cnt) {
+		rc = -EINVAL;
+		goto err1;
+	}
+
+	ndev->msix_entries = kmalloc(sizeof(struct msix_entry) * msix_entries,
+				     GFP_KERNEL);
+	if (!ndev->msix_entries) {
+		rc = -ENOMEM;
+		goto err1;
+	}
+
+	for (i = 0; i < msix_entries; i++)
+		ndev->msix_entries[i].entry = i;
+
+	rc = pci_enable_msix(pdev, ndev->msix_entries, msix_entries);
+	if (rc < 0)
+		goto err2;
+	if (rc > 0) {
+		/* On SNB, the link interrupt is always tied to 4th vector.  If
+		 * we can't get all 4, then we can't use MSI-X.
+		 */
+		if (ndev->hw_type != BWD_HW) {
+			rc = -EIO;
+			goto err2;
+		}
+
+		dev_warn(&pdev->dev,
+			 "Only %d MSI-X vectors.  Limiting the number of queues to that number.\n",
+			 rc);
+		msix_entries = rc;
+	}
+
+	for (i = 0; i < msix_entries; i++) {
+		msix = &ndev->msix_entries[i];
+		WARN_ON(!msix->vector);
+
+		/* Use the last MSI-X vector for Link status */
+		if (ndev->hw_type == BWD_HW) {
+			rc = request_irq(msix->vector, bwd_callback_msix_irq, 0,
+					 "ntb-callback-msix", &ndev->db_cb[i]);
+			if (rc)
+				goto err3;
+		} else {
+			if (i == msix_entries - 1) {
+				rc = request_irq(msix->vector,
+						 xeon_event_msix_irq, 0,
+						 "ntb-event-msix", ndev);
+				if (rc)
+					goto err3;
+			} else {
+				rc = request_irq(msix->vector,
+						 xeon_callback_msix_irq, 0,
+						 "ntb-callback-msix",
+						 &ndev->db_cb[i]);
+				if (rc)
+					goto err3;
+			}
+		}
+	}
+
+	ndev->num_msix = msix_entries;
+	if (ndev->hw_type == BWD_HW)
+		ndev->max_cbs = msix_entries;
+	else
+		ndev->max_cbs = msix_entries - 1;
+
+	return 0;
+
+err3:
+	while (--i >= 0) {
+		msix = &ndev->msix_entries[i];
+		if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+			free_irq(msix->vector, ndev);
+		else
+			free_irq(msix->vector, &ndev->db_cb[i]);
+	}
+	pci_disable_msix(pdev);
+err2:
+	kfree(ndev->msix_entries);
+err1:
+	dev_err(&pdev->dev, "Error allocating MSI-X interrupt\n");
+err:
+	ndev->num_msix = 0;
+	return rc;
+}
+
+static int ntb_setup_msi(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	rc = pci_enable_msi(pdev);
+	if (rc)
+		return rc;
+
+	rc = request_irq(pdev->irq, ntb_interrupt, 0, "ntb-msi", ndev);
+	if (rc) {
+		pci_disable_msi(pdev);
+		dev_err(&pdev->dev, "Error allocating MSI interrupt\n");
+		return rc;
+	}
+
+	return 0;
+}
+
+static int ntb_setup_intx(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+	int rc;
+
+	pci_msi_off(pdev);
+
+	/* Verify intx is enabled */
+	pci_intx(pdev, 1);
+
+	rc = request_irq(pdev->irq, ntb_interrupt, IRQF_SHARED, "ntb-intx",
+			 ndev);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+static int __devinit ntb_setup_interrupts(struct ntb_device *ndev)
+{
+	int rc;
+
+	/* On BWD, disable all interrupts.  On SNB, disable all but Link
+	 * Interrupt.  The rest will be unmasked as callbacks are registered.
+	 */
+	if (ndev->hw_type == BWD_HW)
+		writeq(~0, ndev->reg_ofs.pdb_mask);
+	else
+		writew(~(1 << ndev->limits.max_db_bits),
+		       ndev->reg_ofs.pdb_mask);
+
+	rc = ntb_setup_msix(ndev);
+	if (!rc)
+		goto done;
+
+	ndev->bits_per_vector = 1;
+	ndev->max_cbs = ndev->limits.max_db_bits;
+
+	rc = ntb_setup_msi(ndev);
+	if (!rc)
+		goto done;
+
+	rc = ntb_setup_intx(ndev);
+	if (rc) {
+		dev_err(&ndev->pdev->dev, "no usable interrupts\n");
+		return rc;
+	}
+
+done:
+	return 0;
+}
+
+static void __devexit ntb_free_interrupts(struct ntb_device *ndev)
+{
+	struct pci_dev *pdev = ndev->pdev;
+
+	/* mask interrupts */
+	if (ndev->hw_type == BWD_HW)
+		writeq(~0, ndev->reg_ofs.pdb_mask);
+	else
+		writew(~0, ndev->reg_ofs.pdb_mask);
+
+	if (ndev->num_msix) {
+		struct msix_entry *msix;
+		u32 i;
+
+		for (i = 0; i < ndev->num_msix; i++) {
+			msix = &ndev->msix_entries[i];
+			if (ndev->hw_type != BWD_HW && i == ndev->num_msix - 1)
+				free_irq(msix->vector, ndev);
+			else
+				free_irq(msix->vector, &ndev->db_cb[i]);
+		}
+		pci_disable_msix(pdev);
+	} else {
+		free_irq(pdev->irq, ndev);
+
+		if (pci_dev_msi_enabled(pdev))
+			pci_disable_msi(pdev);
+	}
+}
+
+static int __devinit ntb_create_callbacks(struct ntb_device *ndev)
+{
+	int i;
+
+	/* Checken-egg issue.  We won't know how many callbacks are necessary
+	 * until we see how many MSI-X vectors we get, but these pointers need
+	 * to be passed into the MSI-X register fucntion.  So, we allocate the
+	 * max, knowing that they might not all be used, to work around this.
+	 */
+	ndev->db_cb = kcalloc(ndev->limits.max_db_bits,
+			      sizeof(struct ntb_db_cb),
+			      GFP_KERNEL);
+	if (!ndev->db_cb)
+		return -ENOMEM;
+
+	for (i = 0; i < ndev->limits.max_db_bits; i++) {
+		ndev->db_cb[i].db_num = i;
+		ndev->db_cb[i].ndev = ndev;
+	}
+
+	return 0;
+}
+
+static void ntb_free_callbacks(struct ntb_device *ndev)
+{
+	int i;
+
+	for (i = 0; i < ndev->limits.max_db_bits; i++)
+		ntb_unregister_db_callback(ndev, i);
+
+	kfree(ndev->db_cb);
+}
+
+static int __devinit
+ntb_pci_probe(struct pci_dev *pdev,
+	      __attribute__((unused)) const struct pci_device_id *id)
+{
+	struct ntb_device *ndev;
+	int rc, i;
+
+	ndev = kzalloc(sizeof(struct ntb_device), GFP_KERNEL);
+	if (!ndev)
+		return -ENOMEM;
+
+	ntbdev = ndev;
+	ndev->pdev = pdev;
+	ndev->link_status = NTB_LINK_DOWN;
+	pci_set_drvdata(pdev, ndev);
+
+	rc = pci_enable_device(pdev);
+	if (rc)
+		goto err;
+
+	pci_set_master(ndev->pdev);
+
+	rc = pci_request_selected_regions(pdev, NTB_BAR_MASK, KBUILD_MODNAME);
+	if (rc)
+		goto err1;
+
+	ndev->reg_base = pci_ioremap_bar(pdev, NTB_BAR_MMIO);
+	if (!ndev->reg_base) {
+		dev_warn(&pdev->dev, "Cannot remap BAR 0\n");
+		rc = -EIO;
+		goto err2;
+	}
+
+	for (i = 0; i < NTB_NUM_MW; i++) {
+		ndev->mw[i].bar_sz = pci_resource_len(pdev, MW_TO_BAR(i));
+		ndev->mw[i].vbase =
+		    ioremap_wc(pci_resource_start(pdev, MW_TO_BAR(i)),
+			       ndev->mw[i].bar_sz);
+		dev_info(&pdev->dev, "MW %d size %d\n", i,
+			 (u32) pci_resource_len(pdev, MW_TO_BAR(i)));
+		if (!ndev->mw[i].vbase) {
+			dev_warn(&pdev->dev, "Cannot remap BAR %d\n",
+				 MW_TO_BAR(i));
+			rc = -EIO;
+			goto err3;
+		}
+	}
+
+	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (rc) {
+		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (rc)
+			goto err3;
+
+		dev_warn(&pdev->dev, "Cannot DMA highmem\n");
+	}
+
+	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (rc) {
+		rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (rc)
+			goto err3;
+
+		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
+	}
+
+	rc = ntb_device_setup(ndev);
+	if (rc)
+		goto err3;
+
+	rc = ntb_create_callbacks(ndev);
+	if (rc)
+		goto err4;
+
+	rc = ntb_setup_interrupts(ndev);
+	if (rc)
+		goto err5;
+
+	/* The scratchpad registers keep the values between rmmod/insmod,
+	 * blast them now
+	 */
+	for (i = 0; i < ndev->limits.max_spads; i++) {
+		ntb_write_local_spad(ndev, i, 0);
+		ntb_write_remote_spad(ndev, i, 0);
+	}
+
+	/* Let's bring the NTB link up */
+	writel(NTB_CNTL_BAR23_SNOOP | NTB_CNTL_BAR45_SNOOP,
+	       ndev->reg_ofs.lnk_cntl);
+
+	return 0;
+
+err5:
+	ntb_free_callbacks(ndev);
+err4:
+	ntb_device_free(ndev);
+err3:
+	for (i--; i >= 0; i--)
+		iounmap(ndev->mw[i].vbase);
+	iounmap(ndev->reg_base);
+err2:
+	pci_release_selected_regions(pdev, NTB_BAR_MASK);
+err1:
+	pci_disable_device(pdev);
+err:
+	kfree(ndev);
+
+	dev_err(&pdev->dev, "Error loading %s module\n", KBUILD_MODNAME);
+	return rc;
+}
+
+static void __devexit ntb_pci_remove(struct pci_dev *pdev)
+{
+	struct ntb_device *ndev = pci_get_drvdata(pdev);
+	int i;
+	u32 ntb_cntl;
+
+	/* Bring NTB link down */
+	ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
+	ntb_cntl |= NTB_LINK_DISABLE;
+	writel(ntb_cntl, ndev->reg_ofs.lnk_cntl);
+
+	ntb_free_interrupts(ndev);
+	ntb_free_callbacks(ndev);
+	ntb_device_free(ndev);
+
+	for (i = 0; i < NTB_NUM_MW; i++)
+		iounmap(ndev->mw[i].vbase);
+
+	iounmap(ndev->reg_base);
+	pci_release_selected_regions(pdev, NTB_BAR_MASK);
+	pci_disable_device(pdev);
+	kfree(ndev);
+}
+
+static struct pci_driver ntb_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = ntb_pci_tbl,
+	.probe = ntb_pci_probe,
+	.remove = __devexit_p(ntb_pci_remove),
+};
+
+static int __init ntb_init_module(void)
+{
+	pr_info("%s: %s, version %s\n", KBUILD_MODNAME, NTB_NAME, NTB_VER);
+
+	return pci_register_driver(&ntb_pci_driver);
+}
+module_init(ntb_init_module);
+
+static void __exit ntb_exit_module(void)
+{
+	pci_unregister_driver(&ntb_pci_driver);
+
+	pr_info("%s: Driver removed\n", KBUILD_MODNAME);
+}
+module_exit(ntb_exit_module);
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
new file mode 100644
index 0000000..4cad371
--- /dev/null
+++ b/drivers/ntb/ntb_hw.h
@@ -0,0 +1,115 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF		0x3725
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF	0x3726
+#define PCI_DEVICE_ID_INTEL_NTB_RP_JSF		0x3727
+#define PCI_DEVICE_ID_INTEL_NTB_RP_SNB		0x3C08
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB		0x3C0D
+#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB	0x3C0E
+#define PCI_DEVICE_ID_INTEL_NTB_2ND_SNB		0x3C0F
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD		0x0C4E
+
+#define msix_table_size(control)	((control & PCI_MSIX_FLAGS_QSIZE)+1)
+
+#define NTB_BAR_MMIO		0
+#define NTB_BAR_23		2
+#define NTB_BAR_45		4
+#define NTB_BAR_MASK		((1 << NTB_BAR_MMIO) | (1 << NTB_BAR_23) |\
+				 (1 << NTB_BAR_45))
+
+#define NTB_LINK_DOWN		0
+#define NTB_LINK_UP		1
+
+#define NTB_HB_TIMEOUT		msecs_to_jiffies(1000)
+
+#define NTB_NUM_MW		2
+
+struct ntb_device;
+
+enum {
+	NTB_EVENT_SW_EVENT0	= (1 << 0),
+	NTB_EVENT_SW_EVENT1	= (1 << 1),
+	NTB_EVENT_SW_EVENT2	= (1 << 2),
+	NTB_EVENT_HW_ERROR	= (1 << 3),
+	NTB_EVENT_HW_LINK_UP	= (1 << 4),
+	NTB_EVENT_HW_LINK_DOWN	= (1 << 5),
+};
+
+bool ntb_hw_link_status(struct ntb_device *ndev);
+struct pci_dev *ntb_query_pdev(struct ntb_device *ndev);
+unsigned int ntb_query_max_cbs(struct ntb_device *ndev);
+struct ntb_device *ntb_register_transport(void *transport);
+void ntb_unregister_transport(struct ntb_device *ndev);
+void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr);
+int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
+			     void (*db_cb_func) (int db_num));
+void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx);
+int ntb_register_event_callback(struct ntb_device *ndev,
+				void (*event_cb_func) (void *handle,
+						       unsigned int event));
+void ntb_unregister_event_callback(struct ntb_device *ndev);
+int ntb_get_max_spads(struct ntb_device *ndev);
+int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
+int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+void *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
+resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
+int ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
new file mode 100644
index 0000000..c7b8a24
--- /dev/null
+++ b/drivers/ntb/ntb_regs.h
@@ -0,0 +1,150 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+#define NTB_LINK_ENABLE		0x0000
+#define NTB_LINK_DISABLE	0x0002
+#define NTB_LINK_STATUS_ACTIVE	0x2000
+#define NTB_LINK_SPEED_MASK	0x000f
+#define NTB_LINK_WIDTH_MASK	0x03f0
+
+#define SNB_MSIX_CNT		4
+#define SNB_MAX_SPADS		16
+#define SNB_MAX_COMPAT_SPADS	8
+/* Reserve the uppermost bit for link interrupt */
+#define SNB_MAX_DB_BITS		15
+#define SNB_DB_BITS_PER_VEC	5
+
+#define SNB_DB_HW_LINK		0x8000
+
+#define SNB_PCICMD_OFFSET	0x0504
+#define SNB_DEVCTRL_OFFSET	0x0598
+#define SNB_LINK_STATUS_OFFSET	0x01A2
+
+#define SNB_PBAR2LMT_OFFSET	0x0000
+#define SNB_PBAR4LMT_OFFSET	0x0008
+#define SNB_PBAR2XLAT_OFFSET	0x0010
+#define SNB_PBAR4XLAT_OFFSET	0x0018
+#define SNB_SBAR2LMT_OFFSET	0x0020
+#define SNB_SBAR4LMT_OFFSET	0x0028
+#define SNB_SBAR2XLAT_OFFSET	0x0030
+#define SNB_SBAR4XLAT_OFFSET	0x0038
+#define SNB_SBAR0BASE_OFFSET	0x0040
+#define SNB_SBAR2BASE_OFFSET	0x0048
+#define SNB_SBAR4BASE_OFFSET	0x0050
+#define SNB_NTBCNTL_OFFSET	0x0058
+#define SNB_SBDF_OFFSET		0x005C
+#define SNB_PDOORBELL_OFFSET	0x0060
+#define SNB_PDBMSK_OFFSET	0x0062
+#define SNB_SDOORBELL_OFFSET	0x0064
+#define SNB_SDBMSK_OFFSET	0x0066
+#define SNB_USMEMMISS		0x0070
+#define SNB_SPAD_OFFSET		0x0080
+#define SNB_SPADSEMA4_OFFSET	0x00c0
+#define SNB_WCCNTRL_OFFSET	0x00e0
+#define SNB_B2B_SPAD_OFFSET	0x0100
+#define SNB_B2B_DOORBELL_OFFSET	0x0140
+#define SNB_B2B_XLAT_OFFSET	0x0144
+
+#define BWD_MSIX_CNT		34
+#define BWD_MAX_SPADS		16
+#define BWD_MAX_COMPAT_SPADS	16
+#define BWD_MAX_DB_BITS		34
+#define BWD_DB_BITS_PER_VEC	1
+
+#define BWD_PCICMD_OFFSET	0xb004
+#define BWD_MBAR23_OFFSET	0xb018
+#define BWD_MBAR45_OFFSET	0xb020
+#define BWD_DEVCTRL_OFFSET	0xb048
+#define BWD_LINK_STATUS_OFFSET	0xb052
+
+#define BWD_SBAR2XLAT_OFFSET	0x0008
+#define BWD_SBAR4XLAT_OFFSET	0x0010
+#define BWD_PDOORBELL_OFFSET	0x0020
+#define BWD_PDBMSK_OFFSET	0x0028
+#define BWD_NTBCNTL_OFFSET	0x0060
+#define BWD_EBDF_OFFSET		0x0064
+#define BWD_SPAD_OFFSET		0x0080
+#define BWD_SPADSEMA_OFFSET	0x00c0
+#define BWD_STKYSPAD_OFFSET	0x00c4
+#define BWD_PBAR2XLAT_OFFSET	0x8008
+#define BWD_PBAR4XLAT_OFFSET	0x8010
+#define BWD_B2B_DOORBELL_OFFSET	0x8020
+#define BWD_B2B_SPAD_OFFSET	0x8080
+#define BWD_B2B_SPADSEMA_OFFSET	0x80c0
+#define BWD_B2B_STKYSPAD_OFFSET	0x80c4
+
+#define NTB_CNTL_BAR23_SNOOP	(1 << 2)
+#define NTB_CNTL_BAR45_SNOOP	(1 << 6)
+#define BWD_CNTL_LINK_DOWN	(1 << 16)
+
+#define NTB_PPD_OFFSET		0x00D4
+#define SNB_PPD_CONN_TYPE	0x0003
+#define SNB_PPD_DEV_TYPE	0x0010
+#define BWD_PPD_INIT_LINK	0x0004
+#define BWD_PPD_CONN_TYPE	0x0300
+#define BWD_PPD_DEV_TYPE	0x1000
+
+#define BWD_PBAR2XLAT_USD_ADDR	0x0000004000000000
+#define BWD_PBAR4XLAT_USD_ADDR	0x0000008000000000
+#define BWD_MBAR23_USD_ADDR	0x000000410000000C
+#define BWD_MBAR45_USD_ADDR	0x000000810000000C
+#define BWD_PBAR2XLAT_DSD_ADDR	0x0000004100000000
+#define BWD_PBAR4XLAT_DSD_ADDR	0x0000008100000000
+#define BWD_MBAR23_DSD_ADDR	0x000000400000000C
+#define BWD_MBAR45_DSD_ADDR	0x000000800000000C
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
new file mode 100644
index 0000000..88ab23a
--- /dev/null
+++ b/drivers/ntb/ntb_transport.c
@@ -0,0 +1,1283 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/errno.h>
+#include <linux/export.h>
+#include <linux/interrupt.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include "ntb_hw.h"
+
+static int transport_mtu = 0x4014;
+module_param(transport_mtu, uint, 0644);
+MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");
+
+struct ntb_queue_entry {
+	/* ntb_queue list reference */
+	struct list_head entry;
+	/* pointers to data to be transfered */
+	void *callback_data;
+	void *buf;
+	unsigned int len;
+	unsigned int flags;
+};
+
+struct ntb_transport_qp {
+	struct ntb_device *ndev;
+
+	bool client_ready;
+	bool qp_link;
+	u8 qp_num;	/* Only 64 QP's are allowed.  0-63 */
+
+	void (*tx_handler) (struct ntb_transport_qp *qp);
+	struct tasklet_struct tx_work;
+	struct list_head txq;
+	struct list_head txc;
+	struct list_head txe;
+	spinlock_t txq_lock;
+	spinlock_t txc_lock;
+	spinlock_t txe_lock;
+	void *tx_mw_begin;
+	void *tx_mw_end;
+	void *tx_offset;
+
+	void (*rx_handler) (struct ntb_transport_qp *qp);
+	struct tasklet_struct rx_work;
+	struct list_head rxq;
+	struct list_head rxc;
+	struct list_head rxe;
+	spinlock_t rxq_lock;
+	spinlock_t rxc_lock;
+	spinlock_t rxe_lock;
+	void *rx_buff_begin;
+	void *rx_buff_end;
+	void *rx_offset;
+
+	void (*event_handler) (int status);
+	struct delayed_work link_work;
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_stats;
+
+	/* Stats */
+	u64 rx_bytes;
+	u64 rx_pkts;
+	u64 rx_ring_empty;
+	u64 rx_err_no_buf;
+	u64 rx_err_oflow;
+	u64 rx_err_ver;
+	u64 tx_bytes;
+	u64 tx_pkts;
+	u64 tx_ring_full;
+};
+
+struct ntb_transport_mw {
+	size_t size;
+	void *virt_addr;
+	dma_addr_t dma_addr;
+};
+
+struct ntb_transport {
+	struct ntb_device *ndev;
+	struct ntb_transport_mw mw[NTB_NUM_MW];
+	struct ntb_transport_qp *qps;
+	unsigned int max_qps;
+	unsigned long qp_bitmap;
+	bool transport_link;
+	struct delayed_work link_work;
+	struct dentry *debugfs_dir;
+};
+
+enum {
+	DESC_DONE_FLAG = 1 << 0,
+	LINK_DOWN_FLAG = 1 << 1,
+	HW_ERROR_FLAG = 1 << 2,
+};
+
+struct ntb_payload_header {
+	u64 ver;
+	unsigned int len;
+	unsigned int flags;
+};
+
+enum {
+	MW0_SZ = 0,
+	MW1_SZ,
+	NUM_QPS,
+	QP_LINKS,
+	MAX_SPAD,
+};
+
+#define QP_TO_MW(qp)		((qp) % NTB_NUM_MW)
+#define NTB_QP_DEF_NUM_ENTRIES	100
+#define NTB_LINK_DOWN_TIMEOUT	10
+
+static struct ntb_transport *transport;
+
+static int debugfs_open(struct inode *inode, struct file *filp)
+{
+	filp->private_data = inode->i_private;
+	return 0;
+}
+
+static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
+			    loff_t *offp)
+{
+	struct ntb_transport_qp *qp;
+	char buf[512];
+	ssize_t ret, out_offset, out_count;
+
+	out_count = 512;
+
+	qp = filp->private_data;
+	out_offset = 0;
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "NTB Transport stats\n");
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_bytes - %llu\n", qp->rx_bytes);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_pkts - %llu\n", qp->rx_pkts);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_ring_empty - %llu\n", qp->rx_ring_empty);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_er_oflow - %llu\n", qp->rx_err_oflow);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_err_ver - %llu\n", qp->rx_err_ver);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "rx_offset - %p\n", qp->rx_offset);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_bytes - %llu\n", qp->tx_bytes);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_pkts - %llu\n", qp->tx_pkts);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_ring_full - %llu\n", qp->tx_ring_full);
+	out_offset += snprintf(buf + out_offset, out_count - out_offset,
+			       "tx_offset - %p\n", qp->tx_offset);
+
+	ret = simple_read_from_buffer(ubuf, count, offp, buf, out_offset);
+	return ret;
+}
+
+static const struct file_operations ntb_qp_debugfs_stats = {
+	.owner = THIS_MODULE,
+	.open = debugfs_open,
+	.read = debugfs_read,
+};
+
+static void ntb_list_add_head(spinlock_t *lock, struct list_head *entry,
+			      struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	list_add(entry, list);
+	spin_unlock_irqrestore(lock, flags);
+}
+
+static void ntb_list_add_tail(spinlock_t *lock, struct list_head *entry,
+			      struct list_head *list)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	list_add_tail(entry, list);
+	spin_unlock_irqrestore(lock, flags);
+}
+
+static struct ntb_queue_entry *ntb_list_rm_head(spinlock_t *lock,
+						struct list_head *list)
+{
+	struct ntb_queue_entry *entry;
+	unsigned long flags;
+
+	spin_lock_irqsave(lock, flags);
+	if (list_empty(list)) {
+		entry = NULL;
+		goto out;
+	}
+	entry = list_first_entry(list, struct ntb_queue_entry, entry);
+	list_del(&entry->entry);
+out:
+	spin_unlock_irqrestore(lock, flags);
+
+	return entry;
+}
+
+static int ntb_transport_setup_qp_mw(unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp = &transport->qps[qp_num];
+	u8 mw_num = QP_TO_MW(qp_num);
+	unsigned int size, num_qps_mw;
+
+	WARN_ON(transport->mw[mw_num].virt_addr == 0);
+
+	if (transport->max_qps % NTB_NUM_MW && !mw_num)
+		num_qps_mw = transport->max_qps / NTB_NUM_MW +
+		    (transport->max_qps % NTB_NUM_MW - mw_num);
+	else
+		num_qps_mw = transport->max_qps / NTB_NUM_MW;
+
+	size = transport->mw[mw_num].size / num_qps_mw;
+	pr_debug("orig size = %d, num qps = %d, size = %d\n",
+		 (int) transport->mw[mw_num].size, transport->max_qps, size);
+
+	qp->rx_buff_begin = transport->mw[mw_num].virt_addr +
+	    (qp_num / NTB_NUM_MW * size);
+	qp->rx_buff_end = qp->rx_buff_begin + size;
+	pr_info("QP %d - RX Buff start %p end %p\n", qp->qp_num,
+		qp->rx_buff_begin, qp->rx_buff_end);
+	qp->rx_offset = qp->rx_buff_begin;
+
+	qp->tx_mw_begin = ntb_get_mw_vbase(transport->ndev, mw_num) +
+	    (qp_num / NTB_NUM_MW * size);
+	qp->tx_mw_end = qp->tx_mw_begin + size;
+	pr_info("QP %d - TX MW start %p end %p\n", qp->qp_num, qp->tx_mw_begin,
+		qp->tx_mw_end);
+	qp->tx_offset = qp->tx_mw_begin;
+
+	qp->rx_pkts = 0;
+	qp->tx_pkts = 0;
+
+	return 0;
+}
+
+static int ntb_set_mw(int num_mw, unsigned int size)
+{
+	struct ntb_transport_mw *mw = &transport->mw[num_mw];
+	struct pci_dev *pdev = ntb_query_pdev(transport->ndev);
+	void *offset;
+
+	/* Alloc memory for receiving data.  Must be 4k aligned */
+	mw->size = ALIGN(size, 4096);
+
+	mw->virt_addr = dma_alloc_coherent(&pdev->dev, mw->size, &mw->dma_addr,
+					   GFP_KERNEL);
+	if (!mw->virt_addr) {
+		pr_err("Unable to allocate MW buffer of size %d\n",
+		       (int) mw->size);
+		return -ENOMEM;
+	}
+
+	/* setup the hdr offsets with 0's */
+	for (offset = mw->virt_addr;
+	     offset + sizeof(struct ntb_payload_header) < mw->virt_addr + size;
+	     offset += transport_mtu + sizeof(struct ntb_payload_header))
+		memset(offset, 0, sizeof(struct ntb_payload_header));
+
+	/* Notify HW the memory location of the receive buffer */
+	ntb_set_mw_addr(transport->ndev, num_mw, mw->dma_addr);
+
+	return 0;
+}
+
+static void ntb_transport_event_callback(void *data, unsigned int event)
+{
+	struct ntb_transport *nt = data;
+
+	if (event == NTB_EVENT_HW_ERROR)
+		BUG();
+
+	if (event == NTB_EVENT_HW_LINK_UP)
+		schedule_delayed_work(&nt->link_work, 0);
+
+	if (event == NTB_EVENT_HW_LINK_DOWN) {
+		int i;
+
+		nt->transport_link = NTB_LINK_DOWN;
+
+		/* Pass along the info to any clients */
+		for (i = 0; i < nt->max_qps; i++)
+			if (!test_bit(i, &nt->qp_bitmap)) {
+				struct ntb_transport_qp *qp = &nt->qps[i];
+
+				if (qp->event_handler &&
+				    qp->qp_link != NTB_LINK_DOWN)
+					qp->event_handler(NTB_LINK_DOWN);
+
+				qp->qp_link = NTB_LINK_DOWN;
+			}
+
+		/* The scratchpad registers keep the values if the remote side
+		 * goes down, blast them now to give them a sane value the next
+		 * time they are accessed
+		 */
+		for (i = 0; i < MAX_SPAD; i++) {
+			ntb_write_local_spad(transport->ndev, i, 0);
+			ntb_write_remote_spad(transport->ndev, i, 0);
+		}
+	}
+}
+
+static void ntb_transport_link_work(struct work_struct *work)
+{
+	struct ntb_transport *nt = container_of(work, struct ntb_transport,
+						link_work.work);
+	struct ntb_device *ndev = nt->ndev;
+	u32 val;
+	int rc, i;
+
+	/* send the local info */
+	rc = ntb_write_remote_spad(ndev, MW0_SZ, ntb_get_mw_size(ndev, 0));
+	if (rc) {
+		pr_err("Error writing %x to remote spad %d\n",
+		       (u32) ntb_get_mw_size(ndev, 0), MW0_SZ);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, MW1_SZ, ntb_get_mw_size(ndev, 1));
+	if (rc) {
+		pr_err("Error writing %x to remote spad %d\n",
+		       (u32) ntb_get_mw_size(ndev, 1), MW1_SZ);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, NUM_QPS, nt->max_qps);
+	if (rc) {
+		pr_err("Error writing %x to remote spad %d\n",
+		       nt->max_qps, NUM_QPS);
+		goto out;
+	}
+
+	rc = ntb_write_remote_spad(ndev, QP_LINKS, 0);
+	if (rc) {
+		pr_err("Error writing %x to remote spad %d\n", 0, QP_LINKS);
+		goto out;
+	}
+
+	/* Query the remote side for its info */
+	rc = ntb_read_remote_spad(ndev, NUM_QPS, &val);
+	if (rc) {
+		pr_err("Error reading remote spad %d\n", NUM_QPS);
+		goto out;
+	}
+
+	if (val != nt->max_qps)
+		goto out;
+	pr_info("Remote max number of qps = %d\n", val);
+
+	rc = ntb_read_remote_spad(ndev, MW0_SZ, &val);
+	if (rc) {
+		pr_err("Error reading remote spad %d\n", MW0_SZ);
+		goto out;
+	}
+
+	if (!val)
+		goto out;
+	pr_info("Remote MW0 size = %d\n", val);
+
+	rc = ntb_set_mw(0, val);
+	if (rc)
+		goto out;
+
+	rc = ntb_read_remote_spad(ndev, MW1_SZ, &val);
+	if (rc) {
+		pr_err("Error reading remote spad %d\n", MW1_SZ);
+		goto out;
+	}
+
+	if (!val)
+		goto out;
+	pr_info("Remote MW1 size = %d\n", val);
+
+	rc = ntb_set_mw(1, val);
+	if (rc)
+		goto out;
+
+	for (i = 0; i < nt->max_qps; i++) {
+		struct ntb_transport_qp *qp = &nt->qps[i];
+
+		rc = ntb_transport_setup_qp_mw(i);
+		if (rc)
+			goto out;
+
+		if (qp->client_ready)
+			schedule_delayed_work(&qp->link_work, 0);
+	}
+
+	nt->transport_link = NTB_LINK_UP;
+
+	return;
+
+out:
+	if (ntb_hw_link_status(ndev))
+		schedule_delayed_work(&nt->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_qp_link_work(struct work_struct *work)
+{
+	struct ntb_transport_qp *qp;
+	int rc, val;
+
+	qp = container_of(work, struct ntb_transport_qp, link_work.work);
+
+	WARN_ON(transport->transport_link != NTB_LINK_UP);
+
+	rc = ntb_read_local_spad(transport->ndev, QP_LINKS, &val);
+	if (rc) {
+		pr_err("Error reading spad %d\n", QP_LINKS);
+		return;
+	}
+
+	rc = ntb_write_remote_spad(transport->ndev, QP_LINKS,
+				   val | 1 << qp->qp_num);
+	if (rc)
+		pr_err("Error writing %x to remote spad %d\n",
+		       val | 1 << qp->qp_num, QP_LINKS);
+
+	/* query remote spad for qp ready bits */
+	rc = ntb_read_remote_spad(transport->ndev, QP_LINKS, &val);
+	if (rc)
+		pr_err("Error reading remote spad %d\n", QP_LINKS);
+
+	pr_debug("Remote QP link status = %x\n", val);
+
+	/* See if the remote side is up */
+	if (1 << qp->qp_num & val) {
+		qp->qp_link = NTB_LINK_UP;
+
+		if (qp->event_handler)
+			qp->event_handler(NTB_LINK_UP);
+	} else if (ntb_hw_link_status(transport->ndev))
+		schedule_delayed_work(&qp->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+}
+
+static void ntb_transport_init_queue(unsigned int qp_num)
+{
+	struct ntb_transport_qp *qp;
+
+	qp = &transport->qps[qp_num];
+	qp->qp_num = qp_num;
+	qp->ndev = transport->ndev;
+	qp->qp_link = NTB_LINK_DOWN;
+
+	if (transport->debugfs_dir) {
+		char debugfs_name[4];
+
+		snprintf(debugfs_name, 4, "qp%d", qp_num);
+		qp->debugfs_dir = debugfs_create_dir(debugfs_name,
+						     transport->debugfs_dir);
+
+		qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
+							qp->debugfs_dir, qp,
+							&ntb_qp_debugfs_stats);
+	}
+
+	INIT_DELAYED_WORK(&qp->link_work, ntb_qp_link_work);
+
+	spin_lock_init(&qp->rxc_lock);
+	spin_lock_init(&qp->rxq_lock);
+	spin_lock_init(&qp->rxe_lock);
+	spin_lock_init(&qp->txc_lock);
+	spin_lock_init(&qp->txq_lock);
+	spin_lock_init(&qp->txe_lock);
+
+	INIT_LIST_HEAD(&qp->rxq);
+	INIT_LIST_HEAD(&qp->rxc);
+	INIT_LIST_HEAD(&qp->rxe);
+	INIT_LIST_HEAD(&qp->txq);
+	INIT_LIST_HEAD(&qp->txc);
+	INIT_LIST_HEAD(&qp->txe);
+}
+
+static int ntb_transport_init(void)
+{
+	int rc, i;
+
+	transport = kzalloc(sizeof(struct ntb_transport), GFP_KERNEL);
+	if (!transport)
+		return -ENOMEM;
+
+	if (debugfs_initialized())
+		transport->debugfs_dir = debugfs_create_dir(KBUILD_MODNAME,
+							    NULL);
+	else
+		transport->debugfs_dir = NULL;
+
+	transport->ndev = ntb_register_transport(transport);
+	if (!transport->ndev) {
+		rc = -EIO;
+		goto err;
+	}
+
+	transport->max_qps = ntb_query_max_cbs(transport->ndev);
+	if (!transport->max_qps) {
+		rc = -EIO;
+		goto err1;
+	}
+
+	transport->qps = kcalloc(transport->max_qps,
+				 sizeof(struct ntb_transport_qp), GFP_KERNEL);
+	if (!transport->qps) {
+		rc = -ENOMEM;
+		goto err1;
+	}
+
+	transport->qp_bitmap = ((u64) 1 << transport->max_qps) - 1;
+
+	for (i = 0; i < transport->max_qps; i++)
+		ntb_transport_init_queue(i);
+
+	rc = ntb_register_event_callback(transport->ndev,
+					 ntb_transport_event_callback);
+	if (rc)
+		goto err2;
+
+	INIT_DELAYED_WORK(&transport->link_work, ntb_transport_link_work);
+
+	if (ntb_hw_link_status(transport->ndev))
+		schedule_delayed_work(&transport->link_work, 0);
+
+	return 0;
+
+err2:
+	kfree(transport->qps);
+err1:
+	ntb_unregister_transport(transport->ndev);
+err:
+	debugfs_remove_recursive(transport->debugfs_dir);
+	kfree(transport);
+	return rc;
+}
+
+static void ntb_transport_free(void)
+{
+	struct pci_dev *pdev;
+	int i;
+
+	if (!transport)
+		return;
+
+	transport->transport_link = NTB_LINK_DOWN;
+
+	cancel_delayed_work_sync(&transport->link_work);
+
+	debugfs_remove_recursive(transport->debugfs_dir);
+
+	ntb_unregister_event_callback(transport->ndev);
+
+	pdev = ntb_query_pdev(transport->ndev);
+
+	for (i = 0; i < NTB_NUM_MW; i++)
+		if (transport->mw[i].virt_addr)
+			dma_free_coherent(&pdev->dev, transport->mw[i].size,
+					  transport->mw[i].virt_addr,
+					  transport->mw[i].dma_addr);
+
+	kfree(transport->qps);
+	ntb_unregister_transport(transport->ndev);
+	kfree(transport);
+	transport = NULL;
+}
+
+static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
+			     struct ntb_queue_entry *entry, void *offset)
+{
+	struct ntb_payload_header *hdr = offset;
+
+	entry->len = hdr->len;
+	offset += sizeof(struct ntb_payload_header);
+	memcpy(entry->buf, offset, entry->len);
+
+	/* Ensure that the data is fully copied out before clearing the flag */
+	wmb();
+	hdr->flags = 0;
+	ntb_list_add_tail(&qp->rxc_lock, &entry->entry, &qp->rxc);
+
+	if (qp->rx_handler && qp->client_ready)
+		qp->rx_handler(qp);
+}
+
+static int ntb_process_rxc(struct ntb_transport_qp *qp)
+{
+	struct ntb_payload_header *hdr;
+	struct ntb_queue_entry *entry;
+	void *offset;
+
+	entry = ntb_list_rm_head(&qp->rxq_lock, &qp->rxq);
+	if (!entry) {
+		hdr = qp->rx_offset;
+		pr_debug("no buffer - HDR ver %llu, len %d, flags %x\n",
+			hdr->ver, hdr->len, hdr->flags);
+		qp->rx_err_no_buf++;
+		return -ENOMEM;
+	}
+
+	offset = qp->rx_offset;
+	hdr = offset;
+
+	if (!(hdr->flags & DESC_DONE_FLAG)) {
+		ntb_list_add_tail(&qp->rxq_lock, &entry->entry, &qp->rxq);
+		qp->rx_ring_empty++;
+		return -EAGAIN;
+	}
+
+	if (hdr->ver != qp->rx_pkts) {
+		pr_debug("qp %d: version mismatch, expected %llu - got %llu\n",
+			 qp->qp_num, qp->rx_pkts, hdr->ver);
+		ntb_list_add_tail(&qp->rxq_lock, &entry->entry, &qp->rxq);
+		qp->rx_err_ver++;
+		return -EIO;
+	}
+
+	if (hdr->flags & NTB_LINK_DOWN) {
+		pr_info("qp %d: Link Down\n", qp->qp_num);
+		qp->qp_link = NTB_LINK_DOWN;
+		schedule_delayed_work(&qp->link_work,
+				      msecs_to_jiffies(NTB_LINK_DOWN_TIMEOUT));
+
+		if (qp->event_handler)
+			qp->event_handler(NTB_LINK_DOWN);
+
+		ntb_list_add_tail(&qp->rxq_lock, &entry->entry, &qp->rxq);
+
+		/* Ensure that the data is fully copied out before clearing the
+		 * done flag
+		 */
+		wmb();
+		hdr->flags = 0;
+		goto out;
+	}
+
+	pr_debug("rx offset %p, ver %llu - %d payload received, "
+		 "buf size %d\n", qp->rx_offset, hdr->ver, hdr->len,
+		 entry->len);
+
+	if (hdr->len <= entry->len)
+		ntb_rx_copy_task(qp, entry, offset);
+	else {
+		ntb_list_add_tail(&qp->rxq_lock, &entry->entry, &qp->rxq);
+
+		/* Ensure that the data is fully copied out before clearing the
+		 * done flag
+		 */
+		wmb();
+		hdr->flags = 0;
+		qp->rx_err_oflow++;
+		pr_err("RX overflow! Wanted %d got %d\n", hdr->len, entry->len);
+	}
+
+	qp->rx_bytes += hdr->len;
+	qp->rx_pkts++;
+
+out:
+	qp->rx_offset =
+	    (qp->rx_offset +
+	     ((transport_mtu + sizeof(struct ntb_payload_header)) * 2) >=
+	     qp->rx_buff_end) ? qp->rx_buff_begin : qp->rx_offset +
+	    transport_mtu + sizeof(struct ntb_payload_header);
+
+	return 0;
+}
+
+static void ntb_transport_rx(unsigned long data)
+{
+	struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;
+	int rc;
+
+	do {
+		rc = ntb_process_rxc(qp);
+	} while (!rc);
+}
+
+static void ntb_transport_rxc_db(int db_num)
+{
+	struct ntb_transport_qp *qp = &transport->qps[db_num];
+
+	pr_debug("%s: doorbell %d received\n", __func__, db_num);
+
+	tasklet_schedule(&qp->rx_work);
+}
+
+static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
+			     struct ntb_queue_entry *entry,
+			     void *offset)
+{
+	struct ntb_payload_header *hdr = offset;
+	int rc;
+
+	offset += sizeof(struct ntb_payload_header);
+	memcpy_toio(offset, entry->buf, entry->len);
+
+	hdr->len = entry->len;
+	hdr->ver = qp->tx_pkts;
+
+	/* Ensure that the data is fully copied out before setting the flag */
+	wmb();
+	hdr->flags = entry->flags | DESC_DONE_FLAG;
+
+	rc = ntb_ring_sdb(qp->ndev, qp->qp_num);
+	if (rc)
+		pr_err("%s: error ringing db %d\n", __func__, qp->qp_num);
+
+	if (entry->len > 0) {
+		qp->tx_bytes += entry->len;
+
+		/* Add fully transmitted data to completion queue */
+		ntb_list_add_tail(&qp->txc_lock, &entry->entry, &qp->txc);
+
+		if (qp->tx_handler)
+			qp->tx_handler(qp);
+	} else
+		ntb_list_add_tail(&qp->txe_lock, &entry->entry, &qp->txe);
+}
+
+static int ntb_process_tx(struct ntb_transport_qp *qp,
+			  struct ntb_queue_entry *entry)
+{
+	struct ntb_payload_header *hdr;
+	void *offset;
+
+	offset = qp->tx_offset;
+	hdr = offset;
+
+	pr_debug("%lld - offset %p, tx %p, entry len %d flags %x buff %p\n",
+		 qp->tx_pkts, offset, qp->tx_offset, entry->len, entry->flags,
+		 entry->buf);
+	if (hdr->flags) {
+		ntb_list_add_head(&qp->txq_lock, &entry->entry, &qp->txq);
+		qp->tx_ring_full++;
+		return -EAGAIN;
+	}
+
+	if (entry->len > transport_mtu) {
+		pr_err("Trying to send pkt size of %d\n", entry->len);
+		entry->flags = HW_ERROR_FLAG;
+
+		ntb_list_add_tail(&qp->txc_lock, &entry->entry, &qp->txc);
+
+		if (qp->tx_handler)
+			qp->tx_handler(qp);
+
+		return 0;
+	}
+
+	ntb_tx_copy_task(qp, entry, offset);
+
+	qp->tx_offset =
+	    (qp->tx_offset +
+	     ((transport_mtu + sizeof(struct ntb_payload_header)) * 2) >=
+	     qp->tx_mw_end) ? qp->tx_mw_begin : qp->tx_offset + transport_mtu +
+	    sizeof(struct ntb_payload_header);
+
+	qp->tx_pkts++;
+
+	return 0;
+}
+
+static void ntb_transport_tx(unsigned long data)
+{
+	struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;
+	struct ntb_queue_entry *entry;
+	int rc;
+
+	do {
+		entry = ntb_list_rm_head(&qp->txq_lock, &qp->txq);
+		if (!entry)
+			break;
+
+		rc = ntb_process_tx(qp, entry);
+	} while (!rc);
+}
+
+static void ntb_send_link_down(struct ntb_transport_qp *qp)
+{
+	struct ntb_queue_entry *entry;
+	int i;
+
+	if (qp->qp_link == NTB_LINK_DOWN)
+		return;
+
+	qp->qp_link = NTB_LINK_DOWN;
+
+	for (i = 0; i < NTB_LINK_DOWN_TIMEOUT; i++) {
+		entry = ntb_list_rm_head(&qp->txe_lock, &qp->txe);
+		if (entry)
+			break;
+		msleep(100);
+	}
+
+	entry->callback_data = NULL;
+	entry->buf = NULL;
+	entry->len = 0;
+	entry->flags = LINK_DOWN_FLAG;
+
+	ntb_list_add_tail(&qp->txq_lock, &entry->entry, &qp->txq);
+	tasklet_schedule(&qp->tx_work);
+}
+
+/**
+ * ntb_transport_create_queue - Create a new NTB transport layer queue
+ * @rx_handler: receive callback function
+ * @tx_handler: transmit callback function
+ * @event_handler: event callback function
+ *
+ * Create a new NTB transport layer queue and provide the queue with a callback
+ * routine for both transmit and receive.  The receive callback routine will be
+ * used to pass up data when the transport has received it on the queue.   The
+ * transmit callback routine will be called when the transport has completed the
+ * transmission of the data on the queue and the data is ready to be freed.
+ *
+ * RETURNS: pointer to newly created ntb_queue, NULL on error.
+ */
+struct ntb_transport_qp *
+ntb_transport_create_queue(void (*rx_handler) (struct ntb_transport_qp *qp),
+			   void (*tx_handler) (struct ntb_transport_qp *qp),
+			   void (*event_handler)(int status))
+{
+	struct ntb_queue_entry *entry;
+	struct ntb_transport_qp *qp;
+	unsigned int free_queue;
+	int rc, i;
+
+	if (!transport) {
+		rc = ntb_transport_init();
+		if (rc)
+			return NULL;
+	}
+
+	free_queue = ffs(transport->qp_bitmap);
+	if (!free_queue)
+		goto err;
+
+	/* decrement free_queue to make it zero based */
+	free_queue--;
+
+	clear_bit(free_queue, &transport->qp_bitmap);
+
+	qp = &transport->qps[free_queue];
+	qp->rx_handler = rx_handler;
+	qp->tx_handler = tx_handler;
+	qp->event_handler = event_handler;
+
+	for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+		entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+		if (!entry)
+			goto err1;
+
+		ntb_list_add_tail(&qp->rxe_lock, &entry->entry, &qp->rxe);
+	}
+
+	for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
+		entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
+		if (!entry)
+			goto err2;
+
+		ntb_list_add_tail(&qp->txe_lock, &entry->entry, &qp->txe);
+	}
+
+	tasklet_init(&qp->rx_work, ntb_transport_rx, (unsigned long) qp);
+	tasklet_init(&qp->tx_work, ntb_transport_tx, (unsigned long) qp);
+
+	rc = ntb_register_db_callback(qp->ndev, free_queue,
+				      ntb_transport_rxc_db);
+	if (rc)
+		goto err3;
+
+	pr_info("NTB Transport QP %d created\n", qp->qp_num);
+
+	return qp;
+
+err3:
+	tasklet_disable(&qp->rx_work);
+	tasklet_disable(&qp->tx_work);
+err2:
+	while ((entry = ntb_list_rm_head(&qp->txe_lock, &qp->txe)))
+		kfree(entry);
+err1:
+	while ((entry = ntb_list_rm_head(&qp->rxe_lock, &qp->rxe)))
+		kfree(entry);
+	set_bit(free_queue, &transport->qp_bitmap);
+err:
+	return NULL;
+}
+EXPORT_SYMBOL(ntb_transport_create_queue);
+
+/**
+ * ntb_transport_free_queue - Frees NTB transport queue
+ * @qp: NTB queue to be freed
+ *
+ * Frees NTB transport queue
+ */
+void ntb_transport_free_queue(struct ntb_transport_qp *qp)
+{
+	struct ntb_queue_entry *entry;
+
+	if (!qp)
+		return;
+
+	cancel_delayed_work_sync(&qp->link_work);
+
+	ntb_unregister_db_callback(qp->ndev, qp->qp_num);
+	tasklet_disable(&qp->rx_work);
+	tasklet_disable(&qp->tx_work);
+
+	while ((entry = ntb_list_rm_head(&qp->rxe_lock, &qp->rxe)))
+		kfree(entry);
+
+	while ((entry = ntb_list_rm_head(&qp->rxq_lock, &qp->rxq))) {
+		pr_warn("Freeing item from a non-empty queue\n");
+		kfree(entry);
+	}
+
+	while ((entry = ntb_list_rm_head(&qp->rxc_lock, &qp->rxc))) {
+		pr_warn("Freeing item from a non-empty queue\n");
+		kfree(entry);
+	}
+
+	while ((entry = ntb_list_rm_head(&qp->txe_lock, &qp->txe)))
+		kfree(entry);
+
+	while ((entry = ntb_list_rm_head(&qp->txq_lock, &qp->txq))) {
+		pr_warn("Freeing item from a non-empty queue\n");
+		kfree(entry);
+	}
+
+	while ((entry = ntb_list_rm_head(&qp->txc_lock, &qp->txc))) {
+		pr_warn("Freeing item from a non-empty queue\n");
+		kfree(entry);
+	}
+
+	set_bit(qp->qp_num, &transport->qp_bitmap);
+
+	pr_info("NTB Transport QP %d freed\n", qp->qp_num);
+
+	if (transport->qp_bitmap == ((u64) 1 << transport->max_qps) - 1)
+		ntb_transport_free();
+}
+EXPORT_SYMBOL(ntb_transport_free_queue);
+
+/**
+ * ntb_transport_rx_remove - Dequeues enqueued rx packet
+ * @qp: NTB queue to be freed
+ * @len: pointer to variable to write enqueued buffers length
+ *
+ * Dequeues unused buffers from receive queue.  Should only be used during
+ * shutdown of qp.
+ *
+ * RETURNS: NULL error value on error, or void* for success.
+ */
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len)
+{
+	struct ntb_queue_entry *entry;
+	void *buf;
+
+	if (!qp || qp->client_ready == NTB_LINK_UP)
+		return NULL;
+
+	entry = ntb_list_rm_head(&qp->rxq_lock, &qp->rxq);
+	if (!entry)
+		return NULL;
+
+	buf = entry->callback_data;
+	*len = entry->len;
+
+	ntb_list_add_tail(&qp->rxe_lock, &entry->entry, &qp->rxe);
+
+	return buf;
+}
+EXPORT_SYMBOL(ntb_transport_rx_remove);
+
+/**
+ * ntb_transport_rx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that incoming packets will be copied into
+ * @len: length of the data buffer
+ *
+ * Enqueue a new receive buffer onto the transport queue into which a NTB
+ * payload can be received into.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len)
+{
+	struct ntb_queue_entry *entry;
+
+	if (!qp)
+		return -EINVAL;
+
+	entry = ntb_list_rm_head(&qp->rxe_lock, &qp->rxe);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->callback_data = cb;
+	entry->buf = data;
+	entry->len = len;
+
+	ntb_list_add_tail(&qp->rxq_lock, &entry->entry, &qp->rxq);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_transport_rx_enqueue);
+
+/**
+ * ntb_transport_tx_enqueue - Enqueue a new NTB queue entry
+ * @qp: NTB transport layer queue the entry is to be enqueued on
+ * @cb: per buffer pointer for callback function to use
+ * @data: pointer to data buffer that will be sent
+ * @len: length of the data buffer
+ *
+ * Enqueue a new transmit buffer onto the transport queue from which a NTB
+ * payload will be transmitted.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len)
+{
+	struct ntb_queue_entry *entry;
+
+	if (!qp || qp->qp_link != NTB_LINK_UP)
+		return -EINVAL;
+
+	entry = ntb_list_rm_head(&qp->txe_lock, &qp->txe);
+	if (!entry) {
+		/* ring full, kick it */
+		tasklet_schedule(&qp->tx_work);
+		return -ENOMEM;
+	}
+
+	entry->callback_data = cb;
+	entry->buf = data;
+	entry->len = len;
+	entry->flags = 0;
+
+	ntb_list_add_tail(&qp->txq_lock, &entry->entry, &qp->txq);
+
+	tasklet_schedule(&qp->tx_work);
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_transport_tx_enqueue);
+
+/**
+ * ntb_transport_tx_dequeue - Dequeue a NTB queue entry
+ * @qp: NTB transport layer queue to be dequeued from
+ * @len: length of the data buffer
+ *
+ * This function will dequeue a buffer from the transmit complete queue.
+ * Entries will only be enqueued on this queue after having been
+ * transfered to the remote side.
+ *
+ * RETURNS: callback pointer of the buffer from the transport queue, or NULL
+ * on empty
+ */
+void *ntb_transport_tx_dequeue(struct ntb_transport_qp *qp, unsigned int *len)
+{
+	struct ntb_queue_entry *entry;
+	void *buf;
+
+	if (!qp)
+		return NULL;
+
+	entry = ntb_list_rm_head(&qp->txc_lock, &qp->txc);
+	if (!entry)
+		return NULL;
+
+	buf = entry->callback_data;
+	if (entry->flags != HW_ERROR_FLAG)
+		*len = entry->len;
+	else
+		*len = -EIO;
+
+	ntb_list_add_tail(&qp->txe_lock, &entry->entry, &qp->txe);
+
+	return buf;
+}
+EXPORT_SYMBOL(ntb_transport_tx_dequeue);
+
+/**
+ * ntb_transport_rx_dequeue - Dequeue a NTB queue entry
+ * @qp: NTB transport layer queue to be dequeued from
+ * @len: length of the data buffer
+ *
+ * This function will dequeue a buffer from the receive complete queue.
+ * Entries will only be enqueued on this queue after having been fully received.
+ *
+ * RETURNS: callback pointer of the buffer from the transport queue, or NULL
+ * on empty
+ */
+void *ntb_transport_rx_dequeue(struct ntb_transport_qp *qp, unsigned int *len)
+{
+	struct ntb_queue_entry *entry;
+	void *buf;
+
+	if (!qp)
+		return NULL;
+
+	entry = ntb_list_rm_head(&qp->rxc_lock, &qp->rxc);
+	if (!entry)
+		return NULL;
+
+	buf = entry->callback_data;
+	*len = entry->len;
+
+	ntb_list_add_tail(&qp->rxe_lock, &entry->entry, &qp->rxe);
+
+	return buf;
+}
+EXPORT_SYMBOL(ntb_transport_rx_dequeue);
+
+/**
+ * ntb_transport_link_up - Notify NTB transport of client readiness to use queue
+ * @qp: NTB transport layer queue to be enabled
+ *
+ * Notify NTB transport layer of client readiness to use queue
+ */
+void ntb_transport_link_up(struct ntb_transport_qp *qp)
+{
+	if (!qp)
+		return;
+
+	qp->client_ready = NTB_LINK_UP;
+
+	if (transport->transport_link == NTB_LINK_UP)
+		schedule_delayed_work(&qp->link_work, 0);
+}
+EXPORT_SYMBOL(ntb_transport_link_up);
+
+/**
+ * ntb_transport_link_down - Notify NTB transport to no longer enqueue data
+ * @qp: NTB transport layer queue to be disabled
+ *
+ * Notify NTB transport layer of client's desire to no longer receive data on
+ * transport queue specified.  It is the client's responsibility to ensure all
+ * entries on queue are purged or otherwise handled appropraitely.
+ */
+void ntb_transport_link_down(struct ntb_transport_qp *qp)
+{
+	int rc, val;
+
+	if (!qp)
+		return;
+
+	qp->client_ready = NTB_LINK_DOWN;
+
+	cancel_delayed_work_sync(&qp->link_work);
+	qp->qp_link = NTB_LINK_DOWN;
+
+	rc = ntb_read_local_spad(transport->ndev, QP_LINKS, &val);
+	if (rc) {
+		pr_err("Error reading spad %d\n", QP_LINKS);
+		return;
+	}
+
+	rc = ntb_write_remote_spad(transport->ndev, QP_LINKS,
+				   val & ~(1 << qp->qp_num));
+	if (rc)
+		pr_err("Error writing %x to remote spad %d\n",
+		       val & ~(1 << qp->qp_num), QP_LINKS);
+
+	if (transport->transport_link == NTB_LINK_UP)
+		ntb_send_link_down(qp);
+}
+EXPORT_SYMBOL(ntb_transport_link_down);
+
+/**
+ * ntb_transport_link_query - Query transport link state
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query connectivity to the remote system of the NTB transport queue
+ *
+ * RETURNS: true for link up or false for link down
+ */
+bool ntb_transport_link_query(struct ntb_transport_qp *qp)
+{
+	return qp->qp_link == NTB_LINK_UP;
+}
+EXPORT_SYMBOL(ntb_transport_link_query);
+
+/**
+ * ntb_transport_qp_num - Query the qp number
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query qp number of the NTB transport queue
+ *
+ * RETURNS: a zero based number specifying the qp number
+ */
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp)
+{
+	return qp->qp_num;
+}
+EXPORT_SYMBOL(ntb_transport_qp_num);
+
+/**
+ * ntb_transport_max_size - Query the max payload size of a qp
+ * @qp: NTB transport layer queue to be queried
+ *
+ * Query the maximum payload size permissible on the given qp
+ *
+ * RETURNS: the max payload size of a qp
+ */
+unsigned int
+ntb_transport_max_size(__attribute__((unused)) struct ntb_transport_qp *qp)
+{
+	return transport_mtu;
+}
+EXPORT_SYMBOL(ntb_transport_max_size);
diff --git a/include/linux/ntb.h b/include/linux/ntb.h
new file mode 100644
index 0000000..4d0efc3
--- /dev/null
+++ b/include/linux/ntb.h
@@ -0,0 +1,78 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2012 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copy
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Intel PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Jon Mason <jon.mason@intel.com>
+ */
+
+struct ntb_transport_qp;
+
+unsigned char ntb_transport_qp_num(struct ntb_transport_qp *qp);
+unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp);
+struct ntb_transport_qp *
+ntb_transport_create_queue(void (*rx_handler)(struct ntb_transport_qp *qp),
+			   void (*tx_handler)(struct ntb_transport_qp *qp),
+			   void (*event_handler)(int status));
+void ntb_transport_free_queue(struct ntb_transport_qp *qp);
+int ntb_transport_rx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len);
+int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
+			     unsigned int len);
+void *ntb_transport_tx_dequeue(struct ntb_transport_qp *qp, unsigned int *len);
+void *ntb_transport_rx_dequeue(struct ntb_transport_qp *qp, unsigned int *len);
+void *ntb_transport_rx_remove(struct ntb_transport_qp *qp, unsigned int *len);
+void ntb_transport_link_up(struct ntb_transport_qp *qp);
+void ntb_transport_link_down(struct ntb_transport_qp *qp);
+bool ntb_transport_link_query(struct ntb_transport_qp *qp);
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH] mac802154: fix sparse warning for mac802154_slave_get_priv
From: Silviu-Mihai Popescu @ 2012-07-13 20:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, alex.bluesman.smirnov, Silviu-Mihai Popescu

Make sparse happy by fixing the following error:
	* symbol 'mac802154_slave_get_priv' was not declared. Should it be static?

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
---
 net/mac802154/mib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac802154/mib.c b/net/mac802154/mib.c
index ab59821..1cf6557 100644
--- a/net/mac802154/mib.c
+++ b/net/mac802154/mib.c
@@ -34,7 +34,7 @@ struct hw_addr_filt_notify_work {
 	unsigned long changed;
 };
 
-struct mac802154_priv *mac802154_slave_get_priv(struct net_device *dev)
+static struct mac802154_priv *mac802154_slave_get_priv(struct net_device *dev)
 {
 	struct mac802154_sub_if_data *priv = netdev_priv(dev);
 
-- 
1.7.11-rc3

^ permalink raw reply related

* Hello
From: Clarice Dion @ 2012-07-13 20:03 UTC (permalink / raw)


Hello
Am Clarice Dion, am looking for a honest partner, please if
you don't mind write me back so i can tell you more about
myself with my picture hope to hear from you lots of love kiss.

^ permalink raw reply

* Re: [DANGER 8/7]: ipv4: Cache output routes in fib_info nexthops.
From: Vijay Subramanian @ 2012-07-13 20:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120713.041003.2251275100418141024.davem@davemloft.net>

On 13 July 2012 04:10, David Miller <davem@davemloft.net> wrote:
> From: Vijay Subramanian <subramanian.vijay@gmail.com>
> Date: Thu, 12 Jul 2012 17:52:54 -0700
>
>> I did not get a chance to see why it suddenly starts working. Hope
>> this helps. I will dig around more.
>
> The problem is the setting of ->rt_gateway for local subnet routes.
>

When I tested this yesterday, the peer was actually not on the same
subnet and I saw this problem.  I think the problem was also present
for a peer on the same subnet.
Anyway, I applied this patch and the DANGER one after this and the
problem has disappeared, fib_lookup() now returns RTN_LOCAL and ssh
responds immediately.
I tested with 2 peers, one on the same subnet and one on a different
one. So far, it looks good.

Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>

Thanks,
Vijay

^ permalink raw reply

* Re: [PATCH] iproute2: Fix memory hog of ip batched command.
From: Pravin Shelar @ 2012-07-13 19:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, jpettit, jesse
In-Reply-To: <20120713095342.09324ebc@nehalam.linuxnetplumber.net>

On Fri, Jul 13, 2012 at 9:53 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Thu, 12 Jul 2012 18:21:06 -0700
> Pravin B Shelar <pshelar@nicira.com> wrote:
>
>> ipaddr_list_or_flush() builds list of all device at start of
>> every flush or list operation, but does not free memory at end.
>> This can hog lot of memory for large batched command.
>> Following patch fixes it.
>>
>> Reported-by: Justin Pettit <jpettit@nicira.com>
>> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>
> What about this instead? It also has a couple of other changes:
>   1. stdout isn't flushed on each print only at end
>   2. address list does not need to be fetched when doing flush
>
sounds good.

> It comes up clean under valgrind.
>
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 5e03d1e..8870ae8 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -768,11 +768,145 @@ static int store_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
>         return 0;
>  }
>
> +static void free_nlmsg_chain(struct nlmsg_chain *info)
> +{
> +       struct nlmsg_list *l, *n;
> +
> +       for (l = info->head; l; l = n) {
> +               n = l->next;
> +               free(l);
> +       }
> +}
> +
> +static void ipaddr_filter(struct nlmsg_chain *linfo, struct nlmsg_chain *ainfo)
> +{
> +       struct nlmsg_list *l, **lp;
> +
> +       lp = &linfo->head;
> +       while ( (l = *lp) != NULL) {
> +               int ok = 0;
> +               struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
> +               struct nlmsg_list *a;
> +
> +               for (a = ainfo->head; a; a = a->next) {
> +                       struct nlmsghdr *n = &a->h;
> +                       struct ifaddrmsg *ifa = NLMSG_DATA(n);
> +
> +                       if (ifa->ifa_index != ifi->ifi_index ||
> +                           (filter.family && filter.family != ifa->ifa_family))
> +                               continue;
> +                       if ((filter.scope^ifa->ifa_scope)&filter.scopemask)
> +                               continue;
> +                       if ((filter.flags^ifa->ifa_flags)&filter.flagmask)
> +                               continue;
> +                       if (filter.pfx.family || filter.label) {
> +                               struct rtattr *tb[IFA_MAX+1];
> +                               parse_rtattr(tb, IFA_MAX, IFA_RTA(ifa), IFA_PAYLOAD(n));
> +                               if (!tb[IFA_LOCAL])
> +                                       tb[IFA_LOCAL] = tb[IFA_ADDRESS];
> +
> +                               if (filter.pfx.family && tb[IFA_LOCAL]) {
> +                                       inet_prefix dst;
> +                                       memset(&dst, 0, sizeof(dst));
> +                                       dst.family = ifa->ifa_family;
> +                                       memcpy(&dst.data, RTA_DATA(tb[IFA_LOCAL]), RTA_PAYLOAD(tb[IFA_LOCAL]));
> +                                       if (inet_addr_match(&dst, &filter.pfx, filter.pfx.bitlen))
> +                                               continue;
> +                               }
> +                               if (filter.label) {
> +                                       SPRINT_BUF(b1);
> +                                       const char *label;
> +                                       if (tb[IFA_LABEL])
> +                                               label = RTA_DATA(tb[IFA_LABEL]);
> +                                       else
> +                                               label = ll_idx_n2a(ifa->ifa_index, b1);
> +                                       if (fnmatch(filter.label, label, 0) != 0)
> +                                               continue;
> +                               }
> +                       }
> +
> +                       ok = 1;
> +                       break;
> +               }
> +               if (!ok) {
> +                       *lp = l->next;
> +                       free(l);
> +               } else
> +                       lp = &l->next;
> +       }
> +}
> +
> +static int ipaddr_flush(void)
> +{
> +       int round = 0;
> +       char flushb[4096-512];
> +
> +       filter.flushb = flushb;
> +       filter.flushp = 0;
> +       filter.flushe = sizeof(flushb);
> +
> +       while ((max_flush_loops == 0) || (round < max_flush_loops)) {
> +               const struct rtnl_dump_filter_arg a[3] = {
> +                       {
> +                               .filter = print_addrinfo_secondary,
> +                               .arg1 = stdout,
> +                       },
> +                       {
> +                               .filter = print_addrinfo_primary,
> +                               .arg1 = stdout,
> +                       },
> +                       {
> +                               .filter = NULL,
> +                               .arg1 = NULL,
> +                       },
> +               };
> +               if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
> +                       perror("Cannot send dump request");
> +                       exit(1);
> +               }
> +               filter.flushed = 0;
> +               if (rtnl_dump_filter_l(&rth, a) < 0) {
> +                       fprintf(stderr, "Flush terminated\n");
> +                       exit(1);
> +               }
> +               if (filter.flushed == 0) {
> + flush_done:
> +                       if (show_stats) {
> +                               if (round == 0)
> +                                       printf("Nothing to flush.\n");
> +                               else
> +                                       printf("*** Flush is complete after %d round%s ***\n", round, round>1?"s":"");
> +                       }
> +                       fflush(stdout);
> +                       return 0;
> +               }
> +               round++;
> +               if (flush_update() < 0)
> +                       return 1;
> +
> +               if (show_stats) {
> +                       printf("\n*** Round %d, deleting %d addresses ***\n", round, filter.flushed);
> +                       fflush(stdout);
> +               }
> +
> +               /* If we are flushing, and specifying primary, then we
> +                * want to flush only a single round.  Otherwise, we'll
> +                * start flushing secondaries that were promoted to
> +                * primaries.
> +                */
> +               if (!(filter.flags & IFA_F_SECONDARY) && (filter.flagmask & IFA_F_SECONDARY))
> +                       goto flush_done;
> +       }
> +       fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", max_flush_loops);
> +       fflush(stderr);
> +       return 1;
> +}
> +
>  static int ipaddr_list_or_flush(int argc, char **argv, int flush)
>  {
>         struct nlmsg_chain linfo = { NULL, NULL};
>         struct nlmsg_chain ainfo = { NULL, NULL};
> -       struct nlmsg_list *l, *n;
> +       struct nlmsg_list *l;
>         char *filter_dev = NULL;
>         int no_link = 0;
>
> @@ -863,16 +997,6 @@ static int ipaddr_list_or_flush(int argc, char **argv, int flush)
>                 argv++; argc--;
>         }
>
> -       if (rtnl_wilddump_request(&rth, preferred_family, RTM_GETLINK) < 0) {
> -               perror("Cannot send dump request");
> -               exit(1);
> -       }
> -
> -       if (rtnl_dump_filter(&rth, store_nlmsg, &linfo) < 0) {
> -               fprintf(stderr, "Dump terminated\n");
> -               exit(1);
> -       }
> -
>         if (filter_dev) {
>                 filter.ifindex = ll_name_to_index(filter_dev);
>                 if (filter.ifindex <= 0) {
> @@ -881,72 +1005,23 @@ static int ipaddr_list_or_flush(int argc, char **argv, int flush)
>                 }
>         }
>
> -       if (flush) {
> -               int round = 0;
> -               char flushb[4096-512];
> -
> -               filter.flushb = flushb;
> -               filter.flushp = 0;
> -               filter.flushe = sizeof(flushb);
> -
> -               while ((max_flush_loops == 0) || (round < max_flush_loops)) {
> -                       const struct rtnl_dump_filter_arg a[3] = {
> -                               {
> -                                       .filter = print_addrinfo_secondary,
> -                                       .arg1 = stdout,
> -                               },
> -                               {
> -                                       .filter = print_addrinfo_primary,
> -                                       .arg1 = stdout,
> -                               },
> -                               {
> -                                       .filter = NULL,
> -                                       .arg1 = NULL,
> -                               },
> -                       };
> -                       if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
> -                               perror("Cannot send dump request");
> -                               exit(1);
> -                       }
> -                       filter.flushed = 0;
> -                       if (rtnl_dump_filter_l(&rth, a) < 0) {
> -                               fprintf(stderr, "Flush terminated\n");
> -                               exit(1);
> -                       }
> -                       if (filter.flushed == 0) {
> -flush_done:
> -                               if (show_stats) {
> -                                       if (round == 0)
> -                                               printf("Nothing to flush.\n");
> -                                       else
> -                                               printf("*** Flush is complete after %d round%s ***\n", round, round>1?"s":"");
> -                               }
> -                               fflush(stdout);
> -                               return 0;
> -                       }
> -                       round++;
> -                       if (flush_update() < 0)
> -                               return 1;
> +       if (flush)
> +               return ipaddr_flush();
>
> -                       if (show_stats) {
> -                               printf("\n*** Round %d, deleting %d addresses ***\n", round, filter.flushed);
> -                               fflush(stdout);
> -                       }
> +       if (rtnl_wilddump_request(&rth, preferred_family, RTM_GETLINK) < 0) {
> +               perror("Cannot send dump request");
> +               exit(1);
> +       }
>
> -                       /* If we are flushing, and specifying primary, then we
> -                        * want to flush only a single round.  Otherwise, we'll
> -                        * start flushing secondaries that were promoted to
> -                        * primaries.
> -                        */
> -                       if (!(filter.flags & IFA_F_SECONDARY) && (filter.flagmask & IFA_F_SECONDARY))
> -                               goto flush_done;
> -               }
> -               fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", max_flush_loops);
> -               fflush(stderr);
> -               return 1;
> +       if (rtnl_dump_filter(&rth, store_nlmsg, &linfo) < 0) {
> +               fprintf(stderr, "Dump terminated\n");
> +               exit(1);
>         }
>
> -       if (filter.family != AF_PACKET) {
> +       if (filter.family && filter.family != AF_PACKET) {
> +               if (filter.oneline)
> +                       no_link = 1;
> +
>                 if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
>                         perror("Cannot send dump request");
>                         exit(1);
> @@ -956,80 +1031,24 @@ flush_done:
>                         fprintf(stderr, "Dump terminated\n");
>                         exit(1);
>                 }
> -       }
> -
> -
> -       if (filter.family && filter.family != AF_PACKET) {
> -               struct nlmsg_list **lp;
> -               lp = &linfo.head;
> -
> -               if (filter.oneline)
> -                       no_link = 1;
>
> -               while ((l=*lp)!=NULL) {
> -                       int ok = 0;
> -                       struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
> -                       struct nlmsg_list *a;
> -
> -                       for (a = ainfo.head; a; a = a->next) {
> -                               struct nlmsghdr *n = &a->h;
> -                               struct ifaddrmsg *ifa = NLMSG_DATA(n);
> -
> -                               if (ifa->ifa_index != ifi->ifi_index ||
> -                                   (filter.family && filter.family != ifa->ifa_family))
> -                                       continue;
> -                               if ((filter.scope^ifa->ifa_scope)&filter.scopemask)
> -                                       continue;
> -                               if ((filter.flags^ifa->ifa_flags)&filter.flagmask)
> -                                       continue;
> -                               if (filter.pfx.family || filter.label) {
> -                                       struct rtattr *tb[IFA_MAX+1];
> -                                       parse_rtattr(tb, IFA_MAX, IFA_RTA(ifa), IFA_PAYLOAD(n));
> -                                       if (!tb[IFA_LOCAL])
> -                                               tb[IFA_LOCAL] = tb[IFA_ADDRESS];
> -
> -                                       if (filter.pfx.family && tb[IFA_LOCAL]) {
> -                                               inet_prefix dst;
> -                                               memset(&dst, 0, sizeof(dst));
> -                                               dst.family = ifa->ifa_family;
> -                                               memcpy(&dst.data, RTA_DATA(tb[IFA_LOCAL]), RTA_PAYLOAD(tb[IFA_LOCAL]));
> -                                               if (inet_addr_match(&dst, &filter.pfx, filter.pfx.bitlen))
> -                                                       continue;
> -                                       }
> -                                       if (filter.label) {
> -                                               SPRINT_BUF(b1);
> -                                               const char *label;
> -                                               if (tb[IFA_LABEL])
> -                                                       label = RTA_DATA(tb[IFA_LABEL]);
> -                                               else
> -                                                       label = ll_idx_n2a(ifa->ifa_index, b1);
> -                                               if (fnmatch(filter.label, label, 0) != 0)
> -                                                       continue;
> -                                       }
> -                               }
> -
> -                               ok = 1;
> -                               break;
> -                       }
> -                       if (!ok) {
> -                               *lp = l->next;
> -                               free(l);
> -                       } else
> -                               lp = &l->next;
> -               }
> +               ipaddr_filter(&linfo, &ainfo);
>         }
>
> -       for (l = linfo.head; l; l = n) {
> -               n = l->next;
> -               if (no_link || print_linkinfo(NULL, &l->h, stdout) == 0) {
> -                       struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
> -                       if (filter.family != AF_PACKET)
> -                               print_selected_addrinfo(ifi->ifi_index, ainfo.head, stdout);
> +       if (!no_link) {
> +               for (l = linfo.head; l; l = l->next) {
> +                       if (print_linkinfo(NULL, &l->h, stdout) == 0) {
> +                               struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
> +                               if (filter.family != AF_PACKET)
> +                                       print_selected_addrinfo(ifi->ifi_index, ainfo.head, stdout);
> +                       }
>                 }
I am not sure why you have changed no_link check for printing address.

otherwise looks good.

>                 fflush(stdout);
> -               free(l);
>         }
>
> +       free_nlmsg_chain(&ainfo);
> +       free_nlmsg_chain(&linfo);
> +
>         return 0;
>  }
>
>

^ permalink raw reply

* Re: [PATCH net-next] tipc: Use pr_fmt
From: Joe Perches @ 2012-07-13 19:04 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, Jon Maloy, Erik Hugne, ying.xue
In-Reply-To: <20120713154544.GA31280@windriver.com>

On Fri, 2012-07-13 at 11:45 -0400, Paul Gortmaker wrote:
> [[PATCH net-next] tipc: Use pr_fmt] On 12/07/2012 (Thu 10:51) Joe Perches wrote:
> > How about extending this to use the more common pr_fmt prefix?
> We can probably do that.  I'll incorporate that into the 3/8 patch,
> rather than have two similar patches in the series, with the 2nd just
> going in and touching all the same lines a second time.

Hi Paul, that sounds fine to me,  cheers, Joe

^ permalink raw reply

* RE: [PATCH] ixgbevf - Prevent RX/TX statistics getting reset to zero
From: Narendra_K @ 2012-07-13 18:26 UTC (permalink / raw)
  To: gregory.v.rose; +Cc: jeffrey.t.kirsher, netdev
In-Reply-To: <20120713101401.0000421f@unknown>

> -----Original Message-----
> From: Greg Rose [mailto:gregory.v.rose@intel.com]
> Sent: Friday, July 13, 2012 10:44 PM
> To: K, Narendra
> Cc: jeffrey.t.kirsher@intel.com; netdev@vger.kernel.org
> Subject: Re: [PATCH] ixgbevf - Prevent RX/TX statistics getting reset to zero
> 
> On Fri, 13 Jul 2012 05:36:37 -0700
> <Narendra_K@Dell.com> wrote:
> 
> > > -----Original Message-----
> > > From: Jeff Kirsher [mailto:tarbal@gmail.com]
> > > Sent: Thursday, July 12, 2012 11:56 PM
> > > To: K, Narendra; gregory.v.rose@intel.com
> > > Cc: netdev@vger.kernel.org
> > > Subject: Re: [PATCH] ixgbevf - Prevent RX/TX statistics getting
> > > reset to zero
> > >
> > > On 07/12/2012 06:55 AM, Narendra_K@Dell.com wrote:
> > > > Hello,
> > > >
> > > > [Apologies if you are receiving this message twice. I am resending
> > > > the
> > > message, as I got message delivery failure note].
> > > >
> > > > While exploring SR-IOV on Intel 82599EB 10-Gigabit SFP+ adapter, I
> > > > had the
> > > following observation.  I enabled two VFs by passing 'max_vfs=2' to
> > > ixgbe driver. One of the VFs was assigned to a guest.
> > > > In the guest, the ifconfig and ip tools reported 'RX packets' and
> > > > 'TX packets'
> > > as zero, after pinging to a remote host. Looking into it further,
> > > the commit 4197aa7bb81877ebb06e4f2cc1b5fea2da23a7bd implements
> 64
> > > bit per ring statistics. It seemed like the 'total_bytes' and
> > > 'total_packets' of RX and TX ring were being reset to zero by the RX
> > > and TX interrupt handlers, resulting in the user space tools
> > > reporting zero RX and TX bytes.
> > > >
> > > > The attached patch addresses the issue by preventing the resetting
> > > > of RX
> > > and TX ring statistics to zero. The patch was taken against latest
> > > mainline 3.5- rc6 kernel.
> > > >
> > > > I tested the patch by pinging  from the guest OS to a remote host.
> > > >
> > > > ping -f <remote host> -c 10000
> > > >
> > > > The ip and ifcofig showed the statistics increased by 10000
> > > > packets.
> > > >
> > > > # lspci | grep 82599
> > > > 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
> > > > SFP+
> > > Network Connection (rev 01)
> > > > 04:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
> > > > SFP+
> > > Network Connection (rev 01)
> > > > 04:10.0 Ethernet controller: Intel Corporation 82599 Ethernet
> > > > Controller
> > > Virtual Function (rev 01)
> > > > 04:10.1 Ethernet controller: Intel Corporation 82599 Ethernet
> > > > Controller
> > > Virtual Function (rev 01)
> > > > 04:10.2 Ethernet controller: Intel Corporation 82599 Ethernet
> > > > Controller
> > > Virtual Function (rev 01)
> > > > 04:10.3 Ethernet controller: Intel Corporation 82599 Ethernet
> > > > Controller
> > > Virtual Function (rev 01)
> > > >
> > > > # lspci -s 04:00.0 -n
> > > > 04:00.0 0200: 8086:154d (rev 01)
> > > > # lspci -s 04:10.0 -n
> > > > 04:10.0 0200: 8086:10ed (rev 01)
> > > >
> > > > Please let me know if additional details and logs are required.[>]
> > > >
> > > > With regards,
> > > > Narendra K
> > > >
> > > >
> > > >
> > >
> > > Thanks, I will add the patch to my queue
> > >
> > [>]
> >
> > Hi Greg,
> >
> > I was re-looking at why ' rx_ring->total_packets' and '
> > rx_ring->total_bytes' were being set to zero in '
> > ixgbevf_msix_clean_rx'.  It looks like ' rx_ring->total_packets' and '
> > rx_ring->total_packets'  are computed per one run of
> > 'ixgbevf_clean_rx_irq' .  Then in 'ixgbevf_clean_rxonly' if
> > 'adapter->itr_setting & 1' is true, the count is  used in
> > 'ixgbevf_set_itr_msix'. When the interrupts are enabled, the '
> > rx_ring->total_packets'  and ' rx_ring->total_bytes' are set to zero
> > so that they can be re-computed in the poll function and  fed to the
> > 'ixgbevf_set_itr_msix'.
> >
> > This results in statistics reported by 'ip' and 'ifconfig' as zero.
> > The patch addresses the scenario.  But it seems it would change the
> > intended behavior in the scenario  when 'adapter->itr_setting & 1' is
> > true.  It could be addressed by storing the 'total_rx_packets' and
> > 'total_rx_bytes'  computed every time in the poll function in 'struct
> > ixgbevf_adapter' . Then the interrupt handler could reset them to zero
> > instead of resetting  ' rx_ring->total_packets' and  '
> > rx_ring->total_bytes'.
> >
> > Also, I observed that  'adapter->itr_setting & 1'  was not true by
> > default.  I tried setting it by 'ethtool  -C eth0 adaptive-rx on', and
> > it returned 'operation not supported'.
> >
> > I could be missing something here, please let me know.
> 
> Nope, you're correct in your analysis.  The ixgbevf driver hasn't supported
> adaptive interrupt moderation in the past.  However, a set of patches we
> have in the pipeline will turn it on by default.  Also, as a result of those
> patches the bug you've reported will be fixed.
> We'll go ahead and accept your patch for the net tree and then fix up any
> conflicts between that and our new set of patches when they get pushed to
> net-next.
> 
> Thanks for your work on this.
> 
> - Greg

Thank you Greg.

With regards,
Narendra K

^ permalink raw reply

* [PATCH] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-13 18:26 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	linux-sctp

I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
---
 Documentation/networking/ip-sysctl.txt |   14 +++++++++++++
 include/net/sctp/constants.h           |    1 +
 include/net/sctp/structs.h             |    4 +++
 include/net/sctp/user.h                |    1 +
 net/sctp/associola.c                   |   33 +++++++++++++++++++++++++------
 net/sctp/outqueue.c                    |    6 +++-
 net/sctp/sm_sideeffect.c               |   33 ++++++++++++++++++++++++++++---
 net/sctp/sysctl.c                      |    9 ++++++++
 net/sctp/transport.c                   |    3 +-
 9 files changed, 90 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79..c636f9c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
 
 	Default: 5
 
+pf_retrans - INTEGER
+	The number of retransmissions that will be attempted on a given path
+	before traffic is redirected to an alternate transport (should one
+	exist).  Note this is distinct from path_max_retrans, as a path that
+	passes the pf_retrans threshold can still be used.  Its only
+	deprioritized when a transmission path is selected by the stack.  This
+	setting is primarily used to enable fast failover mechanisms without
+	having to reduce path_max_retrans to a very low value.  See:
+	http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+	for details.  Note also that a value of pf_retrans > path_max_retrans
+	disables this feature
+
+	Default: 0
+
 rto_initial - INTEGER
 	The initial round trip timeout value in milliseconds that will be used
 	in calculating round trip times.  This is the initial time interval
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 942b864..d053d2e 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -334,6 +334,7 @@ typedef enum {
 typedef enum {
 	SCTP_TRANSPORT_UP,
 	SCTP_TRANSPORT_DOWN,
+	SCTP_TRANSPORT_PF,
 } sctp_transport_cmd_t;
 
 /* These are the address scopes defined mainly for IPv4 addresses
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..22825abe 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -160,6 +160,7 @@ extern struct sctp_globals {
 	int max_retrans_association;
 	int max_retrans_path;
 	int max_retrans_init;
+	int pf_retrans;
 
 	/*
 	 * Policy for preforming sctp/socket accounting
@@ -258,6 +259,7 @@ extern struct sctp_globals {
 #define sctp_sndbuf_policy	 	(sctp_globals.sndbuf_policy)
 #define sctp_rcvbuf_policy	 	(sctp_globals.rcvbuf_policy)
 #define sctp_max_retrans_path		(sctp_globals.max_retrans_path)
+#define sctp_pf_retrans			(sctp_globals.pf_retrans)
 #define sctp_max_retrans_init		(sctp_globals.max_retrans_init)
 #define sctp_sack_timeout		(sctp_globals.sack_timeout)
 #define sctp_hb_interval		(sctp_globals.hb_interval)
@@ -1660,6 +1662,8 @@ struct sctp_association {
 	 */
 	int max_retrans;
 
+	int pf_retrans;
+
 	/* Maximum number of times the endpoint will retransmit INIT  */
 	__u16 max_init_attempts;
 
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 0842ef0..cece1bf 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -649,6 +649,7 @@ struct sctp_paddrinfo {
  */
 enum sctp_spinfo_state {
 	SCTP_INACTIVE,
+	SCTP_PF,
 	SCTP_ACTIVE,
 	SCTP_UNCONFIRMED,
 	SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5bc9ab1..f3ebc23 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -124,6 +124,8 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	 * socket values.
 	 */
 	asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
+	asoc->pf_retrans  = sctp_pf_retrans;
+
 	asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
 	asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
 	asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
@@ -840,6 +842,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 	struct sctp_ulpevent *event;
 	struct sockaddr_storage addr;
 	int spc_state = 0;
+	bool ulp_notify = true;
 
 	/* Record the transition on the transport.  */
 	switch (command) {
@@ -853,6 +856,14 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 			spc_state = SCTP_ADDR_CONFIRMED;
 		else
 			spc_state = SCTP_ADDR_AVAILABLE;
+		/* Don't inform ULP about transition from PF to
+		 * active state and set cwnd to 1, see SCTP
+		 * Quick failover draft section 5.1, point 5
+		 */
+		if (transport->state == SCTP_PF) {
+			ulp_notify = false;
+			transport->cwnd = 1;
+		}
 		transport->state = SCTP_ACTIVE;
 		break;
 
@@ -871,6 +882,10 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		spc_state = SCTP_ADDR_UNREACHABLE;
 		break;
 
+	case SCTP_TRANSPORT_PF:
+		transport->state = SCTP_PF;
+		ulp_notify = false;
+		break;
 	default:
 		return;
 	}
@@ -878,12 +893,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 	/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
 	 * user.
 	 */
-	memset(&addr, 0, sizeof(struct sockaddr_storage));
-	memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
-	event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
-				0, spc_state, error, GFP_ATOMIC);
-	if (event)
-		sctp_ulpq_tail_event(&asoc->ulpq, event);
+	if (ulp_notify) {
+		memset(&addr, 0, sizeof(struct sockaddr_storage));
+		memcpy(&addr, &transport->ipaddr,
+		       transport->af_specific->sockaddr_len);
+		event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
+					0, spc_state, error, GFP_ATOMIC);
+		if (event)
+			sctp_ulpq_tail_event(&asoc->ulpq, event);
+	}
 
 	/* Select new active and retran paths. */
 
@@ -899,7 +917,8 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 			transports) {
 
 		if ((t->state == SCTP_INACTIVE) ||
-		    (t->state == SCTP_UNCONFIRMED))
+		    (t->state == SCTP_UNCONFIRMED) ||
+		    (t->state == SCTP_PF))
 			continue;
 		if (!first || t->last_time_heard > first->last_time_heard) {
 			second = first;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a0fa19f..e7aa177c 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			if (!new_transport)
 				new_transport = asoc->peer.active_path;
 		} else if ((new_transport->state == SCTP_INACTIVE) ||
-			   (new_transport->state == SCTP_UNCONFIRMED)) {
+			   (new_transport->state == SCTP_UNCONFIRMED) ||
+			   (new_transport->state == SCTP_PF)) {
 			/* If the chunk is Heartbeat or Heartbeat Ack,
 			 * send it to chunk->transport, even if it's
 			 * inactive.
@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			new_transport = chunk->transport;
 			if (!new_transport ||
 			    ((new_transport->state == SCTP_INACTIVE) ||
-			     (new_transport->state == SCTP_UNCONFIRMED)))
+			     (new_transport->state == SCTP_UNCONFIRMED) ||
+			     (new_transport->state == SCTP_PF)))
 				new_transport = asoc->peer.active_path;
 			if (new_transport->state == SCTP_UNCONFIRMED)
 				continue;
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..285e26a 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype,
 			     sctp_cmd_seq_t *commands,
 			     gfp_t gfp);
 
+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
+				     struct sctp_transport *t);
 /********************************************************************
  * Helper functions
  ********************************************************************/
@@ -470,7 +472,8 @@ sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
  * notification SHOULD be sent to the upper layer.
  *
  */
-static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
+					 struct sctp_association *asoc,
 					 struct sctp_transport *transport,
 					 int is_hb)
 {
@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
 			transport->error_count++;
 	}
 
+	/* If the transport error count is greater than the pf_retrans
+	 * threshold, and less than pathmaxrtx, then mark this transport
+	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
+	 * point 1
+	 */
+	if ((transport->state != SCTP_PF) &&
+	   (asoc->pf_retrans < transport->pathmaxrxt) &&
+	   (transport->error_count > asoc->pf_retrans)) {
+
+		sctp_assoc_control_transport(asoc, transport,
+					     SCTP_TRANSPORT_PF,
+					     0);
+
+		/* Update the hb timer to resend a heartbeat every rto */
+		sctp_cmd_hb_timer_update(commands, transport);
+	}
+
 	if (transport->state != SCTP_INACTIVE &&
 	    (transport->error_count > transport->pathmaxrxt)) {
 		SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 					     SCTP_HEARTBEAT_SUCCESS);
 	}
 
+	if (t->state == SCTP_PF)
+		sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
+					     SCTP_HEARTBEAT_SUCCESS);
+
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
 	 * using the time value carried in the HEARTBEAT ACK chunk.
@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 
 		case SCTP_CMD_STRIKE:
 			/* Mark one strike against a transport.  */
-			sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
-						    0);
+			sctp_do_8_2_transport_strike(commands, asoc,
+						    cmd->obj.transport, 0);
 			break;
 
 		case SCTP_CMD_TRANSPORT_IDLE:
@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 
 		case SCTP_CMD_TRANSPORT_HB_SENT:
 			t = cmd->obj.transport;
-			sctp_do_8_2_transport_strike(asoc, t, 1);
+			sctp_do_8_2_transport_strike(commands, asoc,
+						     t, 1);
 			t->hb_sent = 1;
 			break;
 
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index e5fe639..2b2bfe9 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
 		.extra2		= &int_max
 	},
 	{
+		.procname	= "pf_retrans",
+		.data		= &sctp_pf_retrans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &int_max
+	},
+	{
 		.procname	= "max_init_retransmits",
 		.data		= &sctp_max_retrans_init,
 		.maxlen		= sizeof(int),
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index b026ba0..4639ba2 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -585,7 +585,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
 {
 	unsigned long timeout;
 	timeout = t->rto + sctp_jitter(t->rto);
-	if (t->state != SCTP_UNCONFIRMED)
+	if ((t->state != SCTP_UNCONFIRMED) &&
+	    (t->state != SCTP_PF))
 		timeout += t->hbinterval;
 	timeout += jiffies;
 	return timeout;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH] irda: Fix typo in irda
From: Masanari Iida @ 2012-07-13 17:22 UTC (permalink / raw)
  To: netdev, samuel; +Cc: linux-kernel, trivial, Masanari Iida

Correct spelling typo in irda.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
---
 net/irda/af_irda.c              | 2 +-
 net/irda/irlan/irlan_provider.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index bb14c34..bb738c9f 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -955,7 +955,7 @@ out:
  * The main difference with a "standard" connect is that with IrDA we need
  * to resolve the service name into a TSAP selector (in TCP, port number
  * doesn't have to be resolved).
- * Because of this service name resoltion, we can offer "auto-connect",
+ * Because of this service name resolution, we can offer "auto-connect",
  * where we connect to a service without specifying a destination address.
  *
  * Note : by consulting "errno", the user space caller may learn the cause
diff --git a/net/irda/irlan/irlan_provider.c b/net/irda/irlan/irlan_provider.c
index 32dcaac..4664855 100644
--- a/net/irda/irlan/irlan_provider.c
+++ b/net/irda/irlan/irlan_provider.c
@@ -296,7 +296,7 @@ void irlan_provider_send_reply(struct irlan_cb *self, int command,
 	skb = alloc_skb(IRLAN_MAX_HEADER + IRLAN_CMD_HEADER +
 			/* Bigger param length comes from CMD_GET_MEDIA_CHAR */
 			IRLAN_STRING_PARAMETER_LEN("FILTER_TYPE", "DIRECTED") +
-			IRLAN_STRING_PARAMETER_LEN("FILTER_TYPE", "BORADCAST") +
+			IRLAN_STRING_PARAMETER_LEN("FILTER_TYPE", "BROADCAST") +
 			IRLAN_STRING_PARAMETER_LEN("FILTER_TYPE", "MULTICAST") +
 			IRLAN_STRING_PARAMETER_LEN("ACCESS_TYPE", "HOSTED"),
 			GFP_ATOMIC);
-- 
1.7.11.2.138.g2b53359

^ permalink raw reply related

* [PATCH] sctp: fix sparse warning for sctp_init_cause_fixed
From: Ioan Orghici @ 2012-07-13 17:16 UTC (permalink / raw)
  To: vyasevich, sri, davem, netdev; +Cc: Ioan Orghici

Fix the following sparse warning:
	* symbol 'sctp_init_cause_fixed' was not declared. Should it be
	  static?

Signed-off-by: Ioan Orghici <ioanorghici@gmail.com>
---
 net/sctp/sm_make_chunk.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index b6de71e..479a70e 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -132,7 +132,7 @@ void  sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
  * abort chunk.  Differs from sctp_init_cause in that it won't oops
  * if there isn't enough space in the op error chunk
  */
-int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
+static int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
 		      size_t paylen)
 {
 	sctp_errhdr_t err;
-- 
1.7.6.3

^ permalink raw reply related

* Re: [PATCH] ixgbevf - Prevent RX/TX statistics getting reset to zero
From: Greg Rose @ 2012-07-13 17:14 UTC (permalink / raw)
  To: Narendra_K; +Cc: jeffrey.t.kirsher, netdev
In-Reply-To: <E31FB011129F30488D5861F38390491520D0E082C7@BLRX7MCDC201.AMER.DELL.COM>

On Fri, 13 Jul 2012 05:36:37 -0700
<Narendra_K@Dell.com> wrote:

> > -----Original Message-----
> > From: Jeff Kirsher [mailto:tarbal@gmail.com]
> > Sent: Thursday, July 12, 2012 11:56 PM
> > To: K, Narendra; gregory.v.rose@intel.com
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH] ixgbevf - Prevent RX/TX statistics getting
> > reset to zero
> > 
> > On 07/12/2012 06:55 AM, Narendra_K@Dell.com wrote:
> > > Hello,
> > >
> > > [Apologies if you are receiving this message twice. I am
> > > resending the
> > message, as I got message delivery failure note].
> > >
> > > While exploring SR-IOV on Intel 82599EB 10-Gigabit SFP+ adapter,
> > > I had the
> > following observation.  I enabled two VFs by passing 'max_vfs=2' to
> > ixgbe driver. One of the VFs was assigned to a guest.
> > > In the guest, the ifconfig and ip tools reported 'RX packets' and
> > > 'TX packets'
> > as zero, after pinging to a remote host. Looking into it further,
> > the commit 4197aa7bb81877ebb06e4f2cc1b5fea2da23a7bd implements 64
> > bit per ring statistics. It seemed like the 'total_bytes' and
> > 'total_packets' of RX and TX ring were being reset to zero by the
> > RX and TX interrupt handlers, resulting in the user space tools
> > reporting zero RX and TX bytes.
> > >
> > > The attached patch addresses the issue by preventing the
> > > resetting of RX
> > and TX ring statistics to zero. The patch was taken against latest
> > mainline 3.5- rc6 kernel.
> > >
> > > I tested the patch by pinging  from the guest OS to a remote host.
> > >
> > > ping -f <remote host> -c 10000
> > >
> > > The ip and ifcofig showed the statistics increased by 10000
> > > packets.
> > >
> > > # lspci | grep 82599
> > > 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
> > > SFP+
> > Network Connection (rev 01)
> > > 04:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
> > > SFP+
> > Network Connection (rev 01)
> > > 04:10.0 Ethernet controller: Intel Corporation 82599 Ethernet
> > > Controller
> > Virtual Function (rev 01)
> > > 04:10.1 Ethernet controller: Intel Corporation 82599 Ethernet
> > > Controller
> > Virtual Function (rev 01)
> > > 04:10.2 Ethernet controller: Intel Corporation 82599 Ethernet
> > > Controller
> > Virtual Function (rev 01)
> > > 04:10.3 Ethernet controller: Intel Corporation 82599 Ethernet
> > > Controller
> > Virtual Function (rev 01)
> > >
> > > # lspci -s 04:00.0 -n
> > > 04:00.0 0200: 8086:154d (rev 01)
> > > # lspci -s 04:10.0 -n
> > > 04:10.0 0200: 8086:10ed (rev 01)
> > >
> > > Please let me know if additional details and logs are
> > > required.[>]  
> > >
> > > With regards,
> > > Narendra K
> > >
> > >
> > >
> > 
> > Thanks, I will add the patch to my queue
> > 
> [>] 
>  
> Hi Greg,
> 
> I was re-looking at why ' rx_ring->total_packets' and '
> rx_ring->total_bytes' were being set to zero in '
> ixgbevf_msix_clean_rx'.  It looks like ' rx_ring->total_packets' and
> ' rx_ring->total_packets'  are computed per one run of
> 'ixgbevf_clean_rx_irq' .  Then in 'ixgbevf_clean_rxonly' if
> 'adapter->itr_setting & 1' is true, the count is  used in
> 'ixgbevf_set_itr_msix'. When the interrupts are enabled, the '
> rx_ring->total_packets'  and ' rx_ring->total_bytes' are set to zero
> so that they can be re-computed in the poll function and  fed to the
> 'ixgbevf_set_itr_msix'.
> 
> This results in statistics reported by 'ip' and 'ifconfig' as zero.
> The patch addresses the scenario.  But it seems it would change the
> intended behavior in the scenario  when 'adapter->itr_setting & 1' is
> true.  It could be addressed by storing the 'total_rx_packets' and
> 'total_rx_bytes'  computed every time in the poll function in 'struct
> ixgbevf_adapter' . Then the interrupt handler could reset them to
> zero instead of resetting  ' rx_ring->total_packets' and  '
> rx_ring->total_bytes'.
> 
> Also, I observed that  'adapter->itr_setting & 1'  was not true by
> default.  I tried setting it by 'ethtool  -C eth0 adaptive-rx on',
> and it returned 'operation not supported'. 
> 
> I could be missing something here, please let me know.

Nope, you're correct in your analysis.  The ixgbevf driver hasn't
supported adaptive interrupt moderation in the past.  However, a set of
patches we have in the pipeline will turn it on by default.  Also,
as a result of those patches the bug you've reported will be fixed.
We'll go ahead and accept your patch for the net tree and then fix up
any conflicts between that and our new set of patches when they get
pushed to net-next.

Thanks for your work on this.

- Greg

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox