Netdev List
 help / color / mirror / Atom feed
* Re: [net-next 08/15] i40e: Allow user to change input set mask for flow director
From: David Miller @ 2016-04-27  3:48 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: kiran.patil, netdev, nhorman, sassmann, jogreene
In-Reply-To: <1461704148-114349-9-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 26 Apr 2016 13:55:41 -0700

> From: Kiran Patil <kiran.patil@intel.com>
> 
> This patch implements feature, which allows user to change
> input set mask for flow director using side-band channel.
> This patch adds definition of FLOW_TYPE_MASK into the header file.
> With this patch, user can now specify less than 4 tuple(src ip, dsp ip,
> src port, dst port) for flow type TCP4/UDP4.
> 
> Change-Id: I90052508f8c172c0da5a4b78b949704b4a59ea78
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

If you want to do this, you have to define the semantics generically
in include/uapi/linux/ethtool.h so that other drivers can implement
this too.

Please don't ever implement private, driver specific, interpretations
of the generic ethtool facilitites.

The semantics and interpretations of the values must absolutely be
consistent across all drivers in the tree.

Otherwise the user experience is terrible.

Thanks.

^ permalink raw reply

* Re: [PATCH -next] net: w5100: support W5500
From: David Miller @ 2016-04-27  4:31 UTC (permalink / raw)
  To: akinobu.mita; +Cc: netdev, msink
In-Reply-To: <1461703428-8078-1-git-send-email-akinobu.mita@gmail.com>

From: Akinobu Mita <akinobu.mita@gmail.com>
Date: Wed, 27 Apr 2016 05:43:48 +0900

> This adds support for W5500 chip.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net-rfs: fix false sharing accessing sd->input_queue_head
From: David Miller @ 2016-04-27  4:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1461709807.5535.55.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Apr 2016 15:30:07 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> sd->input_queue_head is incremented for each processed packet
> in process_backlog(), and read from other cpus performing
> Out Of Order avoidance in get_rps_cpu()
> 
> Moving this field in a separate cache line keeps it mostly
> hot for the cpu in process_backlog(), as other cpus will
> only read it.
> 
> In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
> and the patch reduced the cost to 0.65 %
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, nice work Eric.

^ permalink raw reply

* Re: [PATCH 1/6] bus: Add shared MDIO bus framework
From: Anup Patel @ 2016-04-27  4:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Arnd Bergmann, Florian Fainelli, Pramod Kumar, Rob Herring,
	Catalin Marinas, Will Deacon, Masahiro Yamada, Chen-Yu Tsai,
	Mark Rutland, Device Tree, Pawel Moll, Suzuki K Poulose, netdev,
	Punit Agrawal, Linux Kernel, BCM Kernel Feedback,
	Linux ARM Kernel, Kishon Vijay Abraham I
In-Reply-To: <20160426194102.GF30107@lunn.ch>

On Wed, Apr 27, 2016 at 1:11 AM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Tue, Apr 26, 2016 at 09:24:34PM +0200, Arnd Bergmann wrote:
>> On Tuesday 26 April 2016 20:23:35 Andrew Lunn wrote:
>> > > A more complex problem would be having a PHY driver for a device
>> > > that can be either an ethernet phy or some other phy.
>> >
>> > I doubt that ever happens. You can have up to 32 different devices on
>> > an MDIO bus. Since an Ethernet PHY and a "some other sort of PHY" are
>> > completely different things, why would a hardware engineer place them
>> > on the same address? It is like saying your ATA controller and VGA
>> > controller share the same slot on the PCI bus...
>>
>> To clarify: what I meant is a device that is designed as a PHY for
>> similar hardware (e.g. SATA, USB3 and PCIe) and that has a common
>> register set and a single driver, but that driver can operate
>> in multiple modes. You typically have multiple instances of
>> such hardware, with each instance linked to exactly one host
>> device, but one driver for all of them.
>>
>> See Documentation/devicetree/bindings/phy/apm-xgene-phy.txt
>> and drivers/phy/phy-xgene.c for one such example.
>
> Interesting. Also, that this lists SGMII. I assume this is a phy in
> the MAC in order to talk to the Ethernet PHY.
>
> I still don't see it being a big problem if a phy driver implements an
> Ethernet PHY. It just needs to call phy_device_create() and
> phy_device_register().
>
>         Andrew

It is really interesting to see the evolution of MDIO bus:

1. Traditionally, MDIO controller used to be part of each ethernet controller
itself so each ethernet controller used to have it's own 2 wire MDIO bus

2. Next, we saw SoC with multiple ethernet controllers sharing same MDIO
bus. In other words, we saw multiple MDIO bus being muxed over single
MDIO bus with additional bus select lines (I think this is when
drivers/net/phy/mdio-mux.c APIs were implemented but at this point all
PHYs on muxed MDIO bus were ethernet PHYs).

3. Then, we saw SoC with ethernet switch devices also accessible over
shared MDIO bus (or Muxed MDIO bus) along with ethernet PHYs (I guess
this is why we have drivers/net/phy/mdio_device.c which adds
"mdio_device" for non-ethernet-PHY devices on MDIO bus).

4. Now, we have SoC with SATA PHYs, PCIe PHYs, USB PHYs, and Ethernet
PHYs all accessible over same shared MDIO bus (or Muxed MDIO bus). The
SATA PHYs and PCIe PHYs are registered to "Generic PHY framework". For
USB PHYs, we can either register to "Generic PHY framework" or "USB PHY
framework". For Ethernet PHYs, we register MDIO bus instance to "Ethernet
MDIO framework".

The devices on ARM64 SoC has to be within first 4GB and RAM has to start
from first 4GB to be ARM compliant because ARM64 CPUs have 32bit mode and
all devices and RAM should be available to 32bit mode with MMU disabled.
This means that we only have 4GB to fit all devices registers and some
portion of RAM. Some of these non-Ethernet PHYs have tons of registers so
as number of PHYs increase in a SoC they will eat-up lot of first 4GB
address space. Using same MDIO bus for all types of PHYs (SATA, PCIe, USB,
and Ethernet) is actually a good approach because it actually saves lot of
first 4GB address space. In future, more devices will be moved to a shared
MDIO bus which are less frequently accessed.

For Broadcom iProc SoCs, the design choice has already been made to use
shared MDIO bus for all PHYs. In fact, Broadcom iProc NS2 SoC already has
a shared MDIO bus for SATA PHYs, USB PHYs, PCIe PHYs, and Ethernet
PHYs and more Broadcom iProc SoCs are on their way. Of course, there are
few exceptions in iProc SoCs such as SATA PHYs where we also have memory
mapped registers to access PHYs but other PHYs don't have such memory
mapped registers.

Clearly from above, the traditional 2 wire MDIO bus is now a shared MDIO
bus with 2-wire plus additional select lines. Also, now we have SoCs (such
as Broadcom iProc SoCs) which has such shared MDIO bus and I think
in-future we will have SoCs with a shared MDIO bus for variety of devices.

For long term, we really need a clean solution to fit shared MDIO based
PHY drivers in Linux kernel. Also, shared MDIO based PHY drivers should
not be dependent on any particular IO subsystem (such as Linux Ethernet)
because there are lot of use-cases where people want strip down kernel
image by not-compiling IO subsystems which are not required.

IMHO, we have several options in front of us:
1. Use some light-weight framework (such as shared_mdio.c implemented
by this patchset) under drivers/bus
2. Extend "Generic PHY framework" to allow something like shared MDIO
bus (as-per Arnd's suggestion)
3. Move-out "MDIO-mux APIs" from drivers/net/phy to something like
drivers/mdio-mux and make it independent of "Linux Ethernet subsystem".
(... may be more options ...)

Regards,
Anup

^ permalink raw reply

* [PATCH v2] net: Add Qualcomm IPC router
From: Bjorn Andersson @ 2016-04-27  5:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: linux-kernel, netdev, linux-arm-msm, Courtney Cavin,
	Bjorn Andersson

From: Courtney Cavin <courtney.cavin@sonymobile.com>

Add an implementation of Qualcomm's IPC router protocol, used to
communicate with service providing remote processors.

Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
[bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR]
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

Changes since v1:
- Made node 0 (normally the Qualcomm modem) a valid node
- Implemented RTM_NEWADDR for specifying the local node id

 include/linux/socket.h    |    4 +-
 include/uapi/linux/qrtr.h |   12 +
 net/Kconfig               |    1 +
 net/Makefile              |    1 +
 net/qrtr/Kconfig          |   24 ++
 net/qrtr/Makefile         |    2 +
 net/qrtr/qrtr.c           | 1007 +++++++++++++++++++++++++++++++++++++++++++++
 net/qrtr/qrtr.h           |   31 ++
 net/qrtr/smd.c            |  117 ++++++
 9 files changed, 1198 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/qrtr.h
 create mode 100644 net/qrtr/Kconfig
 create mode 100644 net/qrtr/Makefile
 create mode 100644 net/qrtr/qrtr.c
 create mode 100644 net/qrtr/qrtr.h
 create mode 100644 net/qrtr/smd.c

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6a833b..b5cc5a6d7011 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,9 @@ struct ucred {
 #define AF_NFC		39	/* NFC sockets			*/
 #define AF_VSOCK	40	/* vSockets			*/
 #define AF_KCM		41	/* Kernel Connection Multiplexor*/
+#define AF_QIPCRTR	42	/* Qualcomm IPC Router          */
 
-#define AF_MAX		42	/* For now.. */
+#define AF_MAX		43	/* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC	AF_UNSPEC
@@ -249,6 +250,7 @@ struct ucred {
 #define PF_NFC		AF_NFC
 #define PF_VSOCK	AF_VSOCK
 #define PF_KCM		AF_KCM
+#define PF_QIPCRTR	AF_QIPCRTR
 #define PF_MAX		AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/uapi/linux/qrtr.h b/include/uapi/linux/qrtr.h
new file mode 100644
index 000000000000..66c0748d26e2
--- /dev/null
+++ b/include/uapi/linux/qrtr.h
@@ -0,0 +1,12 @@
+#ifndef _LINUX_QRTR_H
+#define _LINUX_QRTR_H
+
+#include <linux/socket.h>
+
+struct sockaddr_qrtr {
+	__kernel_sa_family_t sq_family;
+	__u32 sq_node;
+	__u32 sq_port;
+};
+
+#endif /* _LINUX_QRTR_H */
diff --git a/net/Kconfig b/net/Kconfig
index a8934d8c8fda..b841c42e5c9b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -236,6 +236,7 @@ source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
 source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
+source "net/qrtr/Kconfig"
 
 config RPS
 	bool
diff --git a/net/Makefile b/net/Makefile
index 81d14119eab5..bdd14553a774 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -78,3 +78,4 @@ endif
 ifneq ($(CONFIG_NET_L3_MASTER_DEV),)
 obj-y				+= l3mdev/
 endif
+obj-$(CONFIG_QRTR)		+= qrtr/
diff --git a/net/qrtr/Kconfig b/net/qrtr/Kconfig
new file mode 100644
index 000000000000..f053cc25f621
--- /dev/null
+++ b/net/qrtr/Kconfig
@@ -0,0 +1,24 @@
+# Qualcomm IPC Router configuration
+#
+
+config QRTR
+	bool "Qualcomm IPC Router support"
+	depends on ARCH_QCOM || COMPILE_TEST
+	---help---
+	  Say Y if you intend to use Qualcomm IPC router protocol.  The
+	  protocol is used to communicate with services provided by other
+	  hardware blocks in the system.
+
+	  In order to do service lookups, a userspace daemon is required to
+	  maintain a service listing.
+
+if QRTR
+
+config QRTR_SMD
+	tristate "SMD IPC Router channels"
+	depends on QRTR && QCOM_SMD && OF
+	---help---
+	  Say Y here to support SMD based ipcrouter channels.  SMD is the
+	  most common transport for IPC Router.
+
+endif # QRTR
diff --git a/net/qrtr/Makefile b/net/qrtr/Makefile
new file mode 100644
index 000000000000..e282a84ffc5c
--- /dev/null
+++ b/net/qrtr/Makefile
@@ -0,0 +1,2 @@
+obj-y := qrtr.o
+obj-$(CONFIG_QRTR_SMD) += smd.o
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
new file mode 100644
index 000000000000..c985ecbe9bd6
--- /dev/null
+++ b/net/qrtr/qrtr.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications Inc.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/qrtr.h>
+#include <linux/termios.h>	/* For TIOCINQ/OUTQ */
+
+#include <net/sock.h>
+
+#include "qrtr.h"
+
+#define QRTR_PROTO_VER 1
+
+/* auto-bind range */
+#define QRTR_MIN_EPH_SOCKET 0x4000
+#define QRTR_MAX_EPH_SOCKET 0x7fff
+
+enum qrtr_pkt_type {
+	QRTR_TYPE_DATA		= 1,
+	QRTR_TYPE_HELLO		= 2,
+	QRTR_TYPE_BYE		= 3,
+	QRTR_TYPE_NEW_SERVER	= 4,
+	QRTR_TYPE_DEL_SERVER	= 5,
+	QRTR_TYPE_DEL_CLIENT	= 6,
+	QRTR_TYPE_RESUME_TX	= 7,
+	QRTR_TYPE_EXIT		= 8,
+	QRTR_TYPE_PING		= 9,
+};
+
+/**
+ * struct qrtr_hdr - (I|R)PCrouter packet header
+ * @version: protocol version
+ * @type: packet type; one of QRTR_TYPE_*
+ * @src_node_id: source node
+ * @src_port_id: source port
+ * @confirm_rx: boolean; whether a resume-tx packet should be send in reply
+ * @size: length of packet, excluding this header
+ * @dst_node_id: destination node
+ * @dst_port_id: destination port
+ */
+struct qrtr_hdr {
+	__le32 version;
+	__le32 type;
+	__le32 src_node_id;
+	__le32 src_port_id;
+	__le32 confirm_rx;
+	__le32 size;
+	__le32 dst_node_id;
+	__le32 dst_port_id;
+} __packed;
+
+#define QRTR_HDR_SIZE sizeof(struct qrtr_hdr)
+#define QRTR_NODE_BCAST ((unsigned int)-1)
+#define QRTR_PORT_CTRL ((unsigned int)-2)
+
+struct qrtr_sock {
+	/* WARNING: sk must be the first member */
+	struct sock sk;
+	struct sockaddr_qrtr us;
+	struct sockaddr_qrtr peer;
+};
+
+static inline struct qrtr_sock *qrtr_sk(struct sock *sk)
+{
+	BUILD_BUG_ON(offsetof(struct qrtr_sock, sk) != 0);
+	return container_of(sk, struct qrtr_sock, sk);
+}
+
+static unsigned int qrtr_local_nid = -1;
+
+/* for node ids */
+static RADIX_TREE(qrtr_nodes, GFP_KERNEL);
+/* broadcast list */
+static LIST_HEAD(qrtr_all_nodes);
+/* lock for qrtr_nodes, qrtr_all_nodes and node reference */
+static DEFINE_MUTEX(qrtr_node_lock);
+
+/* local port allocation management */
+static DEFINE_IDR(qrtr_ports);
+static DEFINE_MUTEX(qrtr_port_lock);
+
+/**
+ * struct qrtr_node - endpoint node
+ * @ep_lock: lock for endpoint management and callbacks
+ * @ep: endpoint
+ * @ref: reference count for node
+ * @nid: node id
+ * @rx_queue: receive queue
+ * @work: scheduled work struct for recv work
+ * @item: list item for broadcast list
+ */
+struct qrtr_node {
+	struct mutex ep_lock;
+	struct qrtr_endpoint *ep;
+	struct kref ref;
+	unsigned int nid;
+
+	struct sk_buff_head rx_queue;
+	struct work_struct work;
+	struct list_head item;
+};
+
+/* Release node resources and free the node.
+ *
+ * Do not call directly, use qrtr_node_release.  To be used with
+ * kref_put_mutex.  As such, the node mutex is expected to be locked on call.
+ */
+static void __qrtr_node_release(struct kref *kref)
+{
+	struct qrtr_node *node = container_of(kref, struct qrtr_node, ref);
+
+	if (node->nid != QRTR_EP_NID_AUTO)
+		radix_tree_delete(&qrtr_nodes, node->nid);
+
+	list_del(&node->item);
+	mutex_unlock(&qrtr_node_lock);
+
+	skb_queue_purge(&node->rx_queue);
+	kfree(node);
+}
+
+/* Increment reference to node. */
+static struct qrtr_node *qrtr_node_acquire(struct qrtr_node *node)
+{
+	if (node)
+		kref_get(&node->ref);
+	return node;
+}
+
+/* Decrement reference to node and release as necessary. */
+static void qrtr_node_release(struct qrtr_node *node)
+{
+	if (!node)
+		return;
+	kref_put_mutex(&node->ref, __qrtr_node_release, &qrtr_node_lock);
+}
+
+/* Pass an outgoing packet socket buffer to the endpoint driver. */
+static int qrtr_node_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	int rc = -ENODEV;
+
+	mutex_lock(&node->ep_lock);
+	if (node->ep)
+		rc = node->ep->xmit(node->ep, skb);
+	else
+		kfree_skb(skb);
+	mutex_unlock(&node->ep_lock);
+
+	return rc;
+}
+
+/* Lookup node by id.
+ *
+ * callers must release with qrtr_node_release()
+ */
+static struct qrtr_node *qrtr_node_lookup(unsigned int nid)
+{
+	struct qrtr_node *node;
+
+	mutex_lock(&qrtr_node_lock);
+	node = radix_tree_lookup(&qrtr_nodes, nid);
+	node = qrtr_node_acquire(node);
+	mutex_unlock(&qrtr_node_lock);
+
+	return node;
+}
+
+/* Assign node id to node.
+ *
+ * This is mostly useful for automatic node id assignment, based on
+ * the source id in the incoming packet.
+ */
+static void qrtr_node_assign(struct qrtr_node *node, unsigned int nid)
+{
+	if (node->nid != QRTR_EP_NID_AUTO || nid == QRTR_EP_NID_AUTO)
+		return;
+
+	mutex_lock(&qrtr_node_lock);
+	radix_tree_insert(&qrtr_nodes, nid, node);
+	node->nid = nid;
+	mutex_unlock(&qrtr_node_lock);
+}
+
+/**
+ * qrtr_endpoint_post() - post incoming data
+ * @ep: endpoint handle
+ * @data: data pointer
+ * @len: size of data in bytes
+ *
+ * Return: 0 on success; negative error code on failure
+ */
+int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len)
+{
+	struct qrtr_node *node = ep->node;
+	const struct qrtr_hdr *phdr = data;
+	struct sk_buff *skb;
+	unsigned int psize;
+	unsigned int size;
+	unsigned int type;
+	unsigned int ver;
+	unsigned int dst;
+
+	if (len < QRTR_HDR_SIZE || len & 3)
+		return -EINVAL;
+
+	ver = le32_to_cpu(phdr->version);
+	size = le32_to_cpu(phdr->size);
+	type = le32_to_cpu(phdr->type);
+	dst = le32_to_cpu(phdr->dst_port_id);
+
+	psize = (size + 3) & ~3;
+
+	if (ver != QRTR_PROTO_VER)
+		return -EINVAL;
+
+	if (len != psize + QRTR_HDR_SIZE)
+		return -EINVAL;
+
+	if (dst != QRTR_PORT_CTRL && type != QRTR_TYPE_DATA)
+		return -EINVAL;
+
+	skb = netdev_alloc_skb(NULL, len);
+	if (!skb)
+		return -ENOMEM;
+
+	skb_reset_transport_header(skb);
+	memcpy(skb_put(skb, len), data, len);
+
+	skb_queue_tail(&node->rx_queue, skb);
+	schedule_work(&node->work);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_post);
+
+/* Allocate and construct a resume-tx packet. */
+static struct sk_buff *qrtr_alloc_resume_tx(u32 src_node,
+					    u32 dst_node, u32 port)
+{
+	const int pkt_len = 20;
+	struct qrtr_hdr *hdr;
+	struct sk_buff *skb;
+	u32 *buf;
+
+	skb = alloc_skb(QRTR_HDR_SIZE + pkt_len, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+	skb_reset_transport_header(skb);
+
+	hdr = (struct qrtr_hdr *)skb_put(skb, QRTR_HDR_SIZE);
+	hdr->version = cpu_to_le32(QRTR_PROTO_VER);
+	hdr->type = cpu_to_le32(QRTR_TYPE_RESUME_TX);
+	hdr->src_node_id = cpu_to_le32(src_node);
+	hdr->src_port_id = cpu_to_le32(QRTR_PORT_CTRL);
+	hdr->confirm_rx = cpu_to_le32(0);
+	hdr->size = cpu_to_le32(pkt_len);
+	hdr->dst_node_id = cpu_to_le32(dst_node);
+	hdr->dst_port_id = cpu_to_le32(QRTR_PORT_CTRL);
+
+	buf = (u32 *)skb_put(skb, pkt_len);
+	memset(buf, 0, pkt_len);
+	buf[0] = cpu_to_le32(QRTR_TYPE_RESUME_TX);
+	buf[1] = cpu_to_le32(src_node);
+	buf[2] = cpu_to_le32(port);
+
+	return skb;
+}
+
+static struct qrtr_sock *qrtr_port_lookup(int port);
+static void qrtr_port_put(struct qrtr_sock *ipc);
+
+/* Handle and route a received packet.
+ *
+ * This will auto-reply with resume-tx packet as necessary.
+ */
+static void qrtr_node_rx_work(struct work_struct *work)
+{
+	struct qrtr_node *node = container_of(work, struct qrtr_node, work);
+	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&node->rx_queue)) != NULL) {
+		const struct qrtr_hdr *phdr;
+		u32 dst_node, dst_port;
+		struct qrtr_sock *ipc;
+		u32 src_node;
+		int confirm;
+
+		phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+		src_node = le32_to_cpu(phdr->src_node_id);
+		dst_node = le32_to_cpu(phdr->dst_node_id);
+		dst_port = le32_to_cpu(phdr->dst_port_id);
+		confirm = !!phdr->confirm_rx;
+
+		qrtr_node_assign(node, src_node);
+
+		ipc = qrtr_port_lookup(dst_port);
+		if (!ipc) {
+			kfree_skb(skb);
+		} else {
+			if (sock_queue_rcv_skb(&ipc->sk, skb))
+				kfree_skb(skb);
+
+			qrtr_port_put(ipc);
+		}
+
+		if (confirm) {
+			skb = qrtr_alloc_resume_tx(dst_node, node->nid, dst_port);
+			if (!skb)
+				break;
+			if (qrtr_node_enqueue(node, skb))
+				break;
+		}
+	}
+}
+
+/**
+ * qrtr_endpoint_register() - register a new endpoint
+ * @ep: endpoint to register
+ * @nid: desired node id; may be QRTR_EP_NID_AUTO for auto-assignment
+ * Return: 0 on success; negative error code on failure
+ *
+ * The specified endpoint must have the xmit function pointer set on call.
+ */
+int qrtr_endpoint_register(struct qrtr_endpoint *ep, unsigned int nid)
+{
+	struct qrtr_node *node;
+
+	if (!ep || !ep->xmit)
+		return -EINVAL;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	INIT_WORK(&node->work, qrtr_node_rx_work);
+	kref_init(&node->ref);
+	mutex_init(&node->ep_lock);
+	skb_queue_head_init(&node->rx_queue);
+	node->nid = QRTR_EP_NID_AUTO;
+	node->ep = ep;
+
+	qrtr_node_assign(node, nid);
+
+	mutex_lock(&qrtr_node_lock);
+	list_add(&node->item, &qrtr_all_nodes);
+	mutex_unlock(&qrtr_node_lock);
+	ep->node = node;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_register);
+
+/**
+ * qrtr_endpoint_unregister - unregister endpoint
+ * @ep: endpoint to unregister
+ */
+void qrtr_endpoint_unregister(struct qrtr_endpoint *ep)
+{
+	struct qrtr_node *node = ep->node;
+
+	mutex_lock(&node->ep_lock);
+	node->ep = NULL;
+	mutex_unlock(&node->ep_lock);
+
+	qrtr_node_release(node);
+	ep->node = NULL;
+}
+EXPORT_SYMBOL_GPL(qrtr_endpoint_unregister);
+
+/* Lookup socket by port.
+ *
+ * Callers must release with qrtr_port_put()
+ */
+static struct qrtr_sock *qrtr_port_lookup(int port)
+{
+	struct qrtr_sock *ipc;
+
+	if (port == QRTR_PORT_CTRL)
+		port = 0;
+
+	mutex_lock(&qrtr_port_lock);
+	ipc = idr_find(&qrtr_ports, port);
+	if (ipc)
+		sock_hold(&ipc->sk);
+	mutex_unlock(&qrtr_port_lock);
+
+	return ipc;
+}
+
+/* Release acquired socket. */
+static void qrtr_port_put(struct qrtr_sock *ipc)
+{
+	sock_put(&ipc->sk);
+}
+
+/* Remove port assignment. */
+static void qrtr_port_remove(struct qrtr_sock *ipc)
+{
+	int port = ipc->us.sq_port;
+
+	if (port == QRTR_PORT_CTRL)
+		port = 0;
+
+	__sock_put(&ipc->sk);
+
+	mutex_lock(&qrtr_port_lock);
+	idr_remove(&qrtr_ports, port);
+	mutex_unlock(&qrtr_port_lock);
+}
+
+/* Assign port number to socket.
+ *
+ * Specify port in the integer pointed to by port, and it will be adjusted
+ * on return as necesssary.
+ *
+ * Port may be:
+ *   0: Assign ephemeral port in [QRTR_MIN_EPH_SOCKET, QRTR_MAX_EPH_SOCKET]
+ *   <QRTR_MIN_EPH_SOCKET: Specified; requires CAP_NET_ADMIN
+ *   >QRTR_MIN_EPH_SOCKET: Specified; available to all
+ */
+static int qrtr_port_assign(struct qrtr_sock *ipc, int *port)
+{
+	int rc;
+
+	mutex_lock(&qrtr_port_lock);
+	if (!*port) {
+		rc = idr_alloc(&qrtr_ports, ipc,
+			       QRTR_MIN_EPH_SOCKET, QRTR_MAX_EPH_SOCKET + 1,
+			       GFP_ATOMIC);
+		if (rc >= 0)
+			*port = rc;
+	} else if (*port < QRTR_MIN_EPH_SOCKET && !capable(CAP_NET_ADMIN)) {
+		rc = -EACCES;
+	} else if (*port == QRTR_PORT_CTRL) {
+		rc = idr_alloc(&qrtr_ports, ipc, 0, 1, GFP_ATOMIC);
+	} else {
+		rc = idr_alloc(&qrtr_ports, ipc, *port, *port + 1, GFP_ATOMIC);
+		if (rc >= 0)
+			*port = rc;
+	}
+	mutex_unlock(&qrtr_port_lock);
+
+	if (rc == -ENOSPC)
+		return -EADDRINUSE;
+	else if (rc < 0)
+		return rc;
+
+	sock_hold(&ipc->sk);
+
+	return 0;
+}
+
+/* Bind socket to address.
+ *
+ * Socket should be locked upon call.
+ */
+static int __qrtr_bind(struct socket *sock,
+		       const struct sockaddr_qrtr *addr, int zapped)
+{
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int port;
+	int rc;
+
+	/* rebinding ok */
+	if (!zapped && addr->sq_port == ipc->us.sq_port)
+		return 0;
+
+	port = addr->sq_port;
+	rc = qrtr_port_assign(ipc, &port);
+	if (rc)
+		return rc;
+
+	/* unbind previous, if any */
+	if (!zapped)
+		qrtr_port_remove(ipc);
+	ipc->us.sq_port = port;
+
+	sock_reset_flag(sk, SOCK_ZAPPED);
+
+	return 0;
+}
+
+/* Auto bind to an ephemeral port. */
+static int qrtr_autobind(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct sockaddr_qrtr addr;
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		return 0;
+
+	addr.sq_family = AF_QIPCRTR;
+	addr.sq_node = qrtr_local_nid;
+	addr.sq_port = 0;
+
+	return __qrtr_bind(sock, &addr, 1);
+}
+
+/* Bind socket to specified sockaddr. */
+static int qrtr_bind(struct socket *sock, struct sockaddr *saddr, int len)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, saddr);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int rc;
+
+	if (len < sizeof(*addr) || addr->sq_family != AF_QIPCRTR)
+		return -EINVAL;
+
+	if (addr->sq_node != ipc->us.sq_node)
+		return -EINVAL;
+
+	lock_sock(sk);
+	rc = __qrtr_bind(sock, addr, sock_flag(sk, SOCK_ZAPPED));
+	release_sock(sk);
+
+	return rc;
+}
+
+/* Queue packet to local peer socket. */
+static int qrtr_local_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	const struct qrtr_hdr *phdr;
+	struct qrtr_sock *ipc;
+
+	phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+
+	ipc = qrtr_port_lookup(le32_to_cpu(phdr->dst_port_id));
+	if (!ipc || &ipc->sk == skb->sk) { /* do not send to self */
+		kfree_skb(skb);
+		return -ENODEV;
+	}
+
+	if (sock_queue_rcv_skb(&ipc->sk, skb)) {
+		qrtr_port_put(ipc);
+		kfree_skb(skb);
+		return -ENOSPC;
+	}
+
+	qrtr_port_put(ipc);
+
+	return 0;
+}
+
+/* Queue packet for broadcast. */
+static int qrtr_bcast_enqueue(struct qrtr_node *node, struct sk_buff *skb)
+{
+	struct sk_buff *skbn;
+
+	mutex_lock(&qrtr_node_lock);
+	list_for_each_entry(node, &qrtr_all_nodes, item) {
+		skbn = skb_clone(skb, GFP_KERNEL);
+		if (!skbn)
+			break;
+		skb_set_owner_w(skbn, skb->sk);
+		qrtr_node_enqueue(node, skbn);
+	}
+	mutex_unlock(&qrtr_node_lock);
+
+	qrtr_local_enqueue(node, skb);
+
+	return 0;
+}
+
+static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name);
+	int (*enqueue_fn)(struct qrtr_node *, struct sk_buff *);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	struct qrtr_node *node;
+	struct qrtr_hdr *hdr;
+	struct sk_buff *skb;
+	size_t plen;
+	int rc;
+
+	if (msg->msg_flags & ~(MSG_DONTWAIT))
+		return -EINVAL;
+
+	if (len > 65535)
+		return -EMSGSIZE;
+
+	lock_sock(sk);
+
+	if (addr) {
+		if (msg->msg_namelen < sizeof(*addr)) {
+			release_sock(sk);
+			return -EINVAL;
+		}
+
+		if (addr->sq_family != AF_QIPCRTR) {
+			release_sock(sk);
+			return -EINVAL;
+		}
+
+		rc = qrtr_autobind(sock);
+		if (rc) {
+			release_sock(sk);
+			return rc;
+		}
+	} else if (sk->sk_state == TCP_ESTABLISHED) {
+		addr = &ipc->peer;
+	} else {
+		release_sock(sk);
+		return -ENOTCONN;
+	}
+
+	node = NULL;
+	if (addr->sq_node == QRTR_NODE_BCAST) {
+		enqueue_fn = qrtr_bcast_enqueue;
+	} else if (addr->sq_node == ipc->us.sq_node) {
+		enqueue_fn = qrtr_local_enqueue;
+	} else {
+		enqueue_fn = qrtr_node_enqueue;
+		node = qrtr_node_lookup(addr->sq_node);
+		if (!node) {
+			release_sock(sk);
+			return -ECONNRESET;
+		}
+	}
+
+	plen = (len + 3) & ~3;
+	skb = sock_alloc_send_skb(sk, plen + QRTR_HDR_SIZE,
+				  msg->msg_flags & MSG_DONTWAIT, &rc);
+	if (!skb)
+		goto out_node;
+
+	skb_reset_transport_header(skb);
+	skb_put(skb, len + QRTR_HDR_SIZE);
+
+	hdr = (struct qrtr_hdr *)skb_transport_header(skb);
+	hdr->version = cpu_to_le32(QRTR_PROTO_VER);
+	hdr->src_node_id = cpu_to_le32(ipc->us.sq_node);
+	hdr->src_port_id = cpu_to_le32(ipc->us.sq_port);
+	hdr->confirm_rx = cpu_to_le32(0);
+	hdr->size = cpu_to_le32(len);
+	hdr->dst_node_id = cpu_to_le32(addr->sq_node);
+	hdr->dst_port_id = cpu_to_le32(addr->sq_port);
+
+	rc = skb_copy_datagram_from_iter(skb, QRTR_HDR_SIZE,
+					 &msg->msg_iter, len);
+	if (rc) {
+		kfree_skb(skb);
+		goto out_node;
+	}
+
+	if (plen != len) {
+		skb_pad(skb, plen - len);
+		skb_put(skb, plen - len);
+	}
+
+	if (ipc->us.sq_port == QRTR_PORT_CTRL) {
+		if (len < 4) {
+			rc = -EINVAL;
+			kfree_skb(skb);
+			goto out_node;
+		}
+
+		/* control messages already require the type as 'command' */
+		skb_copy_bits(skb, QRTR_HDR_SIZE, &hdr->type, 4);
+	} else {
+		hdr->type = cpu_to_le32(QRTR_TYPE_DATA);
+	}
+
+	rc = enqueue_fn(node, skb);
+	if (rc >= 0)
+		rc = len;
+
+out_node:
+	qrtr_node_release(node);
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_recvmsg(struct socket *sock, struct msghdr *msg,
+			size_t size, int flags)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name);
+	const struct qrtr_hdr *phdr;
+	struct sock *sk = sock->sk;
+	struct sk_buff *skb;
+	int copied, rc;
+
+	lock_sock(sk);
+
+	if (sock_flag(sk, SOCK_ZAPPED)) {
+		release_sock(sk);
+		return -EADDRNOTAVAIL;
+	}
+
+	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
+				flags & MSG_DONTWAIT, &rc);
+	if (!skb) {
+		release_sock(sk);
+		return rc;
+	}
+
+	phdr = (const struct qrtr_hdr *)skb_transport_header(skb);
+	copied = le32_to_cpu(phdr->size);
+	if (copied > size) {
+		copied = size;
+		msg->msg_flags |= MSG_TRUNC;
+	}
+
+	rc = skb_copy_datagram_msg(skb, QRTR_HDR_SIZE, msg, copied);
+	if (rc < 0)
+		goto out;
+	rc = copied;
+
+	if (addr) {
+		addr->sq_family = AF_QIPCRTR;
+		addr->sq_node = le32_to_cpu(phdr->src_node_id);
+		addr->sq_port = le32_to_cpu(phdr->src_port_id);
+		msg->msg_namelen = sizeof(*addr);
+	}
+
+out:
+	skb_free_datagram(sk, skb);
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_connect(struct socket *sock, struct sockaddr *saddr,
+			int len, int flags)
+{
+	DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, saddr);
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	int rc;
+
+	if (len < sizeof(*addr) || addr->sq_family != AF_QIPCRTR)
+		return -EINVAL;
+
+	lock_sock(sk);
+
+	sk->sk_state = TCP_CLOSE;
+	sock->state = SS_UNCONNECTED;
+
+	rc = qrtr_autobind(sock);
+	if (rc) {
+		release_sock(sk);
+		return rc;
+	}
+
+	ipc->peer = *addr;
+	sock->state = SS_CONNECTED;
+	sk->sk_state = TCP_ESTABLISHED;
+
+	release_sock(sk);
+
+	return 0;
+}
+
+static int qrtr_getname(struct socket *sock, struct sockaddr *saddr,
+			int *len, int peer)
+{
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sockaddr_qrtr qaddr;
+	struct sock *sk = sock->sk;
+
+	lock_sock(sk);
+	if (peer) {
+		if (sk->sk_state != TCP_ESTABLISHED) {
+			release_sock(sk);
+			return -ENOTCONN;
+		}
+
+		qaddr = ipc->peer;
+	} else {
+		qaddr = ipc->us;
+	}
+	release_sock(sk);
+
+	*len = sizeof(qaddr);
+	qaddr.sq_family = AF_QIPCRTR;
+
+	memcpy(saddr, &qaddr, sizeof(qaddr));
+
+	return 0;
+}
+
+static int qrtr_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	struct qrtr_sock *ipc = qrtr_sk(sock->sk);
+	struct sock *sk = sock->sk;
+	struct sockaddr_qrtr *sq;
+	struct sk_buff *skb;
+	struct ifreq ifr;
+	long len = 0;
+	int rc = 0;
+
+	lock_sock(sk);
+
+	switch (cmd) {
+	case TIOCOUTQ:
+		len = sk->sk_sndbuf - sk_wmem_alloc_get(sk);
+		if (len < 0)
+			len = 0;
+		rc = put_user(len, (int __user *)argp);
+		break;
+	case TIOCINQ:
+		skb = skb_peek(&sk->sk_receive_queue);
+		if (skb)
+			len = skb->len - QRTR_HDR_SIZE;
+		rc = put_user(len, (int __user *)argp);
+		break;
+	case SIOCGIFADDR:
+		if (copy_from_user(&ifr, argp, sizeof(ifr))) {
+			rc = -EFAULT;
+			break;
+		}
+
+		sq = (struct sockaddr_qrtr *)&ifr.ifr_addr;
+		*sq = ipc->us;
+		if (copy_to_user(argp, &ifr, sizeof(ifr))) {
+			rc = -EFAULT;
+			break;
+		}
+		break;
+	case SIOCGSTAMP:
+		rc = sock_get_timestamp(sk, argp);
+		break;
+	case SIOCADDRT:
+	case SIOCDELRT:
+	case SIOCSIFADDR:
+	case SIOCGIFDSTADDR:
+	case SIOCSIFDSTADDR:
+	case SIOCGIFBRDADDR:
+	case SIOCSIFBRDADDR:
+	case SIOCGIFNETMASK:
+	case SIOCSIFNETMASK:
+		rc = -EINVAL;
+		break;
+	default:
+		rc = -ENOIOCTLCMD;
+		break;
+	}
+
+	release_sock(sk);
+
+	return rc;
+}
+
+static int qrtr_release(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct qrtr_sock *ipc;
+
+	if (!sk)
+		return 0;
+
+	lock_sock(sk);
+
+	ipc = qrtr_sk(sk);
+	sk->sk_shutdown = SHUTDOWN_MASK;
+	if (!sock_flag(sk, SOCK_DEAD))
+		sk->sk_state_change(sk);
+
+	sock_set_flag(sk, SOCK_DEAD);
+	sock->sk = NULL;
+
+	if (!sock_flag(sk, SOCK_ZAPPED))
+		qrtr_port_remove(ipc);
+
+	skb_queue_purge(&sk->sk_receive_queue);
+
+	release_sock(sk);
+	sock_put(sk);
+
+	return 0;
+}
+
+static const struct proto_ops qrtr_proto_ops = {
+	.owner		= THIS_MODULE,
+	.family		= AF_QIPCRTR,
+	.bind		= qrtr_bind,
+	.connect	= qrtr_connect,
+	.socketpair	= sock_no_socketpair,
+	.accept		= sock_no_accept,
+	.listen		= sock_no_listen,
+	.sendmsg	= qrtr_sendmsg,
+	.recvmsg	= qrtr_recvmsg,
+	.getname	= qrtr_getname,
+	.ioctl		= qrtr_ioctl,
+	.poll		= datagram_poll,
+	.shutdown	= sock_no_shutdown,
+	.setsockopt	= sock_no_setsockopt,
+	.getsockopt	= sock_no_getsockopt,
+	.release	= qrtr_release,
+	.mmap		= sock_no_mmap,
+	.sendpage	= sock_no_sendpage,
+};
+
+static struct proto qrtr_proto = {
+	.name		= "QIPCRTR",
+	.owner		= THIS_MODULE,
+	.obj_size	= sizeof(struct qrtr_sock),
+};
+
+static int qrtr_create(struct net *net, struct socket *sock,
+		       int protocol, int kern)
+{
+	struct qrtr_sock *ipc;
+	struct sock *sk;
+
+	if (sock->type != SOCK_DGRAM)
+		return -EPROTOTYPE;
+
+	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern);
+	if (!sk)
+		return -ENOMEM;
+
+	sock_set_flag(sk, SOCK_ZAPPED);
+
+	sock_init_data(sock, sk);
+	sock->ops = &qrtr_proto_ops;
+
+	ipc = qrtr_sk(sk);
+	ipc->us.sq_family = AF_QIPCRTR;
+	ipc->us.sq_node = qrtr_local_nid;
+	ipc->us.sq_port = 0;
+
+	return 0;
+}
+
+static const struct nla_policy qrtr_policy[IFA_MAX + 1] = {
+	[IFA_LOCAL] = { .type = NLA_U32 },
+};
+
+static int qrtr_addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct nlattr *tb[IFA_MAX + 1];
+	struct ifaddrmsg *ifm;
+	int rc;
+
+	if (!netlink_capable(skb, CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!netlink_capable(skb, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ASSERT_RTNL();
+
+	rc = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, qrtr_policy);
+	if (rc < 0)
+		return rc;
+
+	ifm = nlmsg_data(nlh);
+	if (!tb[IFA_LOCAL])
+		return -EINVAL;
+
+	qrtr_local_nid = nla_get_u32(tb[IFA_LOCAL]);
+	return 0;
+}
+
+static const struct net_proto_family qrtr_family = {
+	.owner	= THIS_MODULE,
+	.family	= AF_QIPCRTR,
+	.create	= qrtr_create,
+};
+
+static int __init qrtr_proto_init(void)
+{
+	int rc;
+
+	rc = proto_register(&qrtr_proto, 1);
+	if (rc)
+		return rc;
+
+	rc = sock_register(&qrtr_family);
+	if (rc) {
+		proto_unregister(&qrtr_proto);
+		return rc;
+	}
+
+	rtnl_register(PF_QIPCRTR, RTM_NEWADDR, qrtr_addr_doit, NULL, NULL);
+
+	return 0;
+}
+module_init(qrtr_proto_init);
+
+static void __exit qrtr_proto_fini(void)
+{
+	rtnl_unregister(PF_QIPCRTR, RTM_NEWADDR);
+	sock_unregister(qrtr_family.family);
+	proto_unregister(&qrtr_proto);
+}
+module_exit(qrtr_proto_fini);
+
+MODULE_DESCRIPTION("Qualcomm IPC-router driver");
+MODULE_LICENSE("GPL v2");
diff --git a/net/qrtr/qrtr.h b/net/qrtr/qrtr.h
new file mode 100644
index 000000000000..2b848718f8fe
--- /dev/null
+++ b/net/qrtr/qrtr.h
@@ -0,0 +1,31 @@
+#ifndef __QRTR_H_
+#define __QRTR_H_
+
+#include <linux/types.h>
+
+struct sk_buff;
+
+/* endpoint node id auto assignment */
+#define QRTR_EP_NID_AUTO (-1)
+
+/**
+ * struct qrtr_endpoint - endpoint handle
+ * @xmit: Callback for outgoing packets
+ *
+ * The socket buffer passed to the xmit function becomes owned by the endpoint
+ * driver.  As such, when the driver is done with the buffer, it should
+ * call kfree_skb() on failure, or consume_skb() on success.
+ */
+struct qrtr_endpoint {
+	int (*xmit)(struct qrtr_endpoint *ep, struct sk_buff *skb);
+	/* private: not for endpoint use */
+	struct qrtr_node *node;
+};
+
+int qrtr_endpoint_register(struct qrtr_endpoint *ep, unsigned int nid);
+
+void qrtr_endpoint_unregister(struct qrtr_endpoint *ep);
+
+int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len);
+
+#endif
diff --git a/net/qrtr/smd.c b/net/qrtr/smd.c
new file mode 100644
index 000000000000..84ebce73aa23
--- /dev/null
+++ b/net/qrtr/smd.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications Inc.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/soc/qcom/smd.h>
+
+#include "qrtr.h"
+
+struct qrtr_smd_dev {
+	struct qrtr_endpoint ep;
+	struct qcom_smd_channel *channel;
+};
+
+/* from smd to qrtr */
+static int qcom_smd_qrtr_callback(struct qcom_smd_device *sdev,
+				  const void *data, size_t len)
+{
+	struct qrtr_smd_dev *qdev = dev_get_drvdata(&sdev->dev);
+	int rc;
+
+	if (!qdev)
+		return -EAGAIN;
+
+	rc = qrtr_endpoint_post(&qdev->ep, data, len);
+	if (rc == -EINVAL) {
+		dev_err(&sdev->dev, "invalid ipcrouter packet\n");
+		/* return 0 to let smd drop the packet */
+		rc = 0;
+	}
+
+	return rc;
+}
+
+/* from qrtr to smd */
+static int qcom_smd_qrtr_send(struct qrtr_endpoint *ep, struct sk_buff *skb)
+{
+	struct qrtr_smd_dev *qdev = container_of(ep, struct qrtr_smd_dev, ep);
+	int rc;
+
+	rc = skb_linearize(skb);
+	if (rc)
+		goto out;
+
+	rc = qcom_smd_send(qdev->channel, skb->data, skb->len);
+
+out:
+	if (rc)
+		kfree_skb(skb);
+	else
+		consume_skb(skb);
+	return rc;
+}
+
+static int qcom_smd_qrtr_probe(struct qcom_smd_device *sdev)
+{
+	struct qrtr_smd_dev *qdev;
+	int rc;
+
+	qdev = devm_kzalloc(&sdev->dev, sizeof(*qdev), GFP_KERNEL);
+	if (!qdev)
+		return -ENOMEM;
+
+	qdev->channel = sdev->channel;
+	qdev->ep.xmit = qcom_smd_qrtr_send;
+
+	rc = qrtr_endpoint_register(&qdev->ep, QRTR_EP_NID_AUTO);
+	if (rc)
+		return rc;
+
+	dev_set_drvdata(&sdev->dev, qdev);
+
+	dev_dbg(&sdev->dev, "Qualcomm SMD QRTR driver probed\n");
+
+	return 0;
+}
+
+static void qcom_smd_qrtr_remove(struct qcom_smd_device *sdev)
+{
+	struct qrtr_smd_dev *qdev = dev_get_drvdata(&sdev->dev);
+
+	qrtr_endpoint_unregister(&qdev->ep);
+
+	dev_set_drvdata(&sdev->dev, NULL);
+}
+
+static const struct qcom_smd_id qcom_smd_qrtr_smd_match[] = {
+	{ "IPCRTR" },
+	{}
+};
+
+static struct qcom_smd_driver qcom_smd_qrtr_driver = {
+	.probe = qcom_smd_qrtr_probe,
+	.remove = qcom_smd_qrtr_remove,
+	.callback = qcom_smd_qrtr_callback,
+	.smd_match_table = qcom_smd_qrtr_smd_match,
+	.driver = {
+		.name = "qcom_smd_qrtr",
+		.owner = THIS_MODULE,
+	},
+};
+
+module_qcom_smd_driver(qcom_smd_qrtr_driver);
+
+MODULE_DESCRIPTION("Qualcomm IPC-Router SMD interface driver");
+MODULE_LICENSE("GPL v2");
-- 
2.5.0

^ permalink raw reply related

* Re: [RFC PATCH 4/5] bnxt: Add support for segmentation of tunnels with outer checksums
From: Michael Chan @ 2016-04-27  5:55 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: eugenia, bruce.w.allan, saeedm, netdev, intel-wired-lan,
	ariel.elior, Michael Chan
In-Reply-To: <20160419190615.11723.53966.stgit@ahduyck-xeon-server>

On Tue, Apr 19, 2016 at 12:06 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
> This patch assumes that the bnxt hardware will ignore existing IPv4/v6
> header fields for length and checksum as well as the length and checksum
> fields for outer UDP and GRE headers.
>
> I have no means of testing this as I do not have any bnx2x hardware but
> thought I would submit it as an RFC to see if anyone out there wants to
> test this and see if this does in fact enable this functionality allowing
> us to to segment tunneled frames that have an outer checksum.
>
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>

Hi Alex, I just did a very quick test of this patch on our bnxt
hardware and it seemed to work.

I created a vxlan endpoint with udpcsum enabled and I saw TSO packets
getting through.  I've verified that our hardware can be programmed to
either ignore outer UDP checksum or to calculate it.  Current default
is to ignore ipv4 UDP checksum and calculate ipv6 UDP checksum.
Thanks.

> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>

^ permalink raw reply

* Re: [net-next PATCH V2 3/5] samples/bpf: add a README file to get users started
From: Jesper Dangaard Brouer @ 2016-04-27  6:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: brouer, netdev, linux-kbuild, bblanco, naveen.n.rao, borkmann
In-Reply-To: <20160426173105.GD42777@ast-mbp.thefacebook.com>

On Tue, 26 Apr 2016 10:31:06 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Apr 26, 2016 at 06:27:22PM +0200, Jesper Dangaard Brouer wrote:
> > +
> > +Manually compiling LLVM with 'bpf' support
> > +------------------------------------------
> > +
> > +In some LLVM versions the BPF target were marked experimental. They
> > +needed the 'cmake .. -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=BPF'.  Since
> > +version 3.7.1, LLVM adds a proper LLVM backend target for the BPF
> > +bytecode architecture.  
> 
> it's actually non-experimental since 3.7.0.
> It was experimental after 3.6 was released during development of 3.7.
> I doubt you can find this anywhere, so I suggest to just drop this paragraph.

Okay lets drop this paragraph, given the LLVM period was so short, it
does not make sense to mention here.

I will send a V3 patch series, as DaveM usually likes a full resubmit
(I'll add your acks to the other patches, but you need to ack this one).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] net/mlx4: Avoid wrong virtual mappings
From: Leon Romanovsky @ 2016-04-27  6:32 UTC (permalink / raw)
  To: Haggai Abramovsky
  Cc: David S. Miller, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Sinan Kaya, Timur Tabi, Eli Cohen,
	Or Gerlitz, Eran Ben Elisha, Yishai Hadas, Tal Alon,
	Saeed Mahameed
In-Reply-To: <1461591287-11595-1-git-send-email-hagaya-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

On Mon, Apr 25, 2016 at 04:34:47PM +0300, Haggai Abramovsky wrote:
> -int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
> -		   struct mlx4_buf *buf, gfp_t gfp)
> +static int mlx4_buf_direct_alloc(struct mlx4_dev *dev, int size,
> +				 struct mlx4_buf *buf, gfp_t gfp)
>  {
> -	dma_addr_t t;
> +		dma_addr_t t;
>  

You have wrong indentation in whole this function.

> -	if (size <= max_direct) {
>  		buf->nbufs        = 1;
>  		buf->npages       = 1;
>  		buf->page_shift   = get_order(size) + PAGE_SHIFT;
> -		buf->direct.buf   = dma_alloc_coherent(&dev->persist->pdev->dev,
> -						       size, &t, gfp);
> +		buf->direct.buf   =
> +			dma_zalloc_coherent(&dev->persist->pdev->dev,
> +					    size, &t, gfp);
>  		if (!buf->direct.buf)
>  			return -ENOMEM;

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* (unknown)
From: Leon Romanovsky @ 2016-04-27  6:38 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	FaisalLatif-2ukJVAZIZ/Y

[-- Attachment #1: Type: text/plain, Size: 2374 bytes --]

<faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Bcc: 
Subject: Re: [PATCH net] RDMA/nes: don't leak skb if carrier down
Reply-To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
In-Reply-To: <1461529139-28582-1-git-send-email-fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org>

On Sun, Apr 24, 2016 at 10:18:59PM +0200, Florian Westphal wrote:
> Alternatively one could free the skb, OTOH I don't think this test is
> useful so just remove it.
> 
> Cc: <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> Signed-off-by: Florian Westphal <fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org>

I don't know why did you decide to send it to netdev and not to relevant
maintainers, but the relevant mailing list is linux-rdma and Faisal
needs to Review/Acknowledge it.

➜  linux-rdma git:(master) ./scripts/get_maintainer.pl -f
drivers/infiniband/hw/nes/nes_nic.c
Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> (supporter:NETEFFECT IWARP RNIC
DRIVER (IW_NES))
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> (supporter:INFINIBAND SUBSYSTEM)
Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> (supporter:INFINIBAND SUBSYSTEM)
Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> (supporter:INFINIBAND
SUBSYSTEM)
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (open list:NETEFFECT IWARP RNIC DRIVER
(IW_NES))
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org (open list)


> ---
>  Noticed this while working on the TX_LOCKED removal.
> 
> diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
> index 3ea9e05..9291453 100644
> --- a/drivers/infiniband/hw/nes/nes_nic.c
> +++ b/drivers/infiniband/hw/nes/nes_nic.c
> @@ -500,9 +500,6 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev)
>  	 *		skb_shinfo(skb)->nr_frags, skb_is_gso(skb));
>  	 */
>  
> -	if (!netif_carrier_ok(netdev))
> -		return NETDEV_TX_OK;
> -
>  	if (netif_queue_stopped(netdev))
>  		return NETDEV_TX_BUSY;
>  
> -- 
> 2.7.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [net-next PATCH V2 5/5] samples/bpf: like LLC also verify and allow redefining CLANG command
From: Jesper Dangaard Brouer @ 2016-04-27  6:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: brouer, netdev, linux-kbuild, bblanco, naveen.n.rao, borkmann
In-Reply-To: <20160426173608.GF42777@ast-mbp.thefacebook.com>

On Tue, 26 Apr 2016 10:36:10 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Apr 26, 2016 at 06:27:32PM +0200, Jesper Dangaard Brouer wrote:
> > Users are likely to manually compile both LLVM 'llc' and 'clang'
> > tools.  Thus, also allow redefining CLANG and verify command exist.
> > 
> > Makefile implementation wise, the target that verify the command have
> > been generalized.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  samples/bpf/Makefile   |   23 +++++++++++++----------
> >  samples/bpf/README.rst |    6 +++---
> >  2 files changed, 16 insertions(+), 13 deletions(-)
> > 
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index dd63521832d8..c02ea9d2a248 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -81,9 +81,10 @@ HOSTLOADLIBES_spintest += -lelf
> >  HOSTLOADLIBES_map_perf_test += -lelf -lrt
> >  HOSTLOADLIBES_test_overhead += -lelf -lrt
> >  
> > -# Allows pointing LLC to a LLVM backend with bpf support, redefine on cmdline:
> > -#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc
> > +# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
> > +#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> >  LLC ?= llc
> > +CLANG ?= clang
> >  
> >  # Trick to allow make to be run from this directory
> >  all:
> > @@ -94,15 +95,17 @@ clean:
> >  	@rm -f *~
> >  
> >  # Verify LLVM compiler is available and bpf target is supported
> > -.PHONY: verify_cmd_llc verify_target_bpf
> > +.PHONY: verify_cmd_llc verify_target_bpf $(CLANG) $(LLC)
> >  
> > -verify_cmd_llc:
> > -	@if ! (which "${LLC}" > /dev/null 2>&1); then \
> > -		echo "*** ERROR: Cannot find LLVM tool 'llc' (${LLC})" ;\
> > -		exit 1; \
> > -	else true; fi
> > +verify_cmds: $(CLANG) $(LLC)
> > +	@for TOOL in $^ ; do \
> > +		if ! (which "$${TOOL}" > /dev/null 2>&1); then \
> > +			echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
> > +			exit 1; \
> > +		else true; fi; \
> > +	done
> >  
> > -verify_target_bpf: verify_cmd_llc
> > +verify_target_bpf: verify_cmds
> >  	@if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
> >  		echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
> >  		echo "   NOTICE: LLVM version >= 3.7.1 required" ;\  
> 
> If I read the patch correctly, it only checks that any version
> of clang is available and llc supports -march=bpf.
> That's correct.

Yes, you read the patch correctly :-)

> There is no need to build the latest clang most of the time.
> clang 3.4 and 3.5 are fine to compile samples/bpf/
> since llvm ir is mostly compatible with llc from 3.7 or 3.8

Good to get confirmation. I was testing with clang 3.5.0, and llc 3.9-git.

> Acked-by: Alexei Starovoitov <ast@kernel.org>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] net/mlx4: Avoid wrong virtual mappings
From: Haggai Abramovsky @ 2016-04-27  7:06 UTC (permalink / raw)
  To: leon
  Cc: Haggai Abramovsky, David S. Miller, Doug Ledford, linux-rdma,
	netdev, Sinan Kaya, Timur Tabi, Eli Cohen, Or Gerlitz,
	Eran Ben Elisha, Yishai Hadas, Tal Alon, Saeed Mahameed
In-Reply-To: <20160427063254.GM7974@leon.nu>

On Wed, Apr 27, 2016 at 9:32 AM, Leon Romanovsky <leon@kernel.org> wrote:
> On Mon, Apr 25, 2016 at 04:34:47PM +0300, Haggai Abramovsky wrote:
>> -int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
>> -                struct mlx4_buf *buf, gfp_t gfp)
>> +static int mlx4_buf_direct_alloc(struct mlx4_dev *dev, int size,
>> +                              struct mlx4_buf *buf, gfp_t gfp)
>>  {
>> -     dma_addr_t t;
>> +             dma_addr_t t;
>>
>
> You have wrong indentation in whole this function.

Thanks, I will fix this and send v1.

>
>> -     if (size <= max_direct) {
>>               buf->nbufs        = 1;
>>               buf->npages       = 1;
>>               buf->page_shift   = get_order(size) + PAGE_SHIFT;
>> -             buf->direct.buf   = dma_alloc_coherent(&dev->persist->pdev->dev,
>> -                                                    size, &t, gfp);
>> +             buf->direct.buf   =
>> +                     dma_zalloc_coherent(&dev->persist->pdev->dev,
>> +                                         size, &t, gfp);
>>               if (!buf->direct.buf)
>>                       return -ENOMEM;

^ permalink raw reply

* [PATCH v1 net] net/mlx4: Avoid wrong virtual mappings
From: Haggai Abramovsky @ 2016-04-27  7:07 UTC (permalink / raw)
  To: David S. Miller, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Sinan Kaya, Timur Tabi, Eli Cohen, Or Gerlitz, Eran Ben Elisha,
	Yishai Hadas, Tal Alon, Saeed Mahameed, Haggai Abramovsky

The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory.  On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent.  Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent().  Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).

The mlx4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
device is opened.

Prevent Ethernet driver to run this problematic code by forcing it to
allocate contiguous memory. As for the Infiniband driver, at first we
are trying to allocate contiguous memory, but in case of failure roll
back to work with fragmented memory.

Signed-off-by: Haggai Abramovsky <hagaya-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reported-by: David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Tested-by: Sinan Kaya <okaya-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
---
Changes from v0:
	Fix indentation error raised by LeonR

 drivers/infiniband/hw/mlx4/qp.c                   | 26 +++++--
 drivers/net/ethernet/mellanox/mlx4/alloc.c        | 93 ++++++++++-------------
 drivers/net/ethernet/mellanox/mlx4/en_cq.c        |  9 +--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_resources.c | 31 --------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c        | 11 +--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c        | 14 +---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |  2 -
 include/linux/mlx4/device.h                       |  4 +-
 9 files changed, 67 insertions(+), 125 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index fd97534..842a6da 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -419,7 +419,8 @@ static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 }
 
 static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
-			      enum mlx4_ib_qp_type type, struct mlx4_ib_qp *qp)
+			      enum mlx4_ib_qp_type type, struct mlx4_ib_qp *qp,
+			      int shrink_wqe)
 {
 	int s;
 
@@ -477,7 +478,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
 	 * We set WQE size to at least 64 bytes, this way stamping
 	 * invalidates each WQE.
 	 */
-	if (dev->dev->caps.fw_ver >= MLX4_FW_VER_WQE_CTRL_NEC &&
+	if (shrink_wqe && dev->dev->caps.fw_ver >= MLX4_FW_VER_WQE_CTRL_NEC &&
 	    qp->sq_signal_bits && BITS_PER_LONG == 64 &&
 	    type != MLX4_IB_QPT_SMI && type != MLX4_IB_QPT_GSI &&
 	    !(type & (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_PROXY_SMI |
@@ -642,6 +643,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 {
 	int qpn;
 	int err;
+	struct ib_qp_cap backup_cap;
 	struct mlx4_ib_sqp *sqp;
 	struct mlx4_ib_qp *qp;
 	enum mlx4_ib_qp_type qp_type = (enum mlx4_ib_qp_type) init_attr->qp_type;
@@ -775,7 +777,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 				goto err;
 		}
 
-		err = set_kernel_sq_size(dev, &init_attr->cap, qp_type, qp);
+		memcpy(&backup_cap, &init_attr->cap, sizeof(backup_cap));
+		err = set_kernel_sq_size(dev, &init_attr->cap, qp_type, qp, 1);
 		if (err)
 			goto err;
 
@@ -787,9 +790,20 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			*qp->db.db = 0;
 		}
 
-		if (mlx4_buf_alloc(dev->dev, qp->buf_size, PAGE_SIZE * 2, &qp->buf, gfp)) {
-			err = -ENOMEM;
-			goto err_db;
+		if (mlx4_buf_alloc(dev->dev, qp->buf_size, qp->buf_size,
+				   &qp->buf, gfp)) {
+			memcpy(&init_attr->cap, &backup_cap,
+			       sizeof(backup_cap));
+			err = set_kernel_sq_size(dev, &init_attr->cap, qp_type,
+						 qp, 0);
+			if (err)
+				goto err_db;
+
+			if (mlx4_buf_alloc(dev->dev, qp->buf_size,
+					   PAGE_SIZE * 2, &qp->buf, gfp)) {
+				err = -ENOMEM;
+				goto err_db;
+			}
 		}
 
 		err = mlx4_mtt_init(dev->dev, qp->buf.npages, qp->buf.page_shift,
diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index 0c51c69..249a458 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -576,41 +576,48 @@ out:
 
 	return res;
 }
-/*
- * Handling for queue buffers -- we allocate a bunch of memory and
- * register it in a memory region at HCA virtual address 0.  If the
- * requested size is > max_direct, we split the allocation into
- * multiple pages, so we don't require too much contiguous memory.
- */
 
-int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
-		   struct mlx4_buf *buf, gfp_t gfp)
+static int mlx4_buf_direct_alloc(struct mlx4_dev *dev, int size,
+				 struct mlx4_buf *buf, gfp_t gfp)
 {
 	dma_addr_t t;
 
-	if (size <= max_direct) {
-		buf->nbufs        = 1;
-		buf->npages       = 1;
-		buf->page_shift   = get_order(size) + PAGE_SHIFT;
-		buf->direct.buf   = dma_alloc_coherent(&dev->persist->pdev->dev,
-						       size, &t, gfp);
-		if (!buf->direct.buf)
-			return -ENOMEM;
+	buf->nbufs        = 1;
+	buf->npages       = 1;
+	buf->page_shift   = get_order(size) + PAGE_SHIFT;
+	buf->direct.buf   =
+		dma_zalloc_coherent(&dev->persist->pdev->dev,
+				    size, &t, gfp);
+	if (!buf->direct.buf)
+		return -ENOMEM;
 
-		buf->direct.map = t;
+	buf->direct.map = t;
 
-		while (t & ((1 << buf->page_shift) - 1)) {
-			--buf->page_shift;
-			buf->npages *= 2;
-		}
+	while (t & ((1 << buf->page_shift) - 1)) {
+		--buf->page_shift;
+		buf->npages *= 2;
+	}
 
-		memset(buf->direct.buf, 0, size);
+	return 0;
+}
+
+/* Handling for queue buffers -- we allocate a bunch of memory and
+ * register it in a memory region at HCA virtual address 0. If the
+ *  requested size is > max_direct, we split the allocation into
+ *  multiple pages, so we don't require too much contiguous memory.
+ */
+int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
+		   struct mlx4_buf *buf, gfp_t gfp)
+{
+	if (size <= max_direct) {
+		return mlx4_buf_direct_alloc(dev, size, buf, gfp);
 	} else {
+		dma_addr_t t;
 		int i;
 
-		buf->direct.buf  = NULL;
-		buf->nbufs       = (size + PAGE_SIZE - 1) / PAGE_SIZE;
-		buf->npages      = buf->nbufs;
+		buf->direct.buf = NULL;
+		buf->nbufs	= (size + PAGE_SIZE - 1) / PAGE_SIZE;
+		buf->npages	= buf->nbufs;
 		buf->page_shift  = PAGE_SHIFT;
 		buf->page_list   = kcalloc(buf->nbufs, sizeof(*buf->page_list),
 					   gfp);
@@ -619,28 +626,12 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 
 		for (i = 0; i < buf->nbufs; ++i) {
 			buf->page_list[i].buf =
-				dma_alloc_coherent(&dev->persist->pdev->dev,
-						   PAGE_SIZE,
-						   &t, gfp);
+				dma_zalloc_coherent(&dev->persist->pdev->dev,
+						    PAGE_SIZE, &t, gfp);
 			if (!buf->page_list[i].buf)
 				goto err_free;
 
 			buf->page_list[i].map = t;
-
-			memset(buf->page_list[i].buf, 0, PAGE_SIZE);
-		}
-
-		if (BITS_PER_LONG == 64) {
-			struct page **pages;
-			pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
-			if (!pages)
-				goto err_free;
-			for (i = 0; i < buf->nbufs; ++i)
-				pages[i] = virt_to_page(buf->page_list[i].buf);
-			buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP, PAGE_KERNEL);
-			kfree(pages);
-			if (!buf->direct.buf)
-				goto err_free;
 		}
 	}
 
@@ -655,15 +646,11 @@ EXPORT_SYMBOL_GPL(mlx4_buf_alloc);
 
 void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf)
 {
-	int i;
-
-	if (buf->nbufs == 1)
+	if (buf->nbufs == 1) {
 		dma_free_coherent(&dev->persist->pdev->dev, size,
-				  buf->direct.buf,
-				  buf->direct.map);
-	else {
-		if (BITS_PER_LONG == 64)
-			vunmap(buf->direct.buf);
+				  buf->direct.buf, buf->direct.map);
+	} else {
+		int i;
 
 		for (i = 0; i < buf->nbufs; ++i)
 			if (buf->page_list[i].buf)
@@ -789,7 +776,7 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db)
 EXPORT_SYMBOL_GPL(mlx4_db_free);
 
 int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
-		       int size, int max_direct)
+		       int size)
 {
 	int err;
 
@@ -799,7 +786,7 @@ int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
 
 	*wqres->db.db = 0;
 
-	err = mlx4_buf_alloc(dev, size, max_direct, &wqres->buf, GFP_KERNEL);
+	err = mlx4_buf_direct_alloc(dev, size, &wqres->buf, GFP_KERNEL);
 	if (err)
 		goto err_db;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index af975a2..132cea6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -73,22 +73,16 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 	 */
 	set_dev_node(&mdev->dev->persist->pdev->dev, node);
 	err = mlx4_alloc_hwq_res(mdev->dev, &cq->wqres,
-				cq->buf_size, 2 * PAGE_SIZE);
+				cq->buf_size);
 	set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node);
 	if (err)
 		goto err_cq;
 
-	err = mlx4_en_map_buffer(&cq->wqres.buf);
-	if (err)
-		goto err_res;
-
 	cq->buf = (struct mlx4_cqe *)cq->wqres.buf.direct.buf;
 	*pcq = cq;
 
 	return 0;
 
-err_res:
-	mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size);
 err_cq:
 	kfree(cq);
 	*pcq = NULL;
@@ -177,7 +171,6 @@ void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq)
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_cq *cq = *pcq;
 
-	mlx4_en_unmap_buffer(&cq->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size);
 	if (mlx4_is_eq_vector_valid(mdev->dev, priv->port, cq->vector) &&
 	    cq->is_tx == RX)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b4b258c..5b19178 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2907,7 +2907,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 
 	/* Allocate page for receive rings */
 	err = mlx4_alloc_hwq_res(mdev->dev, &priv->res,
-				MLX4_EN_PAGE_SIZE, MLX4_EN_PAGE_SIZE);
+				MLX4_EN_PAGE_SIZE);
 	if (err) {
 		en_err(priv, "Failed to allocate page for rx qps\n");
 		goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index 02e925d..a6b0db0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -107,37 +107,6 @@ int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
 	return ret;
 }
 
-int mlx4_en_map_buffer(struct mlx4_buf *buf)
-{
-	struct page **pages;
-	int i;
-
-	if (BITS_PER_LONG == 64 || buf->nbufs == 1)
-		return 0;
-
-	pages = kmalloc(sizeof *pages * buf->nbufs, GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
-
-	for (i = 0; i < buf->nbufs; ++i)
-		pages[i] = virt_to_page(buf->page_list[i].buf);
-
-	buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP, PAGE_KERNEL);
-	kfree(pages);
-	if (!buf->direct.buf)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void mlx4_en_unmap_buffer(struct mlx4_buf *buf)
-{
-	if (BITS_PER_LONG == 64 || buf->nbufs == 1)
-		return;
-
-	vunmap(buf->direct.buf);
-}
-
 void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event)
 {
     return;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 7d25bc9..89775bb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -394,17 +394,11 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 
 	/* Allocate HW buffers on provided NUMA node */
 	set_dev_node(&mdev->dev->persist->pdev->dev, node);
-	err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres,
-				 ring->buf_size, 2 * PAGE_SIZE);
+	err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 	set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node);
 	if (err)
 		goto err_info;
 
-	err = mlx4_en_map_buffer(&ring->wqres.buf);
-	if (err) {
-		en_err(priv, "Failed to map RX buffer\n");
-		goto err_hwq;
-	}
 	ring->buf = ring->wqres.buf.direct.buf;
 
 	ring->hwtstamp_rx_filter = priv->hwtstamp_config.rx_filter;
@@ -412,8 +406,6 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 	*pring = ring;
 	return 0;
 
-err_hwq:
-	mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 err_info:
 	vfree(ring->rx_info);
 	ring->rx_info = NULL;
@@ -517,7 +509,6 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rx_ring *ring = *pring;
 
-	mlx4_en_unmap_buffer(&ring->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
 	vfree(ring->rx_info);
 	ring->rx_info = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c0d7b72..b9ab646 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -93,20 +93,13 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 
 	/* Allocate HW buffers on provided NUMA node */
 	set_dev_node(&mdev->dev->persist->pdev->dev, node);
-	err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size,
-				 2 * PAGE_SIZE);
+	err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 	set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node);
 	if (err) {
 		en_err(priv, "Failed allocating hwq resources\n");
 		goto err_bounce;
 	}
 
-	err = mlx4_en_map_buffer(&ring->wqres.buf);
-	if (err) {
-		en_err(priv, "Failed to map TX buffer\n");
-		goto err_hwq_res;
-	}
-
 	ring->buf = ring->wqres.buf.direct.buf;
 
 	en_dbg(DRV, priv, "Allocated TX ring (addr:%p) - buf:%p size:%d buf_size:%d dma:%llx\n",
@@ -117,7 +110,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 				    MLX4_RESERVE_ETH_BF_QP);
 	if (err) {
 		en_err(priv, "failed reserving qp for TX ring\n");
-		goto err_map;
+		goto err_hwq_res;
 	}
 
 	err = mlx4_qp_alloc(mdev->dev, ring->qpn, &ring->qp, GFP_KERNEL);
@@ -154,8 +147,6 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 
 err_reserve:
 	mlx4_qp_release_range(mdev->dev, ring->qpn, 1);
-err_map:
-	mlx4_en_unmap_buffer(&ring->wqres.buf);
 err_hwq_res:
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 err_bounce:
@@ -182,7 +173,6 @@ void mlx4_en_destroy_tx_ring(struct mlx4_en_priv *priv,
 	mlx4_qp_remove(mdev->dev, &ring->qp);
 	mlx4_qp_free(mdev->dev, &ring->qp);
 	mlx4_qp_release_range(priv->mdev->dev, ring->qpn, 1);
-	mlx4_en_unmap_buffer(&ring->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 	kfree(ring->bounce_buf);
 	ring->bounce_buf = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d12ab6a..a70e2d0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -671,8 +671,6 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 		int is_tx, int rss, int qpn, int cqn, int user_prio,
 		struct mlx4_qp_context *context);
 void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event);
-int mlx4_en_map_buffer(struct mlx4_buf *buf);
-void mlx4_en_unmap_buffer(struct mlx4_buf *buf);
 int mlx4_en_change_mcast_lb(struct mlx4_en_priv *priv, struct mlx4_qp *qp,
 			    int loopback);
 void mlx4_en_calc_rx_buf(struct net_device *dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 8541a91..72da65f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1051,7 +1051,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf);
 static inline void *mlx4_buf_offset(struct mlx4_buf *buf, int offset)
 {
-	if (BITS_PER_LONG == 64 || buf->nbufs == 1)
+	if (buf->nbufs == 1)
 		return buf->direct.buf + offset;
 	else
 		return buf->page_list[offset >> PAGE_SHIFT].buf +
@@ -1091,7 +1091,7 @@ int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order,
 void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db);
 
 int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
-		       int size, int max_direct);
+		       int size);
 void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres,
 		       int size);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] ARM: dts: at91: VInCo: fix phy reset gpio flag
From: Nicolas Ferre @ 2016-04-27  7:15 UTC (permalink / raw)
  To: Sergei Shtylyov, David Miller
  Cc: linux-kernel, linux-arm-kernel, gregory.clement,
	alexandre.belloni, netdev, andrew
In-Reply-To: <571FB305.4060304@cogentembedded.com>

Le 26/04/2016 20:27, Sergei Shtylyov a écrit :
> On 04/26/2016 08:17 PM, David Miller wrote:
> 
>>> I plan to queue this patch through arm-soc for 4.7.
>>
>> Ok.
> 
>     How about this patch going thru your net-next repo instead?
> I'd like to keep the kernel bisectable... if my phylib/macb patches get merged 
> earlier than this one, that board would be broken...

Sergei, David,

I don't think that there is a big risk for this board to be tested in
the meantime as it's not widely deployed yet.
And as I'm aware of the issue and basically maintaining this DT file, I
think that I'll be informed if people try an unlikely arrangement of
patches on this board.

So either way, I'm okay. But I do think it's not worth thinking too much
about this case.

Bye,
-- 
Nicolas Ferre

^ permalink raw reply

* Re: [PATCH net] net/mlx4: Avoid wrong virtual mappings
From: Christoph Hellwig @ 2016-04-27  7:19 UTC (permalink / raw)
  To: Haggai Abramovsky
  Cc: David S. Miller, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Sinan Kaya, Timur Tabi, Eli Cohen,
	Or Gerlitz, Eran Ben Elisha, Yishai Hadas, Tal Alon,
	Saeed Mahameed
In-Reply-To: <1461591287-11595-1-git-send-email-hagaya-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Hi Haggai,

I've taken a very quick look because I'm a little too busy this week,
but I failed to grasp the patch, as it seems to do a few too many
things.  E.g. the whole wqe_shrink thing doesn't correspond to anything
in the description.

How about you split it into easily understanable chunks?

Also now that you split the few places that actually split
the allocation into chunks to be handled special I think the whole
mlx4_buf abstraction should go away, as it just obsfucates
how different the different use cases are.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit
From: Nicolas Dichtel @ 2016-04-27  7:29 UTC (permalink / raw)
  To: Balbir Singh, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ,
	steffen.klassert-opNxpl+3fjRBDgjK7y7TUQ,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	aar-bIcnvbaLZ9MEGnE8C9+IrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	kadlec-K40Dz/62t/MgiyqX0sVFJYdd74u8MsAO,
	kuznet-v/Mj1YrvjDBInbfyfbPRSQ, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
	linux-wpan-u79uwXL29TY76Z2rM5mHXA, kaber-dcUjhNyLwpNeoWH0uzbU5w,
	pablo-Cap9r6Oaw4JrovVCs/uTlw
In-Reply-To: <57201287.80002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Le 27/04/2016 03:14, Balbir Singh a écrit :
> 
> 
> On 23/04/16 01:31, Nicolas Dichtel wrote:
>> Goal of this patch is to use the new libnl API to align netlink attribute
>> when needed.
>> The layout of the netlink message will be a bit different after the patch,
>> because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested
>> attribute instead of before it.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> 
> The layout will change/break user space -- I've not tested the patch though..
Sigh.

I quote David:
"All userspace components using netlink should always ignore attributes
they do not recognize in dumps.

This is one of the most basic principles of netlink"

Do you have some pointers so I can made some tests?


Regards,
Nicolas
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply

* [net-next PATCH V3 0/5] samples/bpf: Improve user experience
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov

It is a steep learning curve getting started with using the eBPF
examples in samples/bpf/.  There are several dependencies, and
specific versions of these dependencies.  Invoking make in the correct
manor is also slightly obscure.

This patchset cleanup, document and hopefully improves the first time
user experience with the eBPF samples directory by auto-detecting
certain scenarios.

V3:
 - Add Alexei's ACKs
 - Remove README paragraph about LLVM experimental BPF target
   as it only existed between LLVM version 3.6 to 3.7.

V2:
 - Adjusted recommend minimum versions to 3.7.1
 - Included clang build instructions
 - New patch adding CLANG variable and validation of command

---

Jesper Dangaard Brouer (5):
      samples/bpf: add back functionality to redefine LLC command
      samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported
      samples/bpf: add a README file to get users started
      samples/bpf: allow make to be run from samples/bpf/ directory
      samples/bpf: like LLC also verify and allow redefining CLANG command


 samples/bpf/Makefile   |   37 ++++++++++++++++++++++-
 samples/bpf/README.rst |   78 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/README.rst

^ permalink raw reply

* [net-next PATCH V3 1/5] samples/bpf: add back functionality to redefine LLC command
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>

It is practical to be-able-to redefine the location of the LLVM
command 'llc', because not all distros have a LLVM version with bpf
target support.  Thus, it is sometimes required to compile LLVM from
source, and sometimes it is not desired to overwrite the distros
default LLVM version.

This feature was removed with 128d1514be35 ("samples/bpf: Use llc in
PATH, rather than a hardcoded value").

Add this features back. Note that it is possible to redefine the LLC
on the make command like:

 make samples/bpf/ LLC=~/git/llvm/build/bin/llc

Fixes: 128d1514be35 ("samples/bpf: Use llc in PATH, rather than a hardcoded value")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 744dd7a16144..5bae9536f100 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -81,10 +81,14 @@ HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
 
+# Allows pointing LLC to a LLVM backend with bpf support, redefine on cmdline:
+#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc
+LLC ?= llc
+
 # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
 # But, there is no easy way to fix it, so just exclude it since it is
 # useless for BPF samples.
 $(obj)/%.o: $(src)/%.c
 	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
 		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
+		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@

^ permalink raw reply related

* [net-next PATCH V3 2/5] samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>

Make compiling samples/bpf more user friendly, by detecting if LLVM
compiler tool 'llc' is available, and also detect if the 'bpf' target
is available in this version of LLVM.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 5bae9536f100..45859c99f573 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -85,6 +85,24 @@ HOSTLOADLIBES_test_overhead += -lelf -lrt
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc
 LLC ?= llc
 
+# Verify LLVM compiler is available and bpf target is supported
+.PHONY: verify_cmd_llc verify_target_bpf
+
+verify_cmd_llc:
+	@if ! (which "${LLC}" > /dev/null 2>&1); then \
+		echo "*** ERROR: Cannot find LLVM tool 'llc' (${LLC})" ;\
+		exit 1; \
+	else true; fi
+
+verify_target_bpf: verify_cmd_llc
+	@if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
+		echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
+		echo "   NOTICE: LLVM version >= 3.7.1 required" ;\
+		exit 2; \
+	else true; fi
+
+$(src)/*.c: verify_target_bpf
+
 # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
 # But, there is no easy way to fix it, so just exclude it since it is
 # useless for BPF samples.


^ permalink raw reply related

* [net-next PATCH V3 3/5] samples/bpf: add a README file to get users started
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>

Getting started with using examples in samples/bpf/ is not
straightforward.  There are several dependencies, and specific
versions of these dependencies.

Just compiling the example tool is also slightly obscure, e.g. one
need to call make like:

 make samples/bpf/

Do notice the "/" slash after the directory name.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 samples/bpf/README.rst |   75 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 samples/bpf/README.rst

diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
new file mode 100644
index 000000000000..1fa157db905b
--- /dev/null
+++ b/samples/bpf/README.rst
@@ -0,0 +1,75 @@
+eBPF sample programs
+====================
+
+This kernel samples/bpf directory contains a mini eBPF library, test
+stubs, verifier test-suite and examples for using eBPF.
+
+Build dependencies
+==================
+
+Compiling requires having installed:
+ * clang >= version 3.4.0
+ * llvm >= version 3.7.1
+
+Note that LLVM's tool 'llc' must support target 'bpf', list with command::
+
+ $ llc --version
+ LLVM (http://llvm.org/):
+  LLVM version 3.x.y
+  [...]
+  Host CPU: xxx
+
+  Registered Targets:
+    [...]
+    bpf        - BPF (host endian)
+    bpfeb      - BPF (big endian)
+    bpfel      - BPF (little endian)
+    [...]
+
+Kernel headers
+--------------
+
+There are usually dependencies to header files of the current kernel.
+To avoid installing devel kernel headers system wide, as a normal
+user, simply call::
+
+ make headers_install
+
+This will creates a local "usr/include" directory in the git/build top
+level directory, that the make system automatically pickup first.
+
+Compiling
+=========
+
+For compiling goto kernel top level build directory and run make like::
+
+ make samples/bpf/
+
+Do notice the "/" slash after the directory name.
+
+Manually compiling LLVM with 'bpf' support
+------------------------------------------
+
+Since version 3.7.0, LLVM adds a proper LLVM backend target for the
+BPF bytecode architecture.
+
+By default llvm will build all non-experimental backends including bpf.
+To generate a smaller llc binary one can use::
+
+ -DLLVM_TARGETS_TO_BUILD="BPF;X86"
+
+Quick sniplet for manually compiling LLVM and clang
+(build dependencies are cmake and gcc-c++)::
+
+ $ git clone http://llvm.org/git/llvm.git
+ $ cd llvm/tools
+ $ git clone --depth 1 http://llvm.org/git/clang.git
+ $ cd ..; mkdir build; cd build
+ $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
+ $ make -j $(getconf _NPROCESSORS_ONLN)
+
+It is also possible to point make to the newly compiled 'llc' command
+via redefining LLC on the make command line::
+
+ make samples/bpf/ LLC=~/git/llvm/build/bin/llc
+

^ permalink raw reply related

* [net-next PATCH V3 4/5] samples/bpf: allow make to be run from samples/bpf/ directory
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>

It is not intuitive that 'make' must be run from the top level
directory with argument "samples/bpf/" to compile these eBPF samples.

Introduce a kbuild make file trick that allow make to be run from the
"samples/bpf/" directory itself.  It basically change to the top level
directory and call "make samples/bpf/" with the "/" slash after the
directory name.

Also add a clean target that only cleans this directory, by taking
advantage of the kbuild external module setting M=$PWD.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile   |    8 ++++++++
 samples/bpf/README.rst |    3 +++
 2 files changed, 11 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 45859c99f573..dd63521832d8 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -85,6 +85,14 @@ HOSTLOADLIBES_test_overhead += -lelf -lrt
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc
 LLC ?= llc
 
+# Trick to allow make to be run from this directory
+all:
+	$(MAKE) -C ../../ $$PWD/
+
+clean:
+	$(MAKE) -C ../../ M=$$PWD clean
+	@rm -f *~
+
 # Verify LLVM compiler is available and bpf target is supported
 .PHONY: verify_cmd_llc verify_target_bpf
 
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index 1fa157db905b..6bbc7958c305 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -47,6 +47,9 @@ For compiling goto kernel top level build directory and run make like::
 
 Do notice the "/" slash after the directory name.
 
+It is also possible to call make from this directory.  This will just
+hide the the invocation of make as above with the appended "/".
+
 Manually compiling LLVM with 'bpf' support
 ------------------------------------------
 


^ permalink raw reply related

* [net-next PATCH V3 5/5] samples/bpf: like LLC also verify and allow redefining CLANG command
From: Jesper Dangaard Brouer @ 2016-04-27  7:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kbuild, bblanco, Jesper Dangaard Brouer, naveen.n.rao,
	borkmann, alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>

Users are likely to manually compile both LLVM 'llc' and 'clang'
tools.  Thus, also allow redefining CLANG and verify command exist.

Makefile implementation wise, the target that verify the command have
been generalized.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile   |   25 ++++++++++++++-----------
 samples/bpf/README.rst |    6 +++---
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index dd63521832d8..157d68d6eace 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -81,9 +81,10 @@ HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
 
-# Allows pointing LLC to a LLVM backend with bpf support, redefine on cmdline:
-#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc
+# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
+#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
 LLC ?= llc
+CLANG ?= clang
 
 # Trick to allow make to be run from this directory
 all:
@@ -93,16 +94,18 @@ clean:
 	$(MAKE) -C ../../ M=$$PWD clean
 	@rm -f *~
 
-# Verify LLVM compiler is available and bpf target is supported
-.PHONY: verify_cmd_llc verify_target_bpf
+# Verify LLVM compiler tools are available and bpf target is supported by llc
+.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
 
-verify_cmd_llc:
-	@if ! (which "${LLC}" > /dev/null 2>&1); then \
-		echo "*** ERROR: Cannot find LLVM tool 'llc' (${LLC})" ;\
-		exit 1; \
-	else true; fi
+verify_cmds: $(CLANG) $(LLC)
+	@for TOOL in $^ ; do \
+		if ! (which "$${TOOL}" > /dev/null 2>&1); then \
+			echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
+			exit 1; \
+		else true; fi; \
+	done
 
-verify_target_bpf: verify_cmd_llc
+verify_target_bpf: verify_cmds
 	@if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
 		echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
 		echo "   NOTICE: LLVM version >= 3.7.1 required" ;\
@@ -115,6 +118,6 @@ $(src)/*.c: verify_target_bpf
 # But, there is no easy way to fix it, so just exclude it since it is
 # useless for BPF samples.
 $(obj)/%.o: $(src)/%.c
-	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
+	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
 		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
 		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index 6bbc7958c305..d653548467a5 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -71,8 +71,8 @@ Quick sniplet for manually compiling LLVM and clang
  $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
  $ make -j $(getconf _NPROCESSORS_ONLN)
 
-It is also possible to point make to the newly compiled 'llc' command
-via redefining LLC on the make command line::
+It is also possible to point make to the newly compiled 'llc' or
+'clang' command via redefining LLC or CLANG on the make command line::
 
- make samples/bpf/ LLC=~/git/llvm/build/bin/llc
+ make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
 


^ permalink raw reply related

* [PATCH v2 0/2] pegasus: correct buffer sizes
From: Petko Manolov @ 2016-04-27  7:35 UTC (permalink / raw)
  To: netdev; +Cc: davem, a1291762, Petko Manolov

As noticed by Lincoln Ramsay <a1291762@gmail.com> some old (usb 1.1) Pegasus 
based devices may actually return more bytes than the specified in the datasheet 
amount.  That would not be a problem if the allocated space for the SKB was 
equal to the parameter passed to usb_fill_bulk_urb().  Some poor bugger (i 
really hope it was not me, but 'git blame' is useless in this case, so anyway) 
decided to add '+ 8' to the buffer length parameter.  Sometimes the usb transfer 
overflows and corrupts the socket structure, leading to kernel panic.

The above doesn't seem to happen for newer (Pegasus2 based) devices which did 
help this bug to hide for so long.

Nearly all Pegasus devices may append the RX status to the end of the received 
packet.  It is the default setup for the driver.  The actual ethernet packet is 
4 bytes shorter.  Why and when 'pkt_len -= 4' became 'pkt_len -= 8' is again 
hidden in the mists of time.  There might have been a good reason to do so, but 
multiple reads of the datasheet did not point me to any.

The patch is against v4.6-rc5 and was tested on ADM8515 device by transferring 
multiple gigabytes of data over a couple of days without any complaints from the 
kernel.

Changes since v1:

 - split the patch in two parts;
 - corrected the subject lines;

Petko Manolov (2):
  pegasus: fixes URB buffer allocation size;
  pegasus: fixes reported packet length

 drivers/net/usb/pegasus.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.8.0.rc3

^ permalink raw reply

* [PATCH v2 1/2] pegasus: fixes URB buffer allocation size;
From: Petko Manolov @ 2016-04-27  7:35 UTC (permalink / raw)
  To: netdev; +Cc: davem, a1291762, Petko Manolov
In-Reply-To: <1461742525-2748-1-git-send-email-petkan@mip-labs.com>

usb_fill_bulk_urb() receives buffer length parameter 8 bytes larger
than what's allocated by alloc_skb(); This seems to be a problem with
older (pegasus usb-1.1) devices, which may silently return more data
than the maximal packet length.

Reported-by: Lincoln Ramsay <a1291762@gmail.com>
Signed-off-by: Petko Manolov <petkan@mip-labs.com>
---
 drivers/net/usb/pegasus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index f840802..f919e20 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -528,7 +528,7 @@ static void read_bulk_callback(struct urb *urb)
 goon:
 	usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
 			  usb_rcvbulkpipe(pegasus->usb, 1),
-			  pegasus->rx_skb->data, PEGASUS_MTU + 8,
+			  pegasus->rx_skb->data, PEGASUS_MTU,
 			  read_bulk_callback, pegasus);
 	rx_status = usb_submit_urb(pegasus->rx_urb, GFP_ATOMIC);
 	if (rx_status == -ENODEV)
@@ -569,7 +569,7 @@ static void rx_fixup(unsigned long data)
 	}
 	usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
 			  usb_rcvbulkpipe(pegasus->usb, 1),
-			  pegasus->rx_skb->data, PEGASUS_MTU + 8,
+			  pegasus->rx_skb->data, PEGASUS_MTU,
 			  read_bulk_callback, pegasus);
 try_again:
 	status = usb_submit_urb(pegasus->rx_urb, GFP_ATOMIC);
@@ -823,7 +823,7 @@ static int pegasus_open(struct net_device *net)
 
 	usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb,
 			  usb_rcvbulkpipe(pegasus->usb, 1),
-			  pegasus->rx_skb->data, PEGASUS_MTU + 8,
+			  pegasus->rx_skb->data, PEGASUS_MTU,
 			  read_bulk_callback, pegasus);
 	if ((res = usb_submit_urb(pegasus->rx_urb, GFP_KERNEL))) {
 		if (res == -ENODEV)
-- 
2.8.0.rc3

^ permalink raw reply related

* [PATCH v2 2/2] pegasus: fixes reported packet length
From: Petko Manolov @ 2016-04-27  7:35 UTC (permalink / raw)
  To: netdev; +Cc: davem, a1291762, Petko Manolov
In-Reply-To: <1461742525-2748-1-git-send-email-petkan@mip-labs.com>

According to ADM851x docs the ethernet packet is appended with 4
bytes, not 8.  Fixing this to report (hopefully) the right amount
of data.

Signed-off-by: Petko Manolov <petkan@mip-labs.com>
---
 drivers/net/usb/pegasus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index f919e20..780c217 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -497,7 +497,7 @@ static void read_bulk_callback(struct urb *urb)
 		pkt_len = buf[count - 3] << 8;
 		pkt_len += buf[count - 4];
 		pkt_len &= 0xfff;
-		pkt_len -= 8;
+		pkt_len -= 4;
 	}
 
 	/*
-- 
2.8.0.rc3

^ permalink raw reply related

* Re: [PATCH v2 0/2] pegasus: correct buffer sizes
From: Johannes Berg @ 2016-04-27  7:46 UTC (permalink / raw)
  To: Petko Manolov, netdev; +Cc: davem, a1291762
In-Reply-To: <1461742525-2748-1-git-send-email-petkan@mip-labs.com>

On Wed, 2016-04-27 at 10:35 +0300, Petko Manolov wrote:
> 
> Nearly all Pegasus devices may append the RX status to the end of the
> received  packet.  It is the default setup for the driver.  The
> actual ethernet packet is  4 bytes shorter.  Why and when 'pkt_len -=
> 4' became 'pkt_len -= 8' is again hidden in the mists of time.  There
> might have been a good reason to do so, but multiple reads of the
> datasheet did not point me to any.
> 

Wild guess: (some) devices pass the FCS, perhaps depending on the
configuration?

johannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox