Netdev List

* [PATCH v3 15/19] thunderbolt: Add function to retrieve DMA device for the ring
From: Mika Westerberg @ 2017-10-02 10:38 UTC
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, netdev, linux-kernel
In-Reply-To: <20171002103846.64602-1-mika.westerberg@linux.intel.com>

This is needed when Thunderbolt service drivers need to DMA map memory
before it is passed down to the ring.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 include/linux/thunderbolt.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 36925e3aec7c..7b69853188b1 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -19,6 +19,7 @@
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/mod_devicetable.h>
+#include <linux/pci.h>
 #include <linux/uuid.h>
 #include <linux/workqueue.h>
 
@@ -582,4 +583,16 @@ static inline int tb_ring_tx(struct tb_ring *ring, struct ring_frame *frame)
 struct ring_frame *tb_ring_poll(struct tb_ring *ring);
 void tb_ring_poll_complete(struct tb_ring *ring);
 
+/**
+ * tb_ring_dma_device() - Return device used for DMA mapping
+ * @ring: Ring whose DMA device is retrieved
+ *
+ * Use this function when you are mapping DMA for buffers that are
+ * passed to the ring for sending/receiving.
+ */
+static inline struct device *tb_ring_dma_device(struct tb_ring *ring)
+{
+	return &ring->nhi->pdev->dev;
+}
+
 #endif /* THUNDERBOLT_H_ */
-- 
2.14.2
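
For illustration, the pointer chain the accessor follows can be modeled in userspace. This is a minimal sketch, not kernel code: the struct definitions below are hypothetical stand-ins that keep only the fields `tb_ring_dma_device()` dereferences, and `rings_share_dma_device()` is a helper invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. Only the
 * fields touched by the accessor are modeled (assumption: the real
 * structures carry many more members).
 */
struct device { const char *name; };
struct pci_dev { int devfn; struct device dev; };
struct tb_nhi { struct pci_dev *pdev; };
struct tb_ring { struct tb_nhi *nhi; int hop; };

/* Same body as the accessor added by the patch: the DMA device of a
 * ring is the struct device embedded in the NHI's PCI device.
 */
static inline struct device *tb_ring_dma_device(struct tb_ring *ring)
{
	return &ring->nhi->pdev->dev;
}

/* Illustrative helper (not part of the patch): two rings allocated on
 * the same NHI resolve to the same DMA device.
 */
static inline bool rings_share_dma_device(struct tb_ring *a,
					  struct tb_ring *b)
{
	return tb_ring_dma_device(a) == tb_ring_dma_device(b);
}
```

In a service driver the returned pointer would be the device argument handed to `dma_map_page()` and `dma_unmap_page()` for buffers queued on that ring.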

* [PATCH v3 17/19] MAINTAINERS: Add thunderbolt.h to the Thunderbolt driver entry
From: Mika Westerberg @ 2017-10-02 10:38 UTC
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, netdev, linux-kernel
In-Reply-To: <20171002103846.64602-1-mika.westerberg@linux.intel.com>

The new API header (include/linux/thunderbolt.h) is maintained by the
Thunderbolt driver maintainers.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 65b0c88d5ee0..34661b5ac9ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13284,6 +13284,7 @@ M:	Mika Westerberg <mika.westerberg@linux.intel.com>
 M:	Yehezkel Bernat <yehezkel.bernat@intel.com>
 S:	Maintained
 F:	drivers/thunderbolt/
+F:	include/linux/thunderbolt.h
 
 THUNDERX GPIO DRIVER
 M:	David Daney <david.daney@cavium.com>
-- 
2.14.2

* [PATCH v3 18/19] net: Add support for networking over Thunderbolt cable
From: Mika Westerberg @ 2017-10-02 10:38 UTC
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, netdev, linux-kernel
In-Reply-To: <20171002103846.64602-1-mika.westerberg@linux.intel.com>

From: Amir Levy <amir.jer.levy@intel.com>

ThunderboltIP is a protocol created by Apple to tunnel IP/ethernet
traffic over a Thunderbolt cable. The protocol consists of a
configuration phase where each side sends ThunderboltIP login packets
(the protocol is determined by the UUID in the XDomain packet header)
over the configuration channel. Once both sides get a positive
acknowledgment to their login packet, they configure the high-speed DMA
path accordingly. This DMA path is then used to transmit and receive
networking traffic.

This patch creates a virtual ethernet interface the host software can
use in the same way as any other networking interface. Once the
interface is brought up successfully, network packets get tunneled over
the Thunderbolt cable to the remote host and back.

The connection is terminated by sending a ThunderboltIP logout packet
over the configuration channel. We do this when the network interface is
brought down by the user or the driver is unloaded.

Signed-off-by: Amir Levy <amir.jer.levy@intel.com>
Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 Documentation/admin-guide/thunderbolt.rst |   24 +
 drivers/net/Kconfig                       |   12 +
 drivers/net/Makefile                      |    3 +
 drivers/net/thunderbolt.c                 | 1362 +++++++++++++++++++++++++++++
 4 files changed, 1401 insertions(+)
 create mode 100644 drivers/net/thunderbolt.c

diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst
index 6a4cd1f159ca..5c62d11d77e8 100644
--- a/Documentation/admin-guide/thunderbolt.rst
+++ b/Documentation/admin-guide/thunderbolt.rst
@@ -197,3 +197,27 @@ information is missing.
 
 To recover from this mode, one needs to flash a valid NVM image to the
 host host controller in the same way it is done in the previous chapter.
+
+Networking over Thunderbolt cable
+---------------------------------
+Thunderbolt technology allows software communication across two hosts
+connected by a Thunderbolt cable.
+
+It is possible to tunnel any kind of traffic over a Thunderbolt link but
+currently we only support the Apple ThunderboltIP protocol.
+
+If the other host is running Windows or macOS, the only thing you need
+to do is to connect a Thunderbolt cable between the two hosts; the
+``thunderbolt-net`` driver is loaded automatically. If the other host is
+also Linux you should load ``thunderbolt-net`` manually on one host (it
+does not matter which one)::
+
+  # modprobe thunderbolt-net
+
+This triggers module load on the other host automatically. If the driver
+is built-in to the kernel image, there is no need to do anything.
+
+The driver will create one virtual ethernet interface per Thunderbolt
+port, named ``thunderbolt0`` and so on. From this point you can either
+use standard userspace tools like ``ifconfig`` to configure the
+interface or let your GUI handle it automatically.
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index aba0d652095b..0936da592e12 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -483,6 +483,18 @@ config FUJITSU_ES
 	  This driver provides support for Extended Socket network device
           on Extended Partitioning of FUJITSU PRIMEQUEST 2000 E2 series.
 
+config THUNDERBOLT_NET
+	tristate "Networking over Thunderbolt cable"
+	depends on THUNDERBOLT && INET
+	help
+	  Select this if you want to create a network between two
+	  computers over a Thunderbolt cable. The driver supports Apple
+	  ThunderboltIP protocol and allows communication with any host
+	  supporting the same protocol including Windows and macOS.
+
+	  To compile this driver as a module, choose M here. The module will be
+	  called thunderbolt-net.
+
 source "drivers/net/hyperv/Kconfig"
 
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 8dff900085d6..7c8f4dd3a7c5 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -74,3 +74,6 @@ obj-$(CONFIG_HYPERV_NET) += hyperv/
 obj-$(CONFIG_NTB_NETDEV) += ntb_netdev.o
 
 obj-$(CONFIG_FUJITSU_ES) += fjes/
+
+thunderbolt-net-y += thunderbolt.o
+obj-$(CONFIG_THUNDERBOLT_NET) += thunderbolt-net.o
diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
new file mode 100644
index 000000000000..1a7bc0bf4598
--- /dev/null
+++ b/drivers/net/thunderbolt.c
@@ -0,0 +1,1362 @@
+/*
+ * Networking over Thunderbolt cable using Apple ThunderboltIP protocol
+ *
+ * Copyright (C) 2017, Intel Corporation
+ * Authors: Amir Levy <amir.jer.levy@intel.com>
+ *          Michael Jamet <michael.jamet@intel.com>
+ *          Mika Westerberg <mika.westerberg@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/atomic.h>
+#include <linux/highmem.h>
+#include <linux/if_vlan.h>
+#include <linux/jhash.h>
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/sizes.h>
+#include <linux/thunderbolt.h>
+#include <linux/uuid.h>
+#include <linux/workqueue.h>
+
+#include <net/ip6_checksum.h>
+
+/* Protocol timeouts in ms */
+#define TBNET_LOGIN_DELAY	4500
+#define TBNET_LOGIN_TIMEOUT	500
+#define TBNET_LOGOUT_TIMEOUT	100
+
+#define TBNET_RING_SIZE		256
+#define TBNET_LOCAL_PATH	0xf
+#define TBNET_LOGIN_RETRIES	60
+#define TBNET_LOGOUT_RETRIES	5
+#define TBNET_MATCH_FRAGS_ID	BIT(1)
+#define TBNET_MAX_MTU		SZ_64K
+#define TBNET_FRAME_SIZE	SZ_4K
+#define TBNET_MAX_PAYLOAD_SIZE	\
+	(TBNET_FRAME_SIZE - sizeof(struct thunderbolt_ip_frame_header))
+/* Rx packets need to hold space for skb_shared_info */
+#define TBNET_RX_MAX_SIZE	\
+	(TBNET_FRAME_SIZE + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+#define TBNET_RX_PAGE_ORDER	get_order(TBNET_RX_MAX_SIZE)
+#define TBNET_RX_PAGE_SIZE	(PAGE_SIZE << TBNET_RX_PAGE_ORDER)
+
+#define TBNET_L0_PORT_NUM(route) ((route) & GENMASK(5, 0))
+
+/**
+ * struct thunderbolt_ip_frame_header - Header for each Thunderbolt frame
+ * @frame_size: size of the data with the frame
+ * @frame_index: running index on the frames
+ * @frame_id: ID of the frame to match frames to specific packet
+ * @frame_count: how many frames make up a full packet
+ *
+ * Each data frame passed to the high-speed DMA ring has this header. If
+ * the XDomain network directory announces that %TBNET_MATCH_FRAGS_ID is
+ * supported then @frame_id is filled, otherwise it stays %0.
+ */
+struct thunderbolt_ip_frame_header {
+	u32 frame_size;
+	u16 frame_index;
+	u16 frame_id;
+	u32 frame_count;
+};
+
+enum thunderbolt_ip_frame_pdf {
+	TBIP_PDF_FRAME_START = 1,
+	TBIP_PDF_FRAME_END,
+};
+
+enum thunderbolt_ip_type {
+	TBIP_LOGIN,
+	TBIP_LOGIN_RESPONSE,
+	TBIP_LOGOUT,
+	TBIP_STATUS,
+};
+
+struct thunderbolt_ip_header {
+	u32 route_hi;
+	u32 route_lo;
+	u32 length_sn;
+	uuid_t uuid;
+	uuid_t initiator_uuid;
+	uuid_t target_uuid;
+	u32 type;
+	u32 command_id;
+};
+
+#define TBIP_HDR_LENGTH_MASK		GENMASK(5, 0)
+#define TBIP_HDR_SN_MASK		GENMASK(28, 27)
+#define TBIP_HDR_SN_SHIFT		27
+
+struct thunderbolt_ip_login {
+	struct thunderbolt_ip_header hdr;
+	u32 proto_version;
+	u32 transmit_path;
+	u32 reserved[4];
+};
+
+#define TBIP_LOGIN_PROTO_VERSION	1
+
+struct thunderbolt_ip_login_response {
+	struct thunderbolt_ip_header hdr;
+	u32 status;
+	u32 receiver_mac[2];
+	u32 receiver_mac_len;
+	u32 reserved[4];
+};
+
+struct thunderbolt_ip_logout {
+	struct thunderbolt_ip_header hdr;
+};
+
+struct thunderbolt_ip_status {
+	struct thunderbolt_ip_header hdr;
+	u32 status;
+};
+
+struct tbnet_stats {
+	u64 tx_packets;
+	u64 rx_packets;
+	u64 tx_bytes;
+	u64 rx_bytes;
+	u64 rx_errors;
+	u64 tx_errors;
+	u64 rx_length_errors;
+	u64 rx_over_errors;
+	u64 rx_crc_errors;
+	u64 rx_missed_errors;
+};
+
+struct tbnet_frame {
+	struct net_device *dev;
+	struct page *page;
+	struct ring_frame frame;
+};
+
+struct tbnet_ring {
+	struct tbnet_frame frames[TBNET_RING_SIZE];
+	unsigned int cons;
+	unsigned int prod;
+	struct tb_ring *ring;
+};
+
+/**
+ * struct tbnet - ThunderboltIP network driver private data
+ * @svc: XDomain service the driver is bound to
+ * @xd: XDomain the service belongs to
+ * @handler: ThunderboltIP configuration protocol handler
+ * @dev: Networking device
+ * @napi: NAPI structure for Rx polling
+ * @stats: Network statistics
+ * @skb: Network packet that is currently processed on Rx path
+ * @command_id: ID used for next configuration protocol packet
+ * @login_sent: ThunderboltIP login message successfully sent
+ * @login_received: ThunderboltIP login message received from the remote
+ *		    host
+ * @transmit_path: HopID the other end needs to use building the
+ *		   opposite side path.
+ * @connection_lock: Lock serializing access to @login_sent,
+ *		     @login_received and @transmit_path.
+ * @login_retries: Number of login retries currently done
+ * @login_work: Worker to send ThunderboltIP login packets
+ * @connected_work: Worker that finalizes the ThunderboltIP connection
+ *		    setup and enables DMA paths for high speed data
+ *		    transfers
+ * @rx_hdr: Copy of the currently processed Rx frame. Used when a
+ *	    network packet consists of multiple Thunderbolt frames.
+ *	    In host byte order.
+ * @rx_ring: Software ring holding Rx frames
+ * @frame_id: Frame ID used for next Tx packet
+ *            (if %TBNET_MATCH_FRAGS_ID is supported in both ends)
+ * @tx_ring: Software ring holding Tx frames
+ */
+struct tbnet {
+	const struct tb_service *svc;
+	struct tb_xdomain *xd;
+	struct tb_protocol_handler handler;
+	struct net_device *dev;
+	struct napi_struct napi;
+	struct tbnet_stats stats;
+	struct sk_buff *skb;
+	atomic_t command_id;
+	bool login_sent;
+	bool login_received;
+	u32 transmit_path;
+	struct mutex connection_lock;
+	int login_retries;
+	struct delayed_work login_work;
+	struct work_struct connected_work;
+	struct thunderbolt_ip_frame_header rx_hdr;
+	struct tbnet_ring rx_ring;
+	atomic_t frame_id;
+	struct tbnet_ring tx_ring;
+};
+
+/* Network property directory UUID: c66189ca-1cce-4195-bdb8-49592e5f5a4f */
+static const uuid_t tbnet_dir_uuid =
+	UUID_INIT(0xc66189ca, 0x1cce, 0x4195,
+		  0xbd, 0xb8, 0x49, 0x59, 0x2e, 0x5f, 0x5a, 0x4f);
+
+/* ThunderboltIP protocol UUID: 798f589e-3616-8a47-97c6-5664a920c8dd */
+static const uuid_t tbnet_svc_uuid =
+	UUID_INIT(0x798f589e, 0x3616, 0x8a47,
+		  0x97, 0xc6, 0x56, 0x64, 0xa9, 0x20, 0xc8, 0xdd);
+
+static struct tb_property_dir *tbnet_dir;
+
+static void tbnet_fill_header(struct thunderbolt_ip_header *hdr, u64 route,
+	u8 sequence, const uuid_t *initiator_uuid, const uuid_t *target_uuid,
+	enum thunderbolt_ip_type type, size_t size, u32 command_id)
+{
+	u32 length_sn;
+
+	/* Length does not include route_hi/lo and length_sn fields */
+	length_sn = (size - 3 * 4) / 4;
+	length_sn |= (sequence << TBIP_HDR_SN_SHIFT) & TBIP_HDR_SN_MASK;
+
+	hdr->route_hi = upper_32_bits(route);
+	hdr->route_lo = lower_32_bits(route);
+	hdr->length_sn = length_sn;
+	uuid_copy(&hdr->uuid, &tbnet_svc_uuid);
+	uuid_copy(&hdr->initiator_uuid, initiator_uuid);
+	uuid_copy(&hdr->target_uuid, target_uuid);
+	hdr->type = type;
+	hdr->command_id = command_id;
+}
+
+static int tbnet_login_response(struct tbnet *net, u64 route, u8 sequence,
+				u32 command_id)
+{
+	struct thunderbolt_ip_login_response reply;
+	struct tb_xdomain *xd = net->xd;
+
+	memset(&reply, 0, sizeof(reply));
+	tbnet_fill_header(&reply.hdr, route, sequence, xd->local_uuid,
+			  xd->remote_uuid, TBIP_LOGIN_RESPONSE, sizeof(reply),
+			  command_id);
+	memcpy(reply.receiver_mac, net->dev->dev_addr, ETH_ALEN);
+	reply.receiver_mac_len = ETH_ALEN;
+
+	return tb_xdomain_response(xd, &reply, sizeof(reply),
+				   TB_CFG_PKG_XDOMAIN_RESP);
+}
+
+static int tbnet_login_request(struct tbnet *net, u8 sequence)
+{
+	struct thunderbolt_ip_login_response reply;
+	struct thunderbolt_ip_login request;
+	struct tb_xdomain *xd = net->xd;
+
+	memset(&request, 0, sizeof(request));
+	tbnet_fill_header(&request.hdr, xd->route, sequence, xd->local_uuid,
+			  xd->remote_uuid, TBIP_LOGIN, sizeof(request),
+			  atomic_inc_return(&net->command_id));
+
+	request.proto_version = TBIP_LOGIN_PROTO_VERSION;
+	request.transmit_path = TBNET_LOCAL_PATH;
+
+	return tb_xdomain_request(xd, &request, sizeof(request),
+				  TB_CFG_PKG_XDOMAIN_RESP, &reply,
+				  sizeof(reply), TB_CFG_PKG_XDOMAIN_RESP,
+				  TBNET_LOGIN_TIMEOUT);
+}
+
+static int tbnet_logout_response(struct tbnet *net, u64 route, u8 sequence,
+				 u32 command_id)
+{
+	struct thunderbolt_ip_status reply;
+	struct tb_xdomain *xd = net->xd;
+
+	memset(&reply, 0, sizeof(reply));
+	tbnet_fill_header(&reply.hdr, route, sequence, xd->local_uuid,
+			  xd->remote_uuid, TBIP_STATUS, sizeof(reply),
+			  atomic_inc_return(&net->command_id));
+	return tb_xdomain_response(xd, &reply, sizeof(reply),
+				   TB_CFG_PKG_XDOMAIN_RESP);
+}
+
+static int tbnet_logout_request(struct tbnet *net)
+{
+	struct thunderbolt_ip_logout request;
+	struct thunderbolt_ip_status reply;
+	struct tb_xdomain *xd = net->xd;
+
+	memset(&request, 0, sizeof(request));
+	tbnet_fill_header(&request.hdr, xd->route, 0, xd->local_uuid,
+			  xd->remote_uuid, TBIP_LOGOUT, sizeof(request),
+			  atomic_inc_return(&net->command_id));
+
+	return tb_xdomain_request(xd, &request, sizeof(request),
+				  TB_CFG_PKG_XDOMAIN_RESP, &reply,
+				  sizeof(reply), TB_CFG_PKG_XDOMAIN_RESP,
+				  TBNET_LOGOUT_TIMEOUT);
+}
+
+static void start_login(struct tbnet *net)
+{
+	mutex_lock(&net->connection_lock);
+	net->login_sent = false;
+	net->login_received = false;
+	mutex_unlock(&net->connection_lock);
+
+	queue_delayed_work(system_long_wq, &net->login_work,
+			   msecs_to_jiffies(1000));
+}
+
+static void stop_login(struct tbnet *net)
+{
+	cancel_delayed_work_sync(&net->login_work);
+	cancel_work_sync(&net->connected_work);
+}
+
+static inline unsigned int tbnet_frame_size(const struct tbnet_frame *tf)
+{
+	return tf->frame.size ? : TBNET_FRAME_SIZE;
+}
+
+static void tbnet_free_buffers(struct tbnet_ring *ring)
+{
+	unsigned int i;
+
+	for (i = 0; i < TBNET_RING_SIZE; i++) {
+		struct device *dma_dev = tb_ring_dma_device(ring->ring);
+		struct tbnet_frame *tf = &ring->frames[i];
+		enum dma_data_direction dir;
+		unsigned int order;
+		size_t size;
+
+		if (!tf->page)
+			continue;
+
+		if (ring->ring->is_tx) {
+			dir = DMA_TO_DEVICE;
+			order = 0;
+			size = tbnet_frame_size(tf);
+		} else {
+			dir = DMA_FROM_DEVICE;
+			order = TBNET_RX_PAGE_ORDER;
+			size = TBNET_RX_PAGE_SIZE;
+		}
+
+		if (tf->frame.buffer_phy)
+			dma_unmap_page(dma_dev, tf->frame.buffer_phy, size,
+				       dir);
+
+		__free_pages(tf->page, order);
+		tf->page = NULL;
+	}
+
+	ring->cons = 0;
+	ring->prod = 0;
+}
+
+static void tbnet_tear_down(struct tbnet *net, bool send_logout)
+{
+	netif_carrier_off(net->dev);
+	netif_stop_queue(net->dev);
+
+	stop_login(net);
+
+	mutex_lock(&net->connection_lock);
+
+	if (net->login_sent && net->login_received) {
+		int retries = TBNET_LOGOUT_RETRIES;
+
+		while (send_logout && retries-- > 0) {
+			int ret = tbnet_logout_request(net);
+			if (ret != -ETIMEDOUT)
+				break;
+		}
+
+		tb_ring_stop(net->rx_ring.ring);
+		tb_ring_stop(net->tx_ring.ring);
+		tbnet_free_buffers(&net->rx_ring);
+		tbnet_free_buffers(&net->tx_ring);
+
+		if (tb_xdomain_disable_paths(net->xd))
+			netdev_warn(net->dev, "failed to disable DMA paths\n");
+	}
+
+	net->login_retries = 0;
+	net->login_sent = false;
+	net->login_received = false;
+
+	mutex_unlock(&net->connection_lock);
+}
+
+static int tbnet_handle_packet(const void *buf, size_t size, void *data)
+{
+	const struct thunderbolt_ip_login *pkg = buf;
+	struct tbnet *net = data;
+	u32 command_id;
+	int ret = 0;
+	u8 sequence;
+	u64 route;
+
+	/* Make sure the packet is for us */
+	if (size < sizeof(struct thunderbolt_ip_header))
+		return 0;
+	if (!uuid_equal(&pkg->hdr.initiator_uuid, net->xd->remote_uuid))
+		return 0;
+	if (!uuid_equal(&pkg->hdr.target_uuid, net->xd->local_uuid))
+		return 0;
+
+	route = ((u64)pkg->hdr.route_hi << 32) | pkg->hdr.route_lo;
+	route &= ~BIT_ULL(63);
+	if (route != net->xd->route)
+		return 0;
+
+	sequence = pkg->hdr.length_sn & TBIP_HDR_SN_MASK;
+	sequence >>= TBIP_HDR_SN_SHIFT;
+	command_id = pkg->hdr.command_id;
+
+	switch (pkg->hdr.type) {
+	case TBIP_LOGIN:
+		if (!netif_running(net->dev))
+			break;
+
+		ret = tbnet_login_response(net, route, sequence,
+					   pkg->hdr.command_id);
+		if (!ret) {
+			mutex_lock(&net->connection_lock);
+			net->login_received = true;
+			net->transmit_path = pkg->transmit_path;
+
+			/* If we reached the number of max retries or
+			 * previous logout, schedule another round of
+			 * login retries
+			 */
+			if (net->login_retries >= TBNET_LOGIN_RETRIES ||
+			    !net->login_sent) {
+				net->login_retries = 0;
+				queue_delayed_work(system_long_wq,
+						   &net->login_work, 0);
+			}
+			mutex_unlock(&net->connection_lock);
+
+			queue_work(system_long_wq, &net->connected_work);
+		}
+		break;
+
+	case TBIP_LOGOUT:
+		ret = tbnet_logout_response(net, route, sequence, command_id);
+		if (!ret)
+			tbnet_tear_down(net, false);
+		break;
+
+	default:
+		return 0;
+	}
+
+	if (ret)
+		netdev_warn(net->dev, "failed to send ThunderboltIP response\n");
+
+	return 1;
+}
+
+static unsigned int tbnet_available_buffers(const struct tbnet_ring *ring)
+{
+	return ring->prod - ring->cons;
+}
+
+static int tbnet_alloc_rx_buffers(struct tbnet *net, unsigned int nbuffers)
+{
+	struct tbnet_ring *ring = &net->rx_ring;
+	int ret;
+
+	while (nbuffers--) {
+		struct device *dma_dev = tb_ring_dma_device(ring->ring);
+		unsigned int index = ring->prod & (TBNET_RING_SIZE - 1);
+		struct tbnet_frame *tf = &ring->frames[index];
+		dma_addr_t dma_addr;
+
+		if (tf->page)
+			break;
+
+		/* Allocate page (order > 0) so that it can hold maximum
+		 * ThunderboltIP frame (4kB) and the additional room for
+		 * SKB shared info required by build_skb().
+		 */
+		tf->page = dev_alloc_pages(TBNET_RX_PAGE_ORDER);
+		if (!tf->page) {
+			ret = -ENOMEM;
+			goto err_free;
+		}
+
+		dma_addr = dma_map_page(dma_dev, tf->page, 0,
+					TBNET_RX_PAGE_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(dma_dev, dma_addr)) {
+			ret = -ENOMEM;
+			goto err_free;
+		}
+
+		tf->frame.buffer_phy = dma_addr;
+		tf->dev = net->dev;
+
+		tb_ring_rx(ring->ring, &tf->frame);
+
+		ring->prod++;
+	}
+
+	return 0;
+
+err_free:
+	tbnet_free_buffers(ring);
+	return ret;
+}
+
+static struct tbnet_frame *tbnet_get_tx_buffer(struct tbnet *net)
+{
+	struct tbnet_ring *ring = &net->tx_ring;
+	struct tbnet_frame *tf;
+	unsigned int index;
+
+	if (!tbnet_available_buffers(ring))
+		return NULL;
+
+	index = ring->cons++ & (TBNET_RING_SIZE - 1);
+
+	tf = &ring->frames[index];
+	tf->frame.size = 0;
+	tf->frame.buffer_phy = 0;
+
+	return tf;
+}
+
+static void tbnet_tx_callback(struct tb_ring *ring, struct ring_frame *frame,
+			      bool canceled)
+{
+	struct tbnet_frame *tf = container_of(frame, typeof(*tf), frame);
+	struct device *dma_dev = tb_ring_dma_device(ring);
+	struct tbnet *net = netdev_priv(tf->dev);
+
+	dma_unmap_page(dma_dev, tf->frame.buffer_phy, tbnet_frame_size(tf),
+		       DMA_TO_DEVICE);
+
+	/* Return buffer to the ring */
+	net->tx_ring.prod++;
+
+	if (tbnet_available_buffers(&net->tx_ring) >= TBNET_RING_SIZE / 2)
+		netif_wake_queue(net->dev);
+}
+
+static int tbnet_alloc_tx_buffers(struct tbnet *net)
+{
+	struct tbnet_ring *ring = &net->tx_ring;
+	unsigned int i;
+
+	for (i = 0; i < TBNET_RING_SIZE; i++) {
+		struct tbnet_frame *tf = &ring->frames[i];
+
+		tf->page = alloc_page(GFP_KERNEL);
+		if (!tf->page) {
+			tbnet_free_buffers(ring);
+			return -ENOMEM;
+		}
+
+		tf->dev = net->dev;
+		tf->frame.callback = tbnet_tx_callback;
+		tf->frame.sof = TBIP_PDF_FRAME_START;
+		tf->frame.eof = TBIP_PDF_FRAME_END;
+	}
+
+	ring->cons = 0;
+	ring->prod = TBNET_RING_SIZE - 1;
+
+	return 0;
+}
+
+static void tbnet_connected_work(struct work_struct *work)
+{
+	struct tbnet *net = container_of(work, typeof(*net), connected_work);
+	bool connected;
+	int ret;
+
+	if (netif_carrier_ok(net->dev))
+		return;
+
+	mutex_lock(&net->connection_lock);
+	connected = net->login_sent && net->login_received;
+	mutex_unlock(&net->connection_lock);
+
+	if (!connected)
+		return;
+
+	/* Both logins successful so enable the high-speed DMA paths and
+	 * start the network device queue.
+	 */
+	ret = tb_xdomain_enable_paths(net->xd, TBNET_LOCAL_PATH,
+				      net->rx_ring.ring->hop,
+				      net->transmit_path,
+				      net->tx_ring.ring->hop);
+	if (ret) {
+		netdev_err(net->dev, "failed to enable DMA paths\n");
+		return;
+	}
+
+	tb_ring_start(net->tx_ring.ring);
+	tb_ring_start(net->rx_ring.ring);
+
+	ret = tbnet_alloc_rx_buffers(net, TBNET_RING_SIZE);
+	if (ret)
+		goto err_stop_rings;
+
+	ret = tbnet_alloc_tx_buffers(net);
+	if (ret)
+		goto err_free_rx_buffers;
+
+	netif_carrier_on(net->dev);
+	netif_start_queue(net->dev);
+	return;
+
+err_free_rx_buffers:
+	tbnet_free_buffers(&net->rx_ring);
+err_stop_rings:
+	tb_ring_stop(net->rx_ring.ring);
+	tb_ring_stop(net->tx_ring.ring);
+}
+
+static void tbnet_login_work(struct work_struct *work)
+{
+	struct tbnet *net = container_of(work, typeof(*net), login_work.work);
+	unsigned long delay = msecs_to_jiffies(TBNET_LOGIN_DELAY);
+	int ret;
+
+	if (netif_carrier_ok(net->dev))
+		return;
+
+	ret = tbnet_login_request(net, net->login_retries % 4);
+	if (ret) {
+		if (net->login_retries++ < TBNET_LOGIN_RETRIES) {
+			queue_delayed_work(system_long_wq, &net->login_work,
+					   delay);
+		} else {
+			netdev_info(net->dev, "ThunderboltIP login timed out\n");
+		}
+	} else {
+		net->login_retries = 0;
+
+		mutex_lock(&net->connection_lock);
+		net->login_sent = true;
+		mutex_unlock(&net->connection_lock);
+
+		queue_work(system_long_wq, &net->connected_work);
+	}
+}
+
+static bool tbnet_check_frame(struct tbnet *net, const struct tbnet_frame *tf,
+			      const struct thunderbolt_ip_frame_header *hdr)
+{
+	u32 frame_id, frame_count, frame_size, frame_index;
+	unsigned int size;
+
+	if (tf->frame.flags & RING_DESC_CRC_ERROR) {
+		net->stats.rx_crc_errors++;
+		return false;
+	} else if (tf->frame.flags & RING_DESC_BUFFER_OVERRUN) {
+		net->stats.rx_over_errors++;
+		return false;
+	}
+
+	/* Should be greater than just header i.e. contains data */
+	size = tbnet_frame_size(tf);
+	if (size <= sizeof(*hdr)) {
+		net->stats.rx_length_errors++;
+		return false;
+	}
+
+	frame_count = le32_to_cpu(hdr->frame_count);
+	frame_size = le32_to_cpu(hdr->frame_size);
+	frame_index = le16_to_cpu(hdr->frame_index);
+	frame_id = le16_to_cpu(hdr->frame_id);
+
+	if ((frame_size > size - sizeof(*hdr)) || !frame_size) {
+		net->stats.rx_length_errors++;
+		return false;
+	}
+
+	/* In case we're in the middle of packet, validate the frame
+	 * header based on first fragment of the packet.
+	 */
+	if (net->skb && net->rx_hdr.frame_count) {
+		/* Check the frame count fits the count field */
+		if (frame_count != net->rx_hdr.frame_count) {
+			net->stats.rx_length_errors++;
+			return false;
+		}
+
+		/* Check the frame identifiers are incremented correctly,
+		 * and id is matching.
+		 */
+		if (frame_index != net->rx_hdr.frame_index + 1 ||
+		    frame_id != net->rx_hdr.frame_id) {
+			net->stats.rx_missed_errors++;
+			return false;
+		}
+
+		if (net->skb->len + frame_size > TBNET_MAX_MTU) {
+			net->stats.rx_length_errors++;
+			return false;
+		}
+
+		return true;
+	}
+
+	/* Start of packet, validate the frame header */
+	if (frame_count == 0 || frame_count > TBNET_RING_SIZE / 4) {
+		net->stats.rx_length_errors++;
+		return false;
+	}
+	if (frame_index != 0) {
+		net->stats.rx_missed_errors++;
+		return false;
+	}
+
+	return true;
+}
+
+static int tbnet_poll(struct napi_struct *napi, int budget)
+{
+	struct tbnet *net = container_of(napi, struct tbnet, napi);
+	unsigned int cleaned_count = tbnet_available_buffers(&net->rx_ring);
+	struct device *dma_dev = tb_ring_dma_device(net->rx_ring.ring);
+	unsigned int rx_packets = 0;
+
+	while (rx_packets < budget) {
+		const struct thunderbolt_ip_frame_header *hdr;
+		unsigned int hdr_size = sizeof(*hdr);
+		struct sk_buff *skb = NULL;
+		struct ring_frame *frame;
+		struct tbnet_frame *tf;
+		struct page *page;
+		bool last = true;
+		u32 frame_size;
+
+		/* Return some buffers to hardware, one at a time is too
+		 * slow so allocate MAX_SKB_FRAGS buffers at the same
+		 * time.
+		 */
+		if (cleaned_count >= MAX_SKB_FRAGS) {
+			tbnet_alloc_rx_buffers(net, cleaned_count);
+			cleaned_count = 0;
+		}
+
+		frame = tb_ring_poll(net->rx_ring.ring);
+		if (!frame)
+			break;
+
+		dma_unmap_page(dma_dev, frame->buffer_phy,
+			       TBNET_RX_PAGE_SIZE, DMA_FROM_DEVICE);
+
+		tf = container_of(frame, typeof(*tf), frame);
+
+		page = tf->page;
+		tf->page = NULL;
+		net->rx_ring.cons++;
+		cleaned_count++;
+
+		hdr = page_address(page);
+		if (!tbnet_check_frame(net, tf, hdr)) {
+			__free_pages(page, TBNET_RX_PAGE_ORDER);
+			dev_kfree_skb_any(net->skb);
+			net->skb = NULL;
+			continue;
+		}
+
+		frame_size = le32_to_cpu(hdr->frame_size);
+
+		skb = net->skb;
+		if (!skb) {
+			skb = build_skb(page_address(page),
+					TBNET_RX_PAGE_SIZE);
+			if (!skb) {
+				__free_pages(page, TBNET_RX_PAGE_ORDER);
+				net->stats.rx_errors++;
+				break;
+			}
+
+			skb_reserve(skb, hdr_size);
+			skb_put(skb, frame_size);
+
+			net->skb = skb;
+		} else {
+			skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+					page, hdr_size, frame_size,
+					TBNET_RX_PAGE_SIZE - hdr_size);
+		}
+
+		net->rx_hdr.frame_size = frame_size;
+		net->rx_hdr.frame_count = le32_to_cpu(hdr->frame_count);
+		net->rx_hdr.frame_index = le16_to_cpu(hdr->frame_index);
+		net->rx_hdr.frame_id = le16_to_cpu(hdr->frame_id);
+		last = net->rx_hdr.frame_index == net->rx_hdr.frame_count - 1;
+
+		rx_packets++;
+		net->stats.rx_bytes += frame_size;
+
+		if (last) {
+			skb->protocol = eth_type_trans(skb, net->dev);
+			napi_gro_receive(&net->napi, skb);
+			net->skb = NULL;
+		}
+	}
+
+	net->stats.rx_packets += rx_packets;
+
+	if (cleaned_count)
+		tbnet_alloc_rx_buffers(net, cleaned_count);
+
+	if (rx_packets >= budget)
+		return budget;
+
+	napi_complete_done(napi, rx_packets);
+	/* Re-enable the ring interrupt */
+	tb_ring_poll_complete(net->rx_ring.ring);
+
+	return rx_packets;
+}
+
+static void tbnet_start_poll(void *data)
+{
+	struct tbnet *net = data;
+
+	napi_schedule(&net->napi);
+}
+
+static int tbnet_open(struct net_device *dev)
+{
+	struct tbnet *net = netdev_priv(dev);
+	struct tb_xdomain *xd = net->xd;
+	u16 sof_mask, eof_mask;
+	struct tb_ring *ring;
+
+	netif_carrier_off(dev);
+
+	ring = tb_ring_alloc_tx(xd->tb->nhi, -1, TBNET_RING_SIZE,
+				RING_FLAG_FRAME);
+	if (!ring) {
+		netdev_err(dev, "failed to allocate Tx ring\n");
+		return -ENOMEM;
+	}
+	net->tx_ring.ring = ring;
+
+	sof_mask = BIT(TBIP_PDF_FRAME_START);
+	eof_mask = BIT(TBIP_PDF_FRAME_END);
+
+	ring = tb_ring_alloc_rx(xd->tb->nhi, -1, TBNET_RING_SIZE,
+				RING_FLAG_FRAME | RING_FLAG_E2E, sof_mask,
+				eof_mask, tbnet_start_poll, net);
+	if (!ring) {
+		netdev_err(dev, "failed to allocate Rx ring\n");
+		tb_ring_free(net->tx_ring.ring);
+		net->tx_ring.ring = NULL;
+		return -ENOMEM;
+	}
+	net->rx_ring.ring = ring;
+
+	napi_enable(&net->napi);
+	start_login(net);
+
+	return 0;
+}
+
+static int tbnet_stop(struct net_device *dev)
+{
+	struct tbnet *net = netdev_priv(dev);
+
+	napi_disable(&net->napi);
+
+	tbnet_tear_down(net, true);
+
+	tb_ring_free(net->rx_ring.ring);
+	net->rx_ring.ring = NULL;
+	tb_ring_free(net->tx_ring.ring);
+	net->tx_ring.ring = NULL;
+
+	return 0;
+}
+
+static bool tbnet_xmit_map(struct device *dma_dev, struct tbnet_frame *tf)
+{
+	dma_addr_t dma_addr;
+
+	dma_addr = dma_map_page(dma_dev, tf->page, 0, tbnet_frame_size(tf),
+				DMA_TO_DEVICE);
+	if (dma_mapping_error(dma_dev, dma_addr))
+		return false;
+
+	tf->frame.buffer_phy = dma_addr;
+	return true;
+}
+
+static bool tbnet_xmit_csum_and_map(struct tbnet *net, struct sk_buff *skb,
+	struct tbnet_frame **frames, u32 frame_count)
+{
+	struct thunderbolt_ip_frame_header *hdr = page_address(frames[0]->page);
+	struct device *dma_dev = tb_ring_dma_device(net->tx_ring.ring);
+	__wsum wsum = htonl(skb->len - skb_transport_offset(skb));
+	unsigned int i, len, offset = skb_transport_offset(skb);
+	__be16 protocol = skb->protocol;
+	void *data = skb->data;
+	void *dest = hdr + 1;
+	__sum16 *tucso;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		/* No need to calculate checksum so we just update the
+		 * total frame count and map the frames for DMA.
+		 */
+		for (i = 0; i < frame_count; i++) {
+			hdr = page_address(frames[i]->page);
+			hdr->frame_count = cpu_to_le32(frame_count);
+			if (!tbnet_xmit_map(dma_dev, frames[i]))
+				goto err_unmap;
+		}
+
+		return true;
+	}
+
+	if (protocol == htons(ETH_P_8021Q)) {
+		struct vlan_hdr *vhdr, vh;
+
+		vhdr = skb_header_pointer(skb, ETH_HLEN, sizeof(vh), &vh);
+		if (!vhdr)
+			return false;
+
+		protocol = vhdr->h_vlan_encapsulated_proto;
+	}
+
+	/* Data points to the beginning of the packet.
+	 * Check is the absolute position of the checksum in the packet.
+	 * ipcso will update the IP checksum.
+	 * tucso will update the TCP/UDP checksum.
+	 */
+	if (protocol == htons(ETH_P_IP)) {
+		__sum16 *ipcso = dest + ((void *)&(ip_hdr(skb)->check) - data);
+
+		*ipcso = 0;
+		*ipcso = ip_fast_csum(dest + skb_network_offset(skb),
+				      ip_hdr(skb)->ihl);
+
+		if (ip_hdr(skb)->protocol == IPPROTO_TCP)
+			tucso = dest + ((void *)&(tcp_hdr(skb)->check) - data);
+		else if (ip_hdr(skb)->protocol == IPPROTO_UDP)
+			tucso = dest + ((void *)&(udp_hdr(skb)->check) - data);
+		else
+			return false;
+
+		*tucso = ~csum_tcpudp_magic(ip_hdr(skb)->saddr,
+					    ip_hdr(skb)->daddr, 0,
+					    ip_hdr(skb)->protocol, 0);
+	} else if (skb_is_gso_v6(skb)) {
+		tucso = dest + ((void *)&(tcp_hdr(skb)->check) - data);
+		*tucso = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+					  &ipv6_hdr(skb)->daddr, 0,
+					  IPPROTO_TCP, 0);
+		return false;
+	} else if (protocol == htons(ETH_P_IPV6)) {
+		tucso = dest + skb_checksum_start_offset(skb) + skb->csum_offset;
+		*tucso = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+					  &ipv6_hdr(skb)->daddr, 0,
+					  ipv6_hdr(skb)->nexthdr, 0);
+	} else {
+		return false;
+	}
+
+	/* First frame was headers, rest of the frames contain data.
+	 * Calculate checksum over each frame.
+	 */
+	for (i = 0; i < frame_count; i++) {
+		hdr = page_address(frames[i]->page);
+		dest = (void *)(hdr + 1) + offset;
+		len = le32_to_cpu(hdr->frame_size) - offset;
+		wsum = csum_partial(dest, len, wsum);
+		hdr->frame_count = cpu_to_le32(frame_count);
+
+		offset = 0;
+	}
+
+	*tucso = csum_fold(wsum);
+
+	/* Checksum is finally calculated and we don't touch the memory
+	 * anymore, so DMA map the frames now.
+	 */
+	for (i = 0; i < frame_count; i++) {
+		if (!tbnet_xmit_map(dma_dev, frames[i]))
+			goto err_unmap;
+	}
+
+	return true;
+
+err_unmap:
+	while (i--)
+		dma_unmap_page(dma_dev, frames[i]->frame.buffer_phy,
+			       tbnet_frame_size(frames[i]), DMA_TO_DEVICE);
+
+	return false;
+}
+
+static void *tbnet_kmap_frag(struct sk_buff *skb, unsigned int frag_num,
+			     unsigned int *len)
+{
+	const skb_frag_t *frag = &skb_shinfo(skb)->frags[frag_num];
+
+	*len = skb_frag_size(frag);
+	return kmap_atomic(skb_frag_page(frag)) + frag->page_offset;
+}
+
+static netdev_tx_t tbnet_start_xmit(struct sk_buff *skb,
+				    struct net_device *dev)
+{
+	struct tbnet *net = netdev_priv(dev);
+	struct tbnet_frame *frames[MAX_SKB_FRAGS];
+	u16 frame_id = atomic_read(&net->frame_id);
+	struct thunderbolt_ip_frame_header *hdr;
+	unsigned int len = skb_headlen(skb);
+	unsigned int data_len = skb->len;
+	unsigned int nframes, i;
+	unsigned int frag = 0;
+	void *src = skb->data;
+	u32 frame_index = 0;
+	bool unmap = false;
+	void *dest;
+
+	nframes = DIV_ROUND_UP(data_len, TBNET_MAX_PAYLOAD_SIZE);
+	if (tbnet_available_buffers(&net->tx_ring) < nframes) {
+		netif_stop_queue(net->dev);
+		return NETDEV_TX_BUSY;
+	}
+
+	frames[frame_index] = tbnet_get_tx_buffer(net);
+	if (!frames[frame_index])
+		goto err_drop;
+
+	hdr = page_address(frames[frame_index]->page);
+	dest = hdr + 1;
+
+	/* If overall packet is bigger than the frame data size */
+	while (data_len > TBNET_MAX_PAYLOAD_SIZE) {
+		unsigned int size_left = TBNET_MAX_PAYLOAD_SIZE;
+
+		hdr->frame_size = cpu_to_le32(TBNET_MAX_PAYLOAD_SIZE);
+		hdr->frame_index = cpu_to_le16(frame_index);
+		hdr->frame_id = cpu_to_le16(frame_id);
+
+		do {
+			if (len > size_left) {
+				/* Copy data onto Tx buffer data with
+				 * full frame size then break and go to
+				 * next frame
+				 */
+				memcpy(dest, src, size_left);
+				len -= size_left;
+				dest += size_left;
+				src += size_left;
+				break;
+			}
+
+			memcpy(dest, src, len);
+			size_left -= len;
+			dest += len;
+
+			if (unmap) {
+				kunmap_atomic(src);
+				unmap = false;
+			}
+
+			/* Ensure all fragments have been processed */
+			if (frag < skb_shinfo(skb)->nr_frags) {
+				/* Map and then unmap quickly */
+				src = tbnet_kmap_frag(skb, frag++, &len);
+				unmap = true;
+			} else if (unlikely(size_left > 0)) {
+				goto err_drop;
+			}
+		} while (size_left > 0);
+
+		data_len -= TBNET_MAX_PAYLOAD_SIZE;
+		frame_index++;
+
+		frames[frame_index] = tbnet_get_tx_buffer(net);
+		if (!frames[frame_index])
+			goto err_drop;
+
+		hdr = page_address(frames[frame_index]->page);
+		dest = hdr + 1;
+	}
+
+	hdr->frame_size = cpu_to_le32(data_len);
+	hdr->frame_index = cpu_to_le16(frame_index);
+	hdr->frame_id = cpu_to_le16(frame_id);
+
+	frames[frame_index]->frame.size = data_len + sizeof(*hdr);
+
+	/* In case the remaining data_len is smaller than a frame */
+	while (len < data_len) {
+		memcpy(dest, src, len);
+		data_len -= len;
+		dest += len;
+
+		if (unmap) {
+			kunmap_atomic(src);
+			unmap = false;
+		}
+
+		if (frag < skb_shinfo(skb)->nr_frags) {
+			src = tbnet_kmap_frag(skb, frag++, &len);
+			unmap = true;
+		} else if (unlikely(data_len > 0)) {
+			goto err_drop;
+		}
+	}
+
+	memcpy(dest, src, data_len);
+
+	if (unmap)
+		kunmap_atomic(src);
+
+	if (!tbnet_xmit_csum_and_map(net, skb, frames, frame_index + 1))
+		goto err_drop;
+
+	for (i = 0; i < frame_index + 1; i++)
+		tb_ring_tx(net->tx_ring.ring, &frames[i]->frame);
+
+	if (net->svc->prtcstns & TBNET_MATCH_FRAGS_ID)
+		atomic_inc(&net->frame_id);
+
+	net->stats.tx_packets++;
+	net->stats.tx_bytes += skb->len;
+
+	dev_consume_skb_any(skb);
+
+	return NETDEV_TX_OK;
+
+err_drop:
+	/* We can re-use the buffers */
+	net->tx_ring.cons -= frame_index;
+
+	dev_kfree_skb_any(skb);
+	net->stats.tx_errors++;
+
+	return NETDEV_TX_OK;
+}
+
+static void tbnet_get_stats64(struct net_device *dev,
+			      struct rtnl_link_stats64 *stats)
+{
+	struct tbnet *net = netdev_priv(dev);
+
+	stats->tx_packets = net->stats.tx_packets;
+	stats->rx_packets = net->stats.rx_packets;
+	stats->tx_bytes = net->stats.tx_bytes;
+	stats->rx_bytes = net->stats.rx_bytes;
+	stats->rx_errors = net->stats.rx_errors + net->stats.rx_length_errors +
+		net->stats.rx_over_errors + net->stats.rx_crc_errors +
+		net->stats.rx_missed_errors;
+	stats->tx_errors = net->stats.tx_errors;
+	stats->rx_length_errors = net->stats.rx_length_errors;
+	stats->rx_over_errors = net->stats.rx_over_errors;
+	stats->rx_crc_errors = net->stats.rx_crc_errors;
+	stats->rx_missed_errors = net->stats.rx_missed_errors;
+}
+
+static const struct net_device_ops tbnet_netdev_ops = {
+	.ndo_open = tbnet_open,
+	.ndo_stop = tbnet_stop,
+	.ndo_start_xmit = tbnet_start_xmit,
+	.ndo_get_stats64 = tbnet_get_stats64,
+};
+
+static void tbnet_generate_mac(struct net_device *dev)
+{
+	const struct tbnet *net = netdev_priv(dev);
+	const struct tb_xdomain *xd = net->xd;
+	u8 phy_port;
+	u32 hash;
+
+	phy_port = tb_phy_port_from_link(TBNET_L0_PORT_NUM(xd->route));
+
+	/* Unicast and locally administered MAC */
+	dev->dev_addr[0] = phy_port << 4 | 0x02;
+	hash = jhash2((u32 *)xd->local_uuid, 4, 0);
+	memcpy(dev->dev_addr + 1, &hash, sizeof(hash));
+	hash = jhash2((u32 *)xd->local_uuid, 4, hash);
+	dev->dev_addr[5] = hash & 0xff;
+}
+
+static int tbnet_probe(struct tb_service *svc, const struct tb_service_id *id)
+{
+	struct tb_xdomain *xd = tb_service_parent(svc);
+	struct net_device *dev;
+	struct tbnet *net;
+	int ret;
+
+	dev = alloc_etherdev(sizeof(*net));
+	if (!dev)
+		return -ENOMEM;
+
+	SET_NETDEV_DEV(dev, &svc->dev);
+
+	net = netdev_priv(dev);
+	INIT_DELAYED_WORK(&net->login_work, tbnet_login_work);
+	INIT_WORK(&net->connected_work, tbnet_connected_work);
+	mutex_init(&net->connection_lock);
+	atomic_set(&net->command_id, 0);
+	atomic_set(&net->frame_id, 0);
+	net->svc = svc;
+	net->dev = dev;
+	net->xd = xd;
+
+	tbnet_generate_mac(dev);
+
+	strcpy(dev->name, "thunderbolt%d");
+	dev->netdev_ops = &tbnet_netdev_ops;
+
+	/* ThunderboltIP takes advantage of TSO packets but instead of
+	 * segmenting them we just split the packet into Thunderbolt
+	 * frames (maximum payload size of each frame is 4084 bytes) and
+	 * calculate checksum over the whole packet here.
+	 *
+	 * The receiving side does the opposite if the host OS supports
+	 * LRO, otherwise it needs to split the large packet into MTU
+	 * sized smaller packets.
+	 *
+	 * In order to receive large packets from the networking stack,
+	 * we need to announce support for most of the offloading
+	 * features here.
+	 */
+	dev->hw_features = NETIF_F_SG | NETIF_F_ALL_TSO | NETIF_F_GRO |
+			   NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
+	dev->features = dev->hw_features | NETIF_F_HIGHDMA;
+	dev->hard_header_len += sizeof(struct thunderbolt_ip_frame_header);
+
+	netif_napi_add(dev, &net->napi, tbnet_poll, NAPI_POLL_WEIGHT);
+
+	/* MTU range: 68 - 65522 */
+	dev->min_mtu = ETH_MIN_MTU;
+	dev->max_mtu = TBNET_MAX_MTU - ETH_HLEN;
+
+	net->handler.uuid = &tbnet_svc_uuid;
+	net->handler.callback = tbnet_handle_packet;
+	net->handler.data = net;
+	tb_register_protocol_handler(&net->handler);
+
+	tb_service_set_drvdata(svc, net);
+
+	ret = register_netdev(dev);
+	if (ret) {
+		tb_unregister_protocol_handler(&net->handler);
+		free_netdev(dev);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void tbnet_remove(struct tb_service *svc)
+{
+	struct tbnet *net = tb_service_get_drvdata(svc);
+
+	unregister_netdev(net->dev);
+	tb_unregister_protocol_handler(&net->handler);
+	free_netdev(net->dev);
+}
+
+static void tbnet_shutdown(struct tb_service *svc)
+{
+	tbnet_tear_down(tb_service_get_drvdata(svc), true);
+}
+
+static int __maybe_unused tbnet_suspend(struct device *dev)
+{
+	struct tb_service *svc = tb_to_service(dev);
+	struct tbnet *net = tb_service_get_drvdata(svc);
+
+	stop_login(net);
+	if (netif_running(net->dev)) {
+		netif_device_detach(net->dev);
+		tb_ring_stop(net->rx_ring.ring);
+		tb_ring_stop(net->tx_ring.ring);
+		tbnet_free_buffers(&net->rx_ring);
+		tbnet_free_buffers(&net->tx_ring);
+	}
+
+	return 0;
+}
+
+static int __maybe_unused tbnet_resume(struct device *dev)
+{
+	struct tb_service *svc = tb_to_service(dev);
+	struct tbnet *net = tb_service_get_drvdata(svc);
+
+	netif_carrier_off(net->dev);
+	if (netif_running(net->dev)) {
+		netif_device_attach(net->dev);
+		start_login(net);
+	}
+
+	return 0;
+}
+
+static const struct dev_pm_ops tbnet_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(tbnet_suspend, tbnet_resume)
+};
+
+static const struct tb_service_id tbnet_ids[] = {
+	{ TB_SERVICE("network", 1) },
+	{ },
+};
+MODULE_DEVICE_TABLE(tbsvc, tbnet_ids);
+
+static struct tb_service_driver tbnet_driver = {
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "thunderbolt-net",
+		.pm = &tbnet_pm_ops,
+	},
+	.probe = tbnet_probe,
+	.remove = tbnet_remove,
+	.shutdown = tbnet_shutdown,
+	.id_table = tbnet_ids,
+};
+
+static int __init tbnet_init(void)
+{
+	int ret;
+
+	tbnet_dir = tb_property_create_dir(&tbnet_dir_uuid);
+	if (!tbnet_dir)
+		return -ENOMEM;
+
+	tb_property_add_immediate(tbnet_dir, "prtcid", 1);
+	tb_property_add_immediate(tbnet_dir, "prtcvers", 1);
+	tb_property_add_immediate(tbnet_dir, "prtcrevs", 1);
+	tb_property_add_immediate(tbnet_dir, "prtcstns",
+				  TBNET_MATCH_FRAGS_ID);
+
+	ret = tb_register_property_dir("network", tbnet_dir);
+	if (ret) {
+		tb_property_free_dir(tbnet_dir);
+		return ret;
+	}
+
+	return tb_register_service_driver(&tbnet_driver);
+}
+module_init(tbnet_init);
+
+static void __exit tbnet_exit(void)
+{
+	tb_unregister_service_driver(&tbnet_driver);
+	tb_unregister_property_dir("network", tbnet_dir);
+	tb_property_free_dir(tbnet_dir);
+}
+module_exit(tbnet_exit);
+
+MODULE_AUTHOR("Amir Levy <amir.jer.levy@intel.com>");
+MODULE_AUTHOR("Michael Jamet <michael.jamet@intel.com>");
+MODULE_AUTHOR("Mika Westerberg <mika.westerberg@linux.intel.com>");
+MODULE_DESCRIPTION("Thunderbolt network driver");
+MODULE_LICENSE("GPL v2");
-- 
2.14.2
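
The way tbnet_start_xmit() cuts one large packet into ThunderboltIP frames
can be modelled in user space. What follows is a minimal sketch, not the
driver code: the 4084-byte payload limit is taken from the comment in
tbnet_probe(), while struct frame_hdr, split_packet() and the use of host
byte order (the driver stores little-endian values) are assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 4084u /* per-frame payload limit quoted in the patch */

/* Simplified stand-in for struct thunderbolt_ip_frame_header. */
struct frame_hdr {
	uint32_t frame_size;  /* payload bytes carried by this frame */
	uint16_t frame_index; /* position of the frame within the packet */
	uint16_t frame_id;    /* shared by all frames of one packet */
	uint32_t frame_count; /* total number of frames in the packet */
};

/*
 * Fill hdrs[] for a packet of len bytes. Returns the number of frames
 * used, or 0 if len is 0 or max_frames is too small.
 */
static size_t split_packet(size_t len, uint16_t frame_id,
			   struct frame_hdr *hdrs, size_t max_frames)
{
	size_t nframes = (len + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
	size_t i;

	if (nframes == 0 || nframes > max_frames)
		return 0;

	for (i = 0; i < nframes; i++) {
		/* Every frame but the last one is filled completely. */
		size_t chunk = len > MAX_PAYLOAD ? MAX_PAYLOAD : len;

		hdrs[i].frame_size = (uint32_t)chunk;
		hdrs[i].frame_index = (uint16_t)i;
		hdrs[i].frame_id = frame_id;
		hdrs[i].frame_count = (uint32_t)nframes;
		len -= chunk;
	}

	return nframes;
}
```

With these assumptions a 10000-byte packet yields three frames carrying
4084, 4084 and 1832 bytes, which matches the DIV_ROUND_UP() estimate the
driver uses to check for free Tx buffers.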


* [PATCH v3 19/19] MAINTAINERS: Add entry for Thunderbolt network driver
From: Mika Westerberg @ 2017-10-02 10:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, netdev, linux-kernel
In-Reply-To: <20171002103846.64602-1-mika.westerberg@linux.intel.com>

I will be maintaining the Thunderbolt network driver along with Michael
and Yehezkel.

Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 34661b5ac9ad..745527d5e326 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13286,6 +13286,14 @@ S:	Maintained
 F:	drivers/thunderbolt/
 F:	include/linux/thunderbolt.h
 
+THUNDERBOLT NETWORK DRIVER
+M:	Michael Jamet <michael.jamet@intel.com>
+M:	Mika Westerberg <mika.westerberg@linux.intel.com>
+M:	Yehezkel Bernat <yehezkel.bernat@intel.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/thunderbolt.c
+
 THUNDERX GPIO DRIVER
 M:	David Daney <david.daney@cavium.com>
 S:	Maintained
-- 
2.14.2


* [PATCH v3 16/19] thunderbolt: Allocate ring HopID automatically if requested
From: Mika Westerberg @ 2017-10-02 10:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, netdev, linux-kernel
In-Reply-To: <20171002103846.64602-1-mika.westerberg@linux.intel.com>

Thunderbolt services should not care which HopID (ring) they use for
sending and receiving packets over the high-speed DMA path, so make
tb_ring_alloc_rx() and tb_ring_alloc_tx() accept a negative HopID. In
that case the NHI automatically allocates the next available HopID for
the caller.

These HopIDs will be allocated from the range which is not reserved for
the Thunderbolt protocol (8 .. hop_count - 1).

If needed, the allocated HopID can be retrieved from the ring->hop field
after the ring has been allocated successfully.
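
The allocation rule described above can be sketched in plain C. This is an
illustration only: alloc_hop() and its void * ring table are hypothetical
stand-ins for nhi_alloc_hop() and the nhi->tx_rings / nhi->rx_rings arrays,
the sketch takes no lock, and it folds the -EINVAL and -EBUSY cases into a
single -1.

```c
#include <stddef.h>

#define FIRST_USABLE_HOPID 8 /* HopIDs 0-7 are reserved by the protocol */

/*
 * rings[i] is non-NULL when HopID i is taken. A negative hop asks for the
 * first free HopID in 8 .. hop_count - 1. Returns the HopID now owned by
 * owner, or -1 if the request cannot be satisfied.
 */
static int alloc_hop(void **rings, int hop_count, int hop, void *owner)
{
	int i;

	if (hop < 0) {
		for (i = FIRST_USABLE_HOPID; i < hop_count; i++) {
			if (!rings[i]) {
				hop = i;
				break;
			}
		}
	}

	/* Still negative: nothing free. Otherwise check range and ownership. */
	if (hop < 0 || hop >= hop_count || rings[hop])
		return -1;

	rings[hop] = owner;
	return hop;
}
```

An explicit HopID is still honoured, including one from the reserved range,
so existing callers that pass a fixed HopID keep working.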

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/thunderbolt/nhi.c | 78 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 18 deletions(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index af0a80ddf594..0e79eebfcbb7 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -26,6 +26,8 @@
  * use this ring for anything else.
  */
 #define RING_E2E_UNUSED_HOPID	2
+/* HopIDs 0-7 are reserved by the Thunderbolt protocol */
+#define RING_FIRST_USABLE_HOPID	8
 
 /*
  * Minimal number of vectors when we use MSI-X. Two for control channel
@@ -411,6 +413,62 @@ static void ring_release_msix(struct tb_ring *ring)
 	ring->irq = 0;
 }
 
+static int nhi_alloc_hop(struct tb_nhi *nhi, struct tb_ring *ring)
+{
+	int ret = 0;
+
+	spin_lock_irq(&nhi->lock);
+
+	if (ring->hop < 0) {
+		unsigned int i;
+
+		/*
+		 * Automatically allocate HopID from the non-reserved
+		 * range 8 .. hop_count - 1.
+		 */
+		for (i = RING_FIRST_USABLE_HOPID; i < nhi->hop_count; i++) {
+			if (ring->is_tx) {
+				if (!nhi->tx_rings[i]) {
+					ring->hop = i;
+					break;
+				}
+			} else {
+				if (!nhi->rx_rings[i]) {
+					ring->hop = i;
+					break;
+				}
+			}
+		}
+	}
+
+	if (ring->hop < 0 || ring->hop >= nhi->hop_count) {
+		dev_warn(&nhi->pdev->dev, "invalid hop: %d\n", ring->hop);
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+	if (ring->is_tx && nhi->tx_rings[ring->hop]) {
+		dev_warn(&nhi->pdev->dev, "TX hop %d already allocated\n",
+			 ring->hop);
+		ret = -EBUSY;
+		goto err_unlock;
+	} else if (!ring->is_tx && nhi->rx_rings[ring->hop]) {
+		dev_warn(&nhi->pdev->dev, "RX hop %d already allocated\n",
+			 ring->hop);
+		ret = -EBUSY;
+		goto err_unlock;
+	}
+
+	if (ring->is_tx)
+		nhi->tx_rings[ring->hop] = ring;
+	else
+		nhi->rx_rings[ring->hop] = ring;
+
+err_unlock:
+	spin_unlock_irq(&nhi->lock);
+
+	return ret;
+}
+
 static struct tb_ring *tb_ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 				     bool transmit, unsigned int flags,
 				     u16 sof_mask, u16 eof_mask,
@@ -456,28 +514,12 @@ static struct tb_ring *tb_ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 	if (ring_request_msix(ring, flags & RING_FLAG_NO_SUSPEND))
 		goto err_free_descs;
 
-	spin_lock_irq(&nhi->lock);
-	if (hop >= nhi->hop_count) {
-		dev_WARN(&nhi->pdev->dev, "invalid hop: %d\n", hop);
+	if (nhi_alloc_hop(nhi, ring))
 		goto err_release_msix;
-	}
-	if (transmit && nhi->tx_rings[hop]) {
-		dev_WARN(&nhi->pdev->dev, "TX hop %d already allocated\n", hop);
-		goto err_release_msix;
-	} else if (!transmit && nhi->rx_rings[hop]) {
-		dev_WARN(&nhi->pdev->dev, "RX hop %d already allocated\n", hop);
-		goto err_release_msix;
-	}
-	if (transmit)
-		nhi->tx_rings[hop] = ring;
-	else
-		nhi->rx_rings[hop] = ring;
-	spin_unlock_irq(&nhi->lock);
 
 	return ring;
 
 err_release_msix:
-	spin_unlock_irq(&nhi->lock);
 	ring_release_msix(ring);
 err_free_descs:
 	dma_free_coherent(&ring->nhi->pdev->dev,
@@ -506,7 +548,7 @@ EXPORT_SYMBOL_GPL(tb_ring_alloc_tx);
 /**
  * tb_ring_alloc_rx() - Allocate DMA ring for receive
  * @nhi: Pointer to the NHI the ring is to be allocated
- * @hop: HopID (ring) to allocate
+ * @hop: HopID (ring) to allocate. Pass %-1 for automatic allocation.
  * @size: Number of entries in the ring
  * @flags: Flags for the ring
  * @sof_mask: Mask of PDF values that start a frame
-- 
2.14.2


* v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 10:49 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-arm-kernel, syzkaller
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet

Hi all,

I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
skb_copy_and_csum_bits().

I've uploaded a copy of the splat, my config, and (full) Syzkaller log
to my kernel.org web space [1]. I haven't had the opportunity to
reproduce this yet. 

This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
applied to avoid a userfaultfd bug. However, per the Syzkaller log, the
userfaultfd syscall wasn't invoked, so I don't believe that should
matter.

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skbuff-bug/
[2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:2626!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
Hardware name: linux,dummy-virt (DT)
task: ffff80003a901a80 task.stack: ffff80003a908000
PC is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
LR is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
pc : [<ffff200009e03214>] lr : [<ffff200009e03214>] pstate: 00000145
sp : ffff80003efd7b50
x29: ffff80003efd7b50 x28: 000000000000003c 
x27: 00000000000001e8 x26: ffff80003a901a90 
x25: 000000000000003c x24: dfff200000000000 
x23: ffff800035723a80 x22: 000000000000003c 
x21: 0000000000000000 x20: 0000000000000000 
x19: 0000000000003a6d x18: ffff20000da58140 
x17: 0000000000000000 x16: 0000000000000001 
x15: ffff20000e1485a0 x14: ffff2000082f8980 
x13: ffff200009fc73d0 x12: ffff200009fc707c 
x11: 1ffff00002c2a3fc x10: ffff100002c2a3fc 
x9 : dfff200000000000 x8 : 07030301a8ff1127 
x7 : edff11270a080204 x6 : ffff800016151fe8 
x5 : ffff100002c2a3fd x4 : 000000000000000c 
x3 : 0000000000000030 x2 : 1ffff00006ae47a1 
x1 : 01f6cee936b5bc00 x0 : 0000000000000000 
Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
Call trace:
Exception stack(0xffff80003efd7a10 to 0xffff80003efd7b50)
7a00:                                   0000000000000000 01f6cee936b5bc00
7a20: 1ffff00006ae47a1 0000000000000030 000000000000000c ffff100002c2a3fd
7a40: ffff800016151fe8 edff11270a080204 07030301a8ff1127 dfff200000000000
7a60: ffff100002c2a3fc 1ffff00002c2a3fc ffff200009fc707c ffff200009fc73d0
7a80: ffff2000082f8980 ffff20000e1485a0 0000000000000001 0000000000000000
7aa0: ffff20000da58140 0000000000003a6d 0000000000000000 0000000000000000
7ac0: 000000000000003c ffff800035723a80 dfff200000000000 000000000000003c
7ae0: ffff80003a901a90 00000000000001e8 000000000000003c ffff80003efd7b50
7b00: ffff200009e03214 ffff80003efd7b50 ffff200009e03214 0000000000000145
7b20: 0000000000003a6d 0000000000000000 0001000000000000 000000000000003c
7b40: ffff80003efd7b50 ffff200009e03214
[<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
[<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
[<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
[<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
[<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
[<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
[<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
[<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
[<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
[<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
[<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
[<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
[<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
[<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
[<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
[<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
[<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
[<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
[<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
[<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
[<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
[<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
[<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
[<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
[<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
[<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
[<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
[<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
[<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
[<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
Exception stack(0xffff80003a90bb70 to 0xffff80003a90bcb0)
bb60:                                   ffff80003a90234c 0000000000000007
bb80: 0000000000000000 1ffff00007520469 1fffe400017ad00c dfff200000000000
bba0: dfff200000000000 0000000000000000 ffff80003a902350 1ffff00007520469
bbc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
bbe0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000000001
bc00: ffff20000da58140 ffff80003efd9800 ffff80003efd9800 ffff20000ae60000
bc20: ffff80003a971a80 1ffff000075217aa 0000000000000000 ffff20000ae60000
bc40: 0000000000000001 ffff20000a34fce0 0000dffff519f438 ffff80003a90bcb0
bc60: ffff20000a36134c ffff80003a90bcb0 ffff20000a361350 0000000010000145
bc80: ffff80003efd9800 ffff80003efd9800 ffffffffffffffff ffff80003efd9800
bca0: ffff80003a90bcb0 ffff20000a361350
[<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
[<ffff20000a361350>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
[<ffff20000a361350>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
[<ffff20000a361350>] _raw_spin_unlock_irq+0x30/0x100 kernel/locking/spinlock.c:199
[<ffff2000081e0850>] finish_lock_switch kernel/sched/sched.h:1335 [inline]
[<ffff2000081e0850>] finish_task_switch+0x1d8/0x950 kernel/sched/core.c:2657
[<ffff20000a34fce0>] context_switch kernel/sched/core.c:2793 [inline]
[<ffff20000a34fce0>] __schedule+0x518/0x17b0 kernel/sched/core.c:3366
[<ffff20000a3520e8>] schedule_idle+0x58/0xc8 kernel/sched/core.c:3452
[<ffff200008254a00>] do_idle+0x1d8/0x370 kernel/sched/idle.c:269
[<ffff200008255138>] cpu_startup_entry+0x20/0x28 kernel/sched/idle.c:351
[<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
Code: 97bcbfac 17fffe19 d503201f 97974258 (d4210000) 
---[ end trace 3359b414c3a12466 ]---


* Re: cross namespace interface notification for tun devices
From: Jason A. Donenfeld @ 2017-10-02 11:11 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Netdev, Mathias
In-Reply-To: <f80d9afa-0ce2-903d-9bf9-b7bb8765086b@6wind.com>

On Mon, Oct 2, 2017 at 11:32 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> 1. Move the process to netns B, open the netlink socket and move back the
> process to netns A. The socket will remain in netns B and you will receive all
> netlink messages related to netns B.
>
> 2. Assign a nsid to netns B in netns A and use NETLINK_LISTEN_ALL_NSID on your
> netlink socket (see iproute2).

Both of these seem to rely on the process knowing where the device is
being moved and having access to that namespace. I don't think these
two things are a given though. Unless I'm missing something?

Jason


* Re: [PATCH v3 02/19] thunderbolt: Remove __packed from ICM message structures
From: Andy Shevchenko @ 2017-10-02 11:45 UTC (permalink / raw)
  To: Mika Westerberg, Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andrew Lunn, netdev,
	linux-kernel
In-Reply-To: <20171002103846.64602-3-mika.westerberg@linux.intel.com>

On Mon, 2017-10-02 at 13:38 +0300, Mika Westerberg wrote:
> These messages are all 32-byte aligned and they should be packed 

Obviously 32-bit.

Other than that,

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

> without
> the __packed attribute just fine. It also allows compiler to generate
> better code on some architectures.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Michael Jamet <michael.jamet@intel.com>
> Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
> ---
>  drivers/thunderbolt/tb_msgs.h | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/thunderbolt/tb_msgs.h
> b/drivers/thunderbolt/tb_msgs.h
> index de6441e4a060..f3adf58a40ce 100644
> --- a/drivers/thunderbolt/tb_msgs.h
> +++ b/drivers/thunderbolt/tb_msgs.h
> @@ -130,7 +130,7 @@ struct icm_pkg_header {
>  	u8 flags;
>  	u8 packet_id;
>  	u8 total_packets;
> -} __packed;
> +};
>  
>  #define ICM_FLAGS_ERROR			BIT(0)
>  #define ICM_FLAGS_NO_KEY		BIT(1)
> @@ -139,20 +139,20 @@ struct icm_pkg_header {
>  
>  struct icm_pkg_driver_ready {
>  	struct icm_pkg_header hdr;
> -} __packed;
> +};
>  
>  struct icm_pkg_driver_ready_response {
>  	struct icm_pkg_header hdr;
>  	u8 romver;
>  	u8 ramver;
>  	u16 security_level;
> -} __packed;
> +};
>  
>  /* Falcon Ridge & Alpine Ridge common messages */
>  
>  struct icm_fr_pkg_get_topology {
>  	struct icm_pkg_header hdr;
> -} __packed;
> +};
>  
>  #define ICM_GET_TOPOLOGY_PACKETS	14
>  
> @@ -167,7 +167,7 @@ struct icm_fr_pkg_get_topology_response {
>  	u32 reserved[2];
>  	u32 ports[16];
>  	u32 port_hop_info[16];
> -} __packed;
> +};
>  
>  #define ICM_SWITCH_USED			BIT(0)
>  #define ICM_SWITCH_UPSTREAM_PORT_MASK	GENMASK(7, 1)
> @@ -184,7 +184,7 @@ struct icm_fr_event_device_connected {
>  	u8 connection_id;
>  	u16 link_info;
>  	u32 ep_name[55];
> -} __packed;
> +};
>  
>  #define ICM_LINK_INFO_LINK_MASK		0x7
>  #define ICM_LINK_INFO_DEPTH_SHIFT	4
> @@ -197,13 +197,13 @@ struct icm_fr_pkg_approve_device {
>  	u8 connection_key;
>  	u8 connection_id;
>  	u16 reserved;
> -} __packed;
> +};
>  
>  struct icm_fr_event_device_disconnected {
>  	struct icm_pkg_header hdr;
>  	u16 reserved;
>  	u16 link_info;
> -} __packed;
> +};
>  
>  struct icm_fr_pkg_add_device_key {
>  	struct icm_pkg_header hdr;
> @@ -212,7 +212,7 @@ struct icm_fr_pkg_add_device_key {
>  	u8 connection_id;
>  	u16 reserved;
>  	u32 key[8];
> -} __packed;
> +};
>  
>  struct icm_fr_pkg_add_device_key_response {
>  	struct icm_pkg_header hdr;
> @@ -220,7 +220,7 @@ struct icm_fr_pkg_add_device_key_response {
>  	u8 connection_key;
>  	u8 connection_id;
>  	u16 reserved;
> -} __packed;
> +};
>  
>  struct icm_fr_pkg_challenge_device {
>  	struct icm_pkg_header hdr;
> @@ -229,7 +229,7 @@ struct icm_fr_pkg_challenge_device {
>  	u8 connection_id;
>  	u16 reserved;
>  	u32 challenge[8];
> -} __packed;
> +};
>  
>  struct icm_fr_pkg_challenge_device_response {
>  	struct icm_pkg_header hdr;
> @@ -239,7 +239,7 @@ struct icm_fr_pkg_challenge_device_response {
>  	u16 reserved;
>  	u32 challenge[8];
>  	u32 response[8];
> -} __packed;
> +};
>  
>  /* Alpine Ridge only messages */
>  
> @@ -247,7 +247,7 @@ struct icm_ar_pkg_get_route {
>  	struct icm_pkg_header hdr;
>  	u16 reserved;
>  	u16 link_info;
> -} __packed;
> +};
>  
>  struct icm_ar_pkg_get_route_response {
>  	struct icm_pkg_header hdr;
> @@ -255,6 +255,6 @@ struct icm_ar_pkg_get_route_response {
>  	u16 link_info;
>  	u32 route_hi;
>  	u32 route_lo;
> -} __packed;
> +};
>  
>  #endif

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
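
The claim that these structures need no __packed can be checked at compile
time. The sketch below is an assumption-laden user-space copy, not the
kernel header: u8/u16 are replaced with <stdint.h> types, the first header
member is not visible in the quoted hunk so its name "code" is assumed, and
struct packed_response is a hypothetical twin added only for comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* First member is not shown in the quoted hunk; "code" is an assumed name. */
struct icm_pkg_header {
	uint8_t code;
	uint8_t flags;
	uint8_t packet_id;
	uint8_t total_packets;
};

struct icm_pkg_driver_ready_response {
	struct icm_pkg_header hdr;
	uint8_t romver;
	uint8_t ramver;
	uint16_t security_level;
};

/* Same members with the attribute the patch removes (GCC/Clang syntax). */
struct packed_response {
	struct icm_pkg_header hdr;
	uint8_t romver;
	uint8_t ramver;
	uint16_t security_level;
} __attribute__((packed));

/* Every member already sits on its natural boundary, so nothing changes. */
_Static_assert(sizeof(struct icm_pkg_header) == 4,
	       "header is one 32-bit word");
_Static_assert(sizeof(struct icm_pkg_driver_ready_response) ==
	       sizeof(struct packed_response),
	       "no padding for __packed to remove");
_Static_assert(offsetof(struct icm_pkg_driver_ready_response, security_level) ==
	       offsetof(struct packed_response, security_level),
	       "member offsets are identical");
```

If a later change introduced a misaligned member, assertions of this kind
would fail at build time instead of silently changing the wire layout.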


* [iproute PATCH v3 1/3] ip{6,}tunnel: Avoid copying user-supplied interface name around
From: Phil Sutter @ 2017-10-02 11:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171002114637.25703-1-phil@nwl.cc>

In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.
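
The pattern can be shown in isolation. This is a minimal sketch under
stated assumptions: parse_dev() is a hypothetical, stripped-down parser and
lookup_index() is a made-up stand-in for ll_name_to_index() that only knows
one device name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ll_name_to_index(): 0 means "not found". */
static unsigned int lookup_index(const char *name)
{
	return strcmp(name, "eth0") == 0 ? 2 : 0;
}

/*
 * Scan the arguments for "dev NAME" and resolve the last NAME seen.
 * Returns the index, 0 if no "dev" argument was given, or -1 if the
 * name is missing or the device is unknown.
 */
static int parse_dev(int argc, char **argv)
{
	const char *medium = NULL; /* points into argv, nothing is copied */
	unsigned int idx;

	while (argc > 0) {
		if (strcmp(*argv, "dev") == 0) {
			if (argc < 2)
				return -1;
			argc--; argv++;
			medium = *argv;
		}
		argc--; argv++;
	}

	if (!medium)
		return 0;

	idx = lookup_index(medium);
	return idx ? (int)idx : -1;
}
```

The cached pointer stays valid because argv outlives the parser, and a NULL
check replaces the earlier test of the first byte of the temporary buffer.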

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/ip6tunnel.c |  6 +++---
 ip/iptunnel.c  | 22 +++++++++-------------
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index b4a7def144226..c12d700e74189 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -136,7 +136,7 @@ static void print_tunnel(struct ip6_tnl_parm2 *p)
 static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 {
 	int count = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "mode") == 0) {
@@ -180,7 +180,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 			memcpy(&p->laddr, &laddr.data, sizeof(p->laddr));
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ - 1);
+			medium = *argv;
 		} else if (strcmp(*argv, "encaplimit") == 0) {
 			NEXT_ARG();
 			if (strcmp(*argv, "none") == 0) {
@@ -285,7 +285,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 		count++;
 		argc--; argv++;
 	}
-	if (medium[0]) {
+	if (medium) {
 		p->link = ll_name_to_index(medium);
 		if (p->link == 0) {
 			fprintf(stderr, "Cannot find device \"%s\"\n", medium);
diff --git a/ip/iptunnel.c b/ip/iptunnel.c
index 105d0f5576f1a..0acfd0793d3cd 100644
--- a/ip/iptunnel.c
+++ b/ip/iptunnel.c
@@ -60,7 +60,7 @@ static void set_tunnel_proto(struct ip_tunnel_parm *p, int proto)
 static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 {
 	int count = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 	int isatap = 0;
 
 	memset(p, 0, sizeof(*p));
@@ -139,7 +139,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 				p->iph.saddr = htonl(INADDR_ANY);
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ - 1);
+			medium = *argv;
 		} else if (strcmp(*argv, "ttl") == 0 ||
 			   strcmp(*argv, "hoplimit") == 0 ||
 			   strcmp(*argv, "hlim") == 0) {
@@ -216,7 +216,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 		}
 	}
 
-	if (medium[0]) {
+	if (medium) {
 		p->link = ll_name_to_index(medium);
 		if (p->link == 0) {
 			fprintf(stderr, "Cannot find device \"%s\"\n", medium);
@@ -465,9 +465,8 @@ static int do_prl(int argc, char **argv)
 {
 	struct ip_tunnel_prl p = {};
 	int count = 0;
-	int devname = 0;
 	int cmd = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "prl-default") == 0) {
@@ -488,8 +487,7 @@ static int do_prl(int argc, char **argv)
 			count++;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ-1);
-			devname++;
+			medium = *argv;
 		} else {
 			fprintf(stderr,
 				"Invalid PRL parameter \"%s\"\n", *argv);
@@ -502,7 +500,7 @@ static int do_prl(int argc, char **argv)
 		}
 		argc--; argv++;
 	}
-	if (devname == 0) {
+	if (!medium) {
 		fprintf(stderr, "Must specify device\n");
 		exit(-1);
 	}
@@ -513,9 +511,8 @@ static int do_prl(int argc, char **argv)
 static int do_6rd(int argc, char **argv)
 {
 	struct ip_tunnel_6rd ip6rd = {};
-	int devname = 0;
 	int cmd = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 	inet_prefix prefix;
 
 	while (argc > 0) {
@@ -537,8 +534,7 @@ static int do_6rd(int argc, char **argv)
 			cmd = SIOCDEL6RD;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ-1);
-			devname++;
+			medium = *argv;
 		} else {
 			fprintf(stderr,
 				"Invalid 6RD parameter \"%s\"\n", *argv);
@@ -546,7 +542,7 @@ static int do_6rd(int argc, char **argv)
 		}
 		argc--; argv++;
 	}
-	if (devname == 0) {
+	if (!medium) {
 		fprintf(stderr, "Must specify device\n");
 		exit(-1);
 	}
-- 
2.13.1

^ permalink raw reply related
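
The change in this patch, keeping the argv pointer instead of copying
the name into a fixed-size buffer, can be sketched in isolation as
follows. find_dev_arg() is a hypothetical helper written for this
illustration, not an iproute2 function; it returns NULL where the real
code would go through NEXT_ARG(). The sketch assumes the argv strings
outlive the parser, which holds for the arguments passed to main().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of iproute2): return the argument that
 * follows "dev", or NULL if "dev" is absent or has no value.
 */
static const char *find_dev_arg(int argc, char **argv)
{
	const char *medium = NULL;

	while (argc > 0) {
		if (strcmp(*argv, "dev") == 0) {
			if (argc < 2)
				return NULL;	/* stand-in for NEXT_ARG() failing */
			argc--; argv++;
			medium = *argv;		/* cache the pointer, no copy */
		}
		argc--; argv++;
	}
	return medium;
}

/* Small self-check showing the intended use. */
void find_dev_arg_selftest(void)
{
	char *args[] = { "mode", "sit", "dev", "eth0" };

	assert(strcmp(find_dev_arg(4, args), "eth0") == 0);
	assert(find_dev_arg(2, args) == NULL);
}
```

Because nothing is copied, nothing is truncated at this stage; any
length check can happen later, where the name is actually consumed.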

* [iproute PATCH v3 3/3] Check user supplied interface name lengths
From: Phil Sutter @ 2017-10-02 11:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171002114637.25703-1-phil@nwl.cc>

The original problem was that something like:

| strncpy(ifr.ifr_name, *argv, IFNAMSIZ);

might leave ifr.ifr_name unterminated if the length of *argv is
IFNAMSIZ or more. In order to fix this, I thought about replacing all
those cases with (equivalent) calls to snprintf() or even introducing
strlcpy(). But as Ulrich Drepper correctly pointed out when rejecting
the latter from being added to glibc, truncating a string without
notifying the user is not to be considered good practice. So let's
do what he suggested and reject empty, overlong or otherwise invalid
interface names right from the start - this way calls to strncpy()
like the one shown above become safe and the user has a chance to
reconsider what he was trying to do.

Note that this doesn't add calls to check_ifname() to all places where
a user-supplied interface name is parsed. In many cases, the interface
must exist already and is therefore looked up using ll_name_to_index(),
so if_nametoindex() will perform the necessary checks already.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
Changes since v2:
- Change implementation of check_ifname() and add get_ifname() just as
  Stephen suggested with one exception: Call strncpy() with length of
  IFNAMSIZ, otherwise it might leave destination unterminated.
- Change callers accordingly.

Changes since v1:
- Added missing check to tc/f_flower.c.
- Drop some useless checks from ip/ip{6,}tunnel.c (ll_name_to_index()
  will detect illegal interface names for us).
- Renamed assert_valid_dev_name() to the shorter check_ifname().
- iplink: Check 'name' and 'dev' parameters right where they are parsed.
- ipl2tp: Drop needless check for p->ifname[0].
---
 include/utils.h |  2 ++
 ip/ip6tunnel.c  |  3 ++-
 ip/ipl2tp.c     |  4 +++-
 ip/iplink.c     | 31 ++++++++++++-------------------
 ip/ipmaddr.c    |  3 ++-
 ip/iprule.c     | 10 ++++++++--
 ip/iptunnel.c   |  7 ++++++-
 ip/iptuntap.c   |  6 ++++--
 lib/utils.c     | 29 +++++++++++++++++++++++++++++
 misc/arpd.c     |  3 ++-
 tc/f_flower.c   |  2 ++
 11 files changed, 72 insertions(+), 28 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index c9ed230b96044..76addb3258f59 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -133,6 +133,8 @@ void missarg(const char *) __attribute__((noreturn));
 void invarg(const char *, const char *) __attribute__((noreturn));
 void duparg(const char *, const char *) __attribute__((noreturn));
 void duparg2(const char *, const char *) __attribute__((noreturn));
+int check_ifname(const char *);
+int get_ifname(char *, const char *);
 int matches(const char *arg, const char *pattern);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 
diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index c12d700e74189..bc44bef7f030c 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -273,7 +273,8 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 				usage();
 			if (p->name[0])
 				duparg2("name", *argv);
-			strncpy(p->name, *argv, IFNAMSIZ - 1);
+			if (get_ifname(p->name, *argv))
+				invarg("\"name\" not a valid ifname", *argv);
 			if (cmd == SIOCCHGTUNNEL && count == 0) {
 				struct ip6_tnl_parm2 old_p = {};
 
diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 88664c909e11f..1e37b175e3315 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -182,7 +182,7 @@ static int create_session(struct l2tp_parm *p)
 	if (p->peer_cookie_len)
 		addattr_l(&req.n, 1024, L2TP_ATTR_PEER_COOKIE,
 			  p->peer_cookie,  p->peer_cookie_len);
-	if (p->ifname && p->ifname[0])
+	if (p->ifname)
 		addattrstrz(&req.n, 1024, L2TP_ATTR_IFNAME, p->ifname);
 
 	if (rtnl_talk(&genl_rth, &req.n, NULL, 0) < 0)
@@ -545,6 +545,8 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
 			}
 		} else if (strcmp(*argv, "name") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"name\" not a valid ifname", *argv);
 			p->ifname = *argv;
 		} else if (strcmp(*argv, "remote") == 0) {
 			NEXT_ARG();
diff --git a/ip/iplink.c b/ip/iplink.c
index ff5b56c038d28..6a96ea9ff56a7 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -573,6 +573,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			req->i.ifi_flags &= ~IFF_UP;
 		} else if (strcmp(*argv, "name") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"name\" not a valid ifname", *argv);
 			*name = *argv;
 		} else if (strcmp(*argv, "index") == 0) {
 			NEXT_ARG();
@@ -848,6 +850,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 				NEXT_ARG();
 			if (*dev)
 				duparg2("dev", *argv);
+			if (check_ifname(*argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 			*dev = *argv;
 			dev_index = ll_name_to_index(*dev);
 		}
@@ -870,7 +874,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 
 static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 {
-	int len;
 	char *dev = NULL;
 	char *name = NULL;
 	char *link = NULL;
@@ -960,13 +963,8 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 	}
 
 	if (name) {
-		len = strlen(name) + 1;
-		if (len == 1)
-			invarg("\"\" is not a valid device identifier\n",
-			       "name");
-		if (len > IFNAMSIZ)
-			invarg("\"name\" too long\n", name);
-		addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, len);
+		addattr_l(&req.n, sizeof(req),
+			  IFLA_IFNAME, name, strlen(name) + 1);
 	}
 
 	if (type) {
@@ -1016,7 +1014,6 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 int iplink_get(unsigned int flags, char *name, __u32 filt_mask)
 {
-	int len;
 	struct iplink_req req = {
 		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
 		.n.nlmsg_flags = NLM_F_REQUEST | flags,
@@ -1029,13 +1026,8 @@ int iplink_get(unsigned int flags, char *name, __u32 filt_mask)
 	} answer;
 
 	if (name) {
-		len = strlen(name) + 1;
-		if (len == 1)
-			invarg("\"\" is not a valid device identifier\n",
-				   "name");
-		if (len > IFNAMSIZ)
-			invarg("\"name\" too long\n", name);
-		addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, len);
+		addattr_l(&req.n, sizeof(req),
+			  IFLA_IFNAME, name, strlen(name) + 1);
 	}
 	addattr32(&req.n, sizeof(req), IFLA_EXT_MASK, filt_mask);
 
@@ -1265,6 +1257,8 @@ static int do_set(int argc, char **argv)
 			flags &= ~IFF_UP;
 		} else if (strcmp(*argv, "name") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"name\" not a valid ifname", *argv);
 			newname = *argv;
 		} else if (matches(*argv, "address") == 0) {
 			NEXT_ARG();
@@ -1355,6 +1349,8 @@ static int do_set(int argc, char **argv)
 
 			if (dev)
 				duparg2("dev", *argv);
+			if (check_ifname(*argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 			dev = *argv;
 		}
 		argc--; argv++;
@@ -1383,9 +1379,6 @@ static int do_set(int argc, char **argv)
 	}
 
 	if (newname && strcmp(dev, newname)) {
-		if (strlen(newname) == 0)
-			invarg("\"\" is not a valid device identifier\n",
-			       "name");
 		if (do_changename(dev, newname) < 0)
 			return -1;
 		dev = newname;
diff --git a/ip/ipmaddr.c b/ip/ipmaddr.c
index 85a69e779563d..5683f6fa830c1 100644
--- a/ip/ipmaddr.c
+++ b/ip/ipmaddr.c
@@ -284,7 +284,8 @@ static int multiaddr_modify(int cmd, int argc, char **argv)
 			NEXT_ARG();
 			if (ifr.ifr_name[0])
 				duparg("dev", *argv);
-			strncpy(ifr.ifr_name, *argv, IFNAMSIZ);
+			if (get_ifname(ifr.ifr_name, *argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 		} else {
 			if (matches(*argv, "address") == 0) {
 				NEXT_ARG();
diff --git a/ip/iprule.c b/ip/iprule.c
index 8313138db815f..36c57fa70b74a 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -472,11 +472,13 @@ static int iprule_list_flush_or_save(int argc, char **argv, int action)
 		} else if (strcmp(*argv, "dev") == 0 ||
 			   strcmp(*argv, "iif") == 0) {
 			NEXT_ARG();
-			strncpy(filter.iif, *argv, IFNAMSIZ);
+			if (get_ifname(filter.iif, *argv))
+				invarg("\"iif\"/\"dev\" not a valid ifname", *argv);
 			filter.iifmask = 1;
 		} else if (strcmp(*argv, "oif") == 0) {
 			NEXT_ARG();
-			strncpy(filter.oif, *argv, IFNAMSIZ);
+			if (get_ifname(filter.oif, *argv))
+				invarg("\"oif\" not a valid ifname", *argv);
 			filter.oifmask = 1;
 		} else if (strcmp(*argv, "l3mdev") == 0) {
 			filter.l3mdev = 1;
@@ -695,10 +697,14 @@ static int iprule_modify(int cmd, int argc, char **argv)
 		} else if (strcmp(*argv, "dev") == 0 ||
 			   strcmp(*argv, "iif") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"iif\"/\"dev\" not a valid ifname", *argv);
 			addattr_l(&req.n, sizeof(req), FRA_IFNAME,
 				  *argv, strlen(*argv)+1);
 		} else if (strcmp(*argv, "oif") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"oif\" not a valid ifname", *argv);
 			addattr_l(&req.n, sizeof(req), FRA_OIFNAME,
 				  *argv, strlen(*argv)+1);
 		} else if (strcmp(*argv, "l3mdev") == 0) {
diff --git a/ip/iptunnel.c b/ip/iptunnel.c
index 0acfd0793d3cd..208a1f06ab12f 100644
--- a/ip/iptunnel.c
+++ b/ip/iptunnel.c
@@ -178,7 +178,8 @@ static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 
 			if (p->name[0])
 				duparg2("name", *argv);
-			strncpy(p->name, *argv, IFNAMSIZ - 1);
+			if (get_ifname(p->name, *argv))
+				invarg("\"name\" not a valid ifname", *argv);
 			if (cmd == SIOCCHGTUNNEL && count == 0) {
 				struct ip_tunnel_parm old_p = {};
 
@@ -487,6 +488,8 @@ static int do_prl(int argc, char **argv)
 			count++;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 			medium = *argv;
 		} else {
 			fprintf(stderr,
@@ -534,6 +537,8 @@ static int do_6rd(int argc, char **argv)
 			cmd = SIOCDEL6RD;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 			medium = *argv;
 		} else {
 			fprintf(stderr,
diff --git a/ip/iptuntap.c b/ip/iptuntap.c
index 451f7f0eac6bb..b46e452f21278 100644
--- a/ip/iptuntap.c
+++ b/ip/iptuntap.c
@@ -176,7 +176,8 @@ static int parse_args(int argc, char **argv,
 			ifr->ifr_flags |= IFF_MULTI_QUEUE;
 		} else if (matches(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(ifr->ifr_name, *argv, IFNAMSIZ-1);
+			if (get_ifname(ifr->ifr_name, *argv))
+				invarg("\"dev\" not a valid ifname", *argv);
 		} else {
 			if (matches(*argv, "name") == 0) {
 				NEXT_ARG();
@@ -184,7 +185,8 @@ static int parse_args(int argc, char **argv,
 				usage();
 			if (ifr->ifr_name[0])
 				duparg2("name", *argv);
-			strncpy(ifr->ifr_name, *argv, IFNAMSIZ);
+			if (get_ifname(ifr->ifr_name, *argv))
+				invarg("\"name\" not a valid ifname", *argv);
 		}
 		count++;
 		argc--; argv++;
diff --git a/lib/utils.c b/lib/utils.c
index bbd3cbc46a0e5..0cf99619c3021 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -20,6 +20,7 @@
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <string.h>
+#include <ctype.h>
 #include <netdb.h>
 #include <arpa/inet.h>
 #include <asm/types.h>
@@ -699,6 +700,34 @@ void duparg2(const char *key, const char *arg)
 	exit(-1);
 }
 
+int check_ifname(const char *name)
+{
+	/* These checks mimic kernel checks in dev_valid_name */
+	if (*name == '\0')
+		return -1;
+	if (strlen(name) >= IFNAMSIZ)
+		return -1;
+
+	while (*name) {
+		if (*name == '/' || isspace(*name))
+			return -1;
+		++name;
+	}
+	return 0;
+}
+
+/* buf is assumed to be IFNAMSIZ */
+int get_ifname(char *buf, const char *name)
+{
+	int ret;
+
+	ret = check_ifname(name);
+	if (ret == 0)
+		strncpy(buf, name, IFNAMSIZ);
+
+	return ret;
+}
+
 int matches(const char *cmd, const char *pattern)
 {
 	int len = strlen(cmd);
diff --git a/misc/arpd.c b/misc/arpd.c
index bfab44544ee1d..c2666f76fd5e9 100644
--- a/misc/arpd.c
+++ b/misc/arpd.c
@@ -664,7 +664,8 @@ int main(int argc, char **argv)
 		struct ifreq ifr = {};
 
 		for (i = 0; i < ifnum; i++) {
-			strncpy(ifr.ifr_name, ifnames[i], IFNAMSIZ);
+			if (get_ifname(ifr.ifr_name, ifnames[i]))
+				invarg("not a valid ifname", ifnames[i]);
 			if (ioctl(udp_sock, SIOCGIFINDEX, &ifr)) {
 				perror("ioctl(SIOCGIFINDEX)");
 				exit(-1);
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 99e62a382dec6..b180210717394 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -630,6 +630,8 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 			flags |= TCA_CLS_FLAGS_SKIP_SW;
 		} else if (matches(*argv, "indev") == 0) {
 			NEXT_ARG();
+			if (check_ifname(*argv))
+				invarg("\"indev\" not a valid ifname", *argv);
 			addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, *argv);
 		} else if (matches(*argv, "vlan_id") == 0) {
 			__u16 vid;
-- 
2.13.1

^ permalink raw reply related
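
The two helpers that patch 3/3 adds to lib/utils.c can be tried outside
of iproute2 with a sketch like the one below. The function bodies follow
the diff above; the fallback definition of IFNAMSIZ, the cast to
unsigned char before isspace() and the self-check function are additions
made for this standalone illustration and are not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <net/if.h>	/* IFNAMSIZ on Linux */
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* assumption: the usual Linux value */
#endif

/* Return 0 if name is acceptable as an interface name, -1 otherwise. */
int check_ifname(const char *name)
{
	if (*name == '\0')
		return -1;
	if (strlen(name) >= IFNAMSIZ)
		return -1;

	while (*name) {
		if (*name == '/' || isspace((unsigned char)*name))
			return -1;
		++name;
	}
	return 0;
}

/* buf is assumed to be IFNAMSIZ bytes; it is only written on success. */
int get_ifname(char *buf, const char *name)
{
	int ret = check_ifname(name);

	if (ret == 0)
		strncpy(buf, name, IFNAMSIZ);	/* pads the rest with NULs */
	return ret;
}

/* Small self-check showing the intended use. */
void ifname_selftest(void)
{
	char buf[IFNAMSIZ];

	assert(get_ifname(buf, "eth0") == 0);
	assert(strcmp(buf, "eth0") == 0);
	assert(check_ifname("a/b") == -1);
}
```

Since check_ifname() has already rejected names of IFNAMSIZ bytes or
more, the strncpy() in get_ifname() always has room for the terminating
NUL, which is the point made in the commit message.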

* [iproute PATCH v3 2/3] tc: flower: No need to cache indev arg
From: Phil Sutter @ 2017-10-02 11:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171002114637.25703-1-phil@nwl.cc>

Since addattrstrz() will copy the provided string into the attribute
payload, there is no need to cache the data.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 tc/f_flower.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 934832e2bbe90..99e62a382dec6 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -629,11 +629,8 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 		} else if (matches(*argv, "skip_sw") == 0) {
 			flags |= TCA_CLS_FLAGS_SKIP_SW;
 		} else if (matches(*argv, "indev") == 0) {
-			char ifname[IFNAMSIZ] = {};
-
 			NEXT_ARG();
-			strncpy(ifname, *argv, sizeof(ifname) - 1);
-			addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, ifname);
+			addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, *argv);
 		} else if (matches(*argv, "vlan_id") == 0) {
 			__u16 vid;
 
-- 
2.13.1

^ permalink raw reply related

* [iproute PATCH v3 0/3] Check user supplied interface name lengths
From: Phil Sutter @ 2017-10-02 11:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

This series adds explicit checks for user-supplied interface names to
make sure they fit Linux's requirements.

The first two patches simplify interface name parsing in some places -
these are side-effects of working on the actual implementation provided
in patch three.

Changes since v2:
- Changed patch 3 as suggested in review.

Changes since v1:
- Patches 1 and 2 introduced.
- Changes to patch 3 are listed in there.

Phil Sutter (3):
  ip{6,}tunnel: Avoid copying user-supplied interface name around
  tc: flower: No need to cache indev arg
  Check user supplied interface name lengths

 include/utils.h |  2 ++
 ip/ip6tunnel.c  |  9 +++++----
 ip/ipl2tp.c     |  4 +++-
 ip/iplink.c     | 31 ++++++++++++-------------------
 ip/ipmaddr.c    |  3 ++-
 ip/iprule.c     | 10 ++++++++--
 ip/iptunnel.c   | 29 +++++++++++++++--------------
 ip/iptuntap.c   |  6 ++++--
 lib/utils.c     | 29 +++++++++++++++++++++++++++++
 misc/arpd.c     |  3 ++-
 tc/f_flower.c   |  7 +++----
 11 files changed, 85 insertions(+), 48 deletions(-)

-- 
2.13.1

^ permalink raw reply

* v4.14-rc2/arm64 misaligned atomic in ip_expire() / skb_clone()
From: Mark Rutland @ 2017-10-02 11:57 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-arm-kernel, syzkaller
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet

Hi all,

I'm intermittently hitting splats like below in skb_clone() while
fuzzing v4.14-rc2 on arm64 with Syzkaller. It looks like the
atomic_inc() at the end of __skb_clone() is being passed a misaligned
pointer.

I've uploaded a number of splats and their associated (full) Syzkaller
logs, along with my kernel config to my kernel.org webspace [1]. It
might take a while for that to appear.

This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
applied to avoid a userfaultfd bug. The userfaultfd syscall appears in
all of the Syzkaller logs, so there is a chance that this is related,
but as I've not seen any other issues I suspect that's unlikely.

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic
[2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com

Unable to handle kernel paging request at virtual address ffff80002fd714a2
Mem abort info:
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000033
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff20000eeb2000
[ffff80002fd714a2] *pgd=000000007eff7003, *pud=000000007eff6003, *pmd=00f800006fc00711
Internal error: Oops: 96000021 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
Hardware name: linux,dummy-virt (DT)
task: ffff80003a901a80 task.stack: ffff80003a908000
PC is at __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
LR is at atomic_add arch/arm64/include/asm/atomic_lse.h:45 [inline]
LR is at __skb_clone+0x4a8/0x6c0 net/core/skbuff.c:873
pc : [<ffff20000a30ce44>] lr : [<ffff200009dffb58>] pstate: 10000145
sp : ffff80003efd86e0
x29: ffff80003efd86e0 x28: 000060003418b000 
x27: ffff20000ae55360 x26: ffff8000182c1608 
x25: ffff80002fd7137e x24: ffff8000182c1610 
x23: ffff20000ae60000 x22: ffff80001577871c 
x21: 1ffff00007dfb0e8 x20: ffff8000182c1540 
x19: ffff800015778640 x18: ffff20000da58140 
x17: 0000000000000000 x16: 0000000000000002 
x15: ffff20000e1485a0 x14: ffff2000082f912c 
x13: ffff2000082f8dcc x12: ffff2000082f8980 
x11: 1ffff00002aef0df x10: ffff100002aef0df 
x9 : dfff200000000000 x8 : 0082009000a40008 
x7 : 0000000000000000 x6 : ffff800015778700 
x5 : ffff100002aef0e0 x4 : 0000000000000000 
x3 : 1ffff00002aef0e3 x2 : ffff80002fd7147e 
x1 : ffff80002fd714a2 x0 : 0000000000000001 
Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
Call trace:
Exception stack(0xffff80003efd85a0 to 0xffff80003efd86e0)
85a0: 0000000000000001 ffff80002fd714a2 ffff80002fd7147e 1ffff00002aef0e3
85c0: 0000000000000000 ffff100002aef0e0 ffff800015778700 0000000000000000
85e0: 0082009000a40008 dfff200000000000 ffff100002aef0df 1ffff00002aef0df
8600: ffff2000082f8980 ffff2000082f8dcc ffff2000082f912c ffff20000e1485a0
8620: 0000000000000002 0000000000000000 ffff20000da58140 ffff800015778640
8640: ffff8000182c1540 1ffff00007dfb0e8 ffff80001577871c ffff20000ae60000
8660: ffff8000182c1610 ffff80002fd7137e ffff8000182c1608 ffff20000ae55360
8680: 000060003418b000 ffff80003efd86e0 ffff200009dffb58 ffff80003efd86e0
86a0: ffff20000a30ce44 0000000010000145 ffff800015778640 ffff8000182c1540
86c0: 0001000000000000 ffff8000182c15ce ffff80003efd86e0 ffff20000a30ce44
[<ffff20000a30ce44>] __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
[<ffff200009e1009c>] skb_clone+0x1c4/0x3b0 net/core/skbuff.c:1286
[<ffff200009f2ff80>] ip_expire+0x4e8/0x7c0 net/ipv4/ip_fragment.c:239
[<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
[<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
[<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
[<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
[<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
[<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
[<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
[<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
[<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
[<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
[<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
Exception stack(0xffff80003a90bd70 to 0xffff80003a90beb0)
bd60:                                   ffff80003a90234c 0000000000000007
bd80: 0000000000000000 1ffff00007520469 1fffe400017ad00c ffffffffffffe540
bda0: 0000000000000000 0000000000000000 ffff80003a902350 1ffff00007520469
bdc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
bde0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000029d44
be00: ffff20000da58140 ffff80003a901a80 ffff80003a901a80 dfff200000000000
be20: ffff20000ae60e98 ffff0400015cc1d3 0000000000000000 ffff20000ae60df8
be40: ffff20000ae60df8 0000000000000000 0000000000000000 ffff80003a90beb0
be60: ffff200008089b50 ffff80003a90beb0 ffff200008089b54 0000000010000145
be80: ffff80003a901a80 ffff80003a901a80 ffffffffffffffff 01f6cee936b5bc00
bea0: ffff80003a90beb0 ffff200008089b54
[<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
[<ffff200008089b54>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
[<ffff200008089b54>] arch_cpu_idle+0x1c/0x28 arch/arm64/kernel/process.c:87
[<ffff20000a360a94>] default_idle_call+0x34/0x78 kernel/sched/idle.c:98
[<ffff200008254a34>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
[<ffff200008254a34>] do_idle+0x20c/0x370 kernel/sched/idle.c:246
[<ffff20000825513c>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:351
[<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
Code: 978b7cfd 17ffff91 00000000 f9800031 (885f7c31) 
---[ end trace e4e9a51ab15d3a5f ]---

^ permalink raw reply
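
The misalignment in the splat above can be checked by hand: atomic_t
wraps a 32-bit int, so atomic_inc() needs a 4-byte aligned pointer,
while the faulting address (also in x1) ends in ...14a2. A minimal
sketch of that arithmetic follows, together with a decode of the
96000021 value from the "Internal error" line. The ESR field positions
used here (exception class in bits 31:26, fault status code in bits
5:0) and the reading of 0x25 as "data abort, current EL" and 0x21 as
"alignment fault" are stated from memory of the ARMv8 architecture
manual and should be treated as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes by which addr overshoots the previous size-aligned boundary;
 * size must be a power of two. */
unsigned int misalignment(uint64_t addr, unsigned int size)
{
	return (unsigned int)(addr & (uint64_t)(size - 1));
}

/* Exception class field of an ESR value (assumed layout: bits 31:26). */
unsigned int esr_ec(uint32_t esr)
{
	return esr >> 26;
}

/* Fault status code of a data abort (assumed layout: bits 5:0). */
unsigned int esr_fsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* Self-check using the values reported in the splat. */
void splat_selftest(void)
{
	assert(misalignment(0xffff80002fd714a2ULL, 4) == 2);
	assert(esr_ec(0x96000021u) == 0x25);
	assert(esr_fsc(0x96000021u) == 0x21);
}
```

The address is two bytes past a 4-byte boundary, which matches the
report that __skb_clone() passes a misaligned pointer to atomic_inc().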

* Re: [PATCH] mac80211: aead api to reduce redundancy
From: Johannes Berg @ 2017-10-02 12:04 UTC (permalink / raw)
  To: Xiang Gao, davem, linux-kernel, linux-wireless, netdev
In-Reply-To: <20170926131945.3962-1-qasdfgtyuiop@gmail.com>

Please use "v2" tag or so in the subject line, having the same patch
again is really not helpful.

The next should be v3, obviously.

> +++ b/net/mac80211/aead_api.c
> @@ -1,7 +1,4 @@
> -/*
> - * Copyright 2014-2015, Qualcomm Atheros, Inc.
> - *
> - * This program is free software; you can redistribute it and/or modify
> +/* This program is free software; you can redistribute it and/or modify

I see no reason to make this change, why remove copyright?

> +++ b/net/mac80211/wpa.c
> @@ -464,7 +464,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
>  	pos += IEEE80211_CCMP_HDR_LEN;
>  	ccmp_special_blocks(skb, pn, b_0, aad);
>  	return ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len,
> -					 skb_put(skb, mic_len), mic_len);
> +					 skb_put(skb,
> +						 key->u.ccmp.tfm->authsize));
>  }

I see no reason for the change from mic_len to authsize here?

> @@ -540,10 +541,11 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
>  			ccmp_special_blocks(skb, pn, b_0, aad);
>  
>  			if (ieee80211_aes_ccm_decrypt(
> -				    key->u.ccmp.tfm, b_0, aad,
> -				    skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN,
> -				    data_len,
> -				    skb->data + skb->len - mic_len, mic_len))
> +				key->u.ccmp.tfm, b_0, aad,
> +				skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN,
> +				data_len,
> +				skb->data + skb->len - key->u.ccmp.tfm->authsize
> +			))
>  				return RX_DROP_UNUSABLE;

That's a really really strange way of writing this ...

Please reformat.

johannes

^ permalink raw reply

* Re: cross namespace interface notification for tun devices
From: Nicolas Dichtel @ 2017-10-02 12:06 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, Mathias
In-Reply-To: <CAHmME9pAvv7ebKC-uZGPJRi9Jasgrd2tgCvS1Lji+cgM1mV2qw@mail.gmail.com>

On 02/10/2017 at 13:11, Jason A. Donenfeld wrote:
> On Mon, Oct 2, 2017 at 11:32 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> 1. Move the process to netns B, open the netlink socket and move back the
>> process to netns A. The socket will remain in netns B and you will receive all
>> netlink messages related to netns B.
>>
>> 2. Assign a nsid to netns B in netns A and use NETLINK_LISTEN_ALL_NSID on your
>> netlink socket (see iproute2).
> 
> Both of these seem to rely on the process knowing where the device is
> being moved and having access to that namespace. I don't think these
> two things are a given though. Unless I'm missing something?
I didn't understand correctly.
Your control process cannot monitor or control an interface which is in an
unknown/hidden netns. But x-netns interfaces are special. We already added a
way to identify the peer netns for this kind of interface.
If a get_link_net handler were added to the rtnl_link_ops of the tun driver, it
would help to identify netns A when you are in netns B. But you need the opposite.
I already tried a patch to advertise via netlink the dst netns when an interface
moves to a new netns. I think that it is valid for x-netns interfaces.
As soon as you can identify the dst netns, your problem is solved, right?


Nicolas

^ permalink raw reply

* Re: [net-next V2 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu
From: Jesper Dangaard Brouer @ 2017-10-02 12:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, Jason Wang, mchan,
	John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek, brouer
In-Reply-To: <20170930030607.sk2wzjxxlbhkkt7k@ast-mbp>

On Fri, 29 Sep 2017 20:06:09 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > +/*** Trace point code ***/
> > +
> > +/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
> > + * Code in:                kernel/include/trace/events/xdp.h
> > + */
> > +struct xdp_redirect_ctx {
> > +	unsigned short common_type;	//	offset:0;  size:2; signed:0;
> > +	unsigned char common_flags;	//	offset:2;  size:1; signed:0;
> > +	unsigned char common_preempt_count;//	offset:3;  size:1; signed:0;
> > +	int common_pid;			//	offset:4;  size:4; signed:1;  
> 
> this part is not right. First 8 bytes are not accessible by bpf code.
> Please use __u64 pad; or similar here.

I've corrected this in V3.

Can you explain why BPF cannot access these (first 8 bytes) struct members?


> Just noticed that samples/bpf/xdp_monitor_kern.c has the same problem.
> 
> > +
> > +	int prog_id;			//	offset:8;  size:4; signed:1;
> > +	u32 act;			//	offset:12  size:4; signed:0;
> > +	int ifindex;			//	offset:16  size:4; signed:1;
> > +	int err;			//	offset:20  size:4; signed:1;
> > +	int to_ifindex;			//	offset:24  size:4; signed:1;
> > +	u32 map_id;			//	offset:28  size:4; signed:0;
> > +	int map_index;			//	offset:32  size:4; signed:1;
> > +};					//	offset:36  
> 
> the second part of fields is correct.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply
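
The layout Alexei asks for can be sketched as follows: the four
common_* members are collapsed into a single opaque 8-byte pad, and the
remaining fields keep the offsets listed in the tracepoint format file.
This is a user-space approximation that uses <stdint.h> types in place
of the kernel's __u64/u32; it is not the V3 code, and the static
assertions only confirm the offsets quoted in the review.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

struct xdp_redirect_ctx {
	uint64_t pad;		/* offset:0;  size:8; common_* fields, not read */
	int prog_id;		/* offset:8;  size:4; signed:1 */
	uint32_t act;		/* offset:12; size:4; signed:0 */
	int ifindex;		/* offset:16; size:4; signed:1 */
	int err;		/* offset:20; size:4; signed:1 */
	int to_ifindex;		/* offset:24; size:4; signed:1 */
	uint32_t map_id;	/* offset:28; size:4; signed:0 */
	int map_index;		/* offset:32; size:4; signed:1 */
};

static_assert(offsetof(struct xdp_redirect_ctx, prog_id) == 8,
	      "prog_id must follow the 8-byte pad");
static_assert(offsetof(struct xdp_redirect_ctx, act) == 12,
	      "act offset");
static_assert(offsetof(struct xdp_redirect_ctx, map_id) == 28,
	      "map_id offset");
static_assert(offsetof(struct xdp_redirect_ctx, map_index) == 32,
	      "map_index offset");
```

One side effect to keep in mind: with a 64-bit first member, the
compiler may round sizeof(struct xdp_redirect_ctx) up past the 36 bytes
shown in the original comment, depending on the ABI's alignment rules.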

* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Andrew Lunn @ 2017-10-02 12:51 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, vivien.didelot, jiri, idosch, Woojung.Huh, john,
	sean.wang
In-Reply-To: <20171001194639.8647-1-f.fainelli@gmail.com>

> - not sure what to do with a switch fabric, naively, if adding two ports
>   of two distinct switches as a LAG group, we may have to propagate that
>   to "dsa" cross-chip interfaces as well

Hi Florian

Marvell switches do support this. If I remember correctly, it requires
some setup for forwarding over the DSA ports.

But for a first implementation, I would be tempted to disallow such
setups. Force the LAG members to be on the same switch.

	Andrew

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH] ixgbe: Update adaptive ITR algorithm
From: Jesper Dangaard Brouer @ 2017-10-02 12:56 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, intel-wired-lan, john.fastabend, brouer
In-Reply-To: <20170925215225.15616.63705.stgit@localhost.localdomain>

On Mon, 25 Sep 2017 14:55:36 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> The following change is meant to update the adaptive ITR algorithm to
> better support the needs of the network. Specifically with this change what
> I have done is make it so that our ITR algorithm will try to prevent either
> starving a socket buffer for memory in the case of Tx, or overrunning an Rx
> socket buffer on receive.
> 
> In addition a side effect of the calculations used is that we should
> function better with new features such as XDP which can handle small
> packets at high rates without needing to lock us into NAPI polling mode.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
> So I am putting this out to a wider distribution list than normal for a
> patch like this in order to get feedback on if there are any areas I may
> have overlooked. This patch should address many of the performance
> limitations seen with pktgen and XDP in terms of workloads that the old
> adaptive scheme wasn't handling.

Thanks a lot Alex!

I've tested the patch with XDP redirect (map), and the issue I reported
in [1] is solved with this patch.

[1] Subject: "XDP redirect measurements, gotchas and tracepoints"
 http://lkml.kernel.org/r/20170821212506.1cb0d5d6@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Andrew Lunn @ 2017-10-02 12:59 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Florian Fainelli, netdev, vivien.didelot, jiri, idosch,
	Woojung.Huh, john, sean.wang
In-Reply-To: <20171002065023.GA11832@shredder.mtl.com>

> > - not sure what to do with a switch fabric, naively, if adding two ports
> >   of two distinct switches as a LAG group, we may have to propagate that
> >   to "dsa" cross-chip interfaces as well
> 
> At least in mlxsw case, enslaving switch and non-switch ports to the
> same LAG doesn't make sense. Any traffic routed by the switch will only
> be load-balanced between the switch ports. One way to solve that is to
> forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> devices in the adjacency list of the LAG don't belong to the same
> switch.
> 
> Note that such configurations are bound to fail anyway, as the
> non-switch ports will not have `switchdev_ops` configured and thus fail
> during __switchdev_port_obj_add() / __switchdev_port_attr_set().

Hi Ido

Here Florian is thinking about the D in DSA. Marvell switches have the
capabilities of building a switch fabric out of multiple
interconnected switches. To switchdev, they appear as a single switch.
switchdev has no idea of the mapping of interfaces to switches, nor
the routing of frames between switches. This all happens in the layers
below. The hardware does support LAG members on different switches
within the same fabric. But it requires some additional setup for the
ports which link switches together. We have the same issues with MDB,
where additional setup is required for group members spread over the
switch fabric.

      Andrew

^ permalink raw reply

* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Ido Schimmel @ 2017-10-02 13:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, vivien.didelot, jiri, idosch,
	Woojung.Huh, john, sean.wang
In-Reply-To: <20171002125932.GB4765@lunn.ch>

Hi Andrew,

On Mon, Oct 02, 2017 at 02:59:32PM +0200, Andrew Lunn wrote:
> > > - not sure what to do with a switch fabric, naively, if adding two ports
> > >   of two distinct switches as a LAG group, we may have to propagate that
> > >   to "dsa" cross-chip interfaces as well
> > 
> > At least in mlxsw case, enslaving switch and non-switch ports to the
> > same LAG doesn't make sense. Any traffic routed by the switch will only
> > be load-balanced between the switch ports. One way to solve that is to
> > forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> > devices in the adjacency list of the LAG don't belong to the same
> > switch.
> > 
> > Note that such configurations are bound to fail anyway, as the
> > non-switch ports will not have `switchdev_ops` configured and thus fail
> > during __switchdev_port_obj_add() / __switchdev_port_attr_set().
> 
> Hi Ido
> 
> Here Florian is thinking about the D in DSA. Marvell switches have the
> capabilities of building a switch fabric out of multiple
> interconnected switches. To switchdev, they appear as a single switch.
> switchdev has no idea of the mapping of interfaces to switches, nor
> the routing of frames between switches. This all happens in the layers
> below. The hardware does support LAG members on different switches
> within the same fabric. But it requires some additional setup for the
> ports which link switches together. We have the same issues with MDB,
> where additional setup is required for group members spread over the
> switch fabric.

Yes, I understood that. I was simply referring to the more general
problem of any two net devices and how to solve it. Not currently
implemented in mlxsw, but should be necessary for DSA as well.

Agree with your previous mail about keeping it simple for the first
implementation.
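
The "keep it simple" rule discussed above (refuse a LAG whose members sit
on different switches) can be sketched in userspace C as below. This is only
an illustration: struct lag_port, its fields and lag_check_same_switch() are
hypothetical names, not kernel or DSA structures, and a real driver would do
the check from its NETDEV_PRECHANGEUPPER handling rather than in a helper
like this.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a port; not a kernel structure. */
struct lag_port {
	int switch_id;	/* which switch chip the port belongs to */
	int port_index;	/* port number on that chip */
};

/*
 * Return 0 if 'candidate' may join a LAG whose current members are
 * members[0..n-1], or -EOPNOTSUPP if any existing member sits on a
 * different switch. An empty LAG accepts any port.
 */
int lag_check_same_switch(const struct lag_port *members, size_t n,
			  const struct lag_port *candidate)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (members[i].switch_id != candidate->switch_id)
			return -EOPNOTSUPP;
	}
	return 0;
}
```

Returning an error at this point keeps the cross-chip case out of the first
implementation without ruling it out later.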

^ permalink raw reply

* Re: [PATCH 05/18] net: use ARRAY_SIZE
From: Andy Shevchenko @ 2017-10-02 13:07 UTC (permalink / raw)
  To: Jérémy Lefaure
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Jeff Kirsher, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Kalle Valo, Larry Finger, Chaoming Li,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	"linux-kernel@vger.kernel.org" <
In-Reply-To: <20171001193101.8898-6-jeremy.lefaure@lse.epita.fr>

On Sun, Oct 1, 2017 at 10:30 PM, Jérémy Lefaure
<jeremy.lefaure@lse.epita.fr> wrote:
> Using the ARRAY_SIZE macro improves the readability of the code. Also,
> it is not always useful to use a variable to store this constant
> calculated at compile time.
>

> +       {&gainctrl_lut_core0_rev0, ARRAY_SIZE(gainctrl_lut_core0_rev0), 26, 192,
> +        32},

For all such cases I would rather put everything on one line and
disregard the checkpatch warning, for better readability.
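
For reference, a userspace approximation of the macro under discussion is
sketched below. The kernel definition additionally adds __must_be_array() so
that passing a pointer fails at build time; that check is left out here.
example_lut and example_lut_len() are made-up names standing in for tables
such as gainctrl_lut_core0_rev0.

```c
#include <stddef.h>

/* Userspace approximation; no build-time pointer check as in the kernel. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical lookup table used only for this illustration. */
static const unsigned int example_lut[] = { 0x10, 0x20, 0x30, 0x40, 0x50 };

/* The element count is a compile-time constant, so no separate
 * length variable has to be kept in sync with the table. */
size_t example_lut_len(void)
{
	return ARRAY_SIZE(example_lut);
}
```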

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 13:36 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
	Willem de Bruijn
In-Reply-To: <20171002104947.GE20737@leverpostej>

On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi all,
>
> I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> skb_copy_and_csum_bits().
>
> I've uploaded a copy of the splat, my config, and (full) Syzkaller log
> to my kernel.org web space [1]. I haven't had the opportunity to
> reproduce this yet.
>
> This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
> applied to avoid a userfaultfd bug. However, per the Syzkaller log, the
> userfaultfd syscall wasn't invoked, so I don't believe that should
> matter.
>
> Thanks,
> Mark.
>
> [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skbuff-bug/
> [2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:2626!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
> Hardware name: linux,dummy-virt (DT)
> task: ffff80003a901a80 task.stack: ffff80003a908000
> PC is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> LR is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> pc : [<ffff200009e03214>] lr : [<ffff200009e03214>] pstate: 00000145
> sp : ffff80003efd7b50
> x29: ffff80003efd7b50 x28: 000000000000003c
> x27: 00000000000001e8 x26: ffff80003a901a90
> x25: 000000000000003c x24: dfff200000000000
> x23: ffff800035723a80 x22: 000000000000003c
> x21: 0000000000000000 x20: 0000000000000000
> x19: 0000000000003a6d x18: ffff20000da58140
> x17: 0000000000000000 x16: 0000000000000001
> x15: ffff20000e1485a0 x14: ffff2000082f8980
> x13: ffff200009fc73d0 x12: ffff200009fc707c
> x11: 1ffff00002c2a3fc x10: ffff100002c2a3fc
> x9 : dfff200000000000 x8 : 07030301a8ff1127
> x7 : edff11270a080204 x6 : ffff800016151fe8
> x5 : ffff100002c2a3fd x4 : 000000000000000c
> x3 : 0000000000000030 x2 : 1ffff00006ae47a1
> x1 : 01f6cee936b5bc00 x0 : 0000000000000000
> Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
> Call trace:
> Exception stack(0xffff80003efd7a10 to 0xffff80003efd7b50)
> 7a00:                                   0000000000000000 01f6cee936b5bc00
> 7a20: 1ffff00006ae47a1 0000000000000030 000000000000000c ffff100002c2a3fd
> 7a40: ffff800016151fe8 edff11270a080204 07030301a8ff1127 dfff200000000000
> 7a60: ffff100002c2a3fc 1ffff00002c2a3fc ffff200009fc707c ffff200009fc73d0
> 7a80: ffff2000082f8980 ffff20000e1485a0 0000000000000001 0000000000000000
> 7aa0: ffff20000da58140 0000000000003a6d 0000000000000000 0000000000000000
> 7ac0: 000000000000003c ffff800035723a80 dfff200000000000 000000000000003c
> 7ae0: ffff80003a901a90 00000000000001e8 000000000000003c ffff80003efd7b50
> 7b00: ffff200009e03214 ffff80003efd7b50 ffff200009e03214 0000000000000145
> 7b20: 0000000000003a6d 0000000000000000 0001000000000000 000000000000003c
> 7b40: ffff80003efd7b50 ffff200009e03214
> [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
> Exception stack(0xffff80003a90bb70 to 0xffff80003a90bcb0)
> bb60:                                   ffff80003a90234c 0000000000000007
> bb80: 0000000000000000 1ffff00007520469 1fffe400017ad00c dfff200000000000
> bba0: dfff200000000000 0000000000000000 ffff80003a902350 1ffff00007520469
> bbc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
> bbe0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000000001
> bc00: ffff20000da58140 ffff80003efd9800 ffff80003efd9800 ffff20000ae60000
> bc20: ffff80003a971a80 1ffff000075217aa 0000000000000000 ffff20000ae60000
> bc40: 0000000000000001 ffff20000a34fce0 0000dffff519f438 ffff80003a90bcb0
> bc60: ffff20000a36134c ffff80003a90bcb0 ffff20000a361350 0000000010000145
> bc80: ffff80003efd9800 ffff80003efd9800 ffffffffffffffff ffff80003efd9800
> bca0: ffff80003a90bcb0 ffff20000a361350
> [<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
> [<ffff20000a361350>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
> [<ffff20000a361350>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
> [<ffff20000a361350>] _raw_spin_unlock_irq+0x30/0x100 kernel/locking/spinlock.c:199
> [<ffff2000081e0850>] finish_lock_switch kernel/sched/sched.h:1335 [inline]
> [<ffff2000081e0850>] finish_task_switch+0x1d8/0x950 kernel/sched/core.c:2657
> [<ffff20000a34fce0>] context_switch kernel/sched/core.c:2793 [inline]
> [<ffff20000a34fce0>] __schedule+0x518/0x17b0 kernel/sched/core.c:3366
> [<ffff20000a3520e8>] schedule_idle+0x58/0xc8 kernel/sched/core.c:3452
> [<ffff200008254a00>] do_idle+0x1d8/0x370 kernel/sched/idle.c:269
> [<ffff200008255138>] cpu_startup_entry+0x20/0x28 kernel/sched/idle.c:351
> [<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
> Code: 97bcbfac 17fffe19 d503201f 97974258 (d4210000)
> ---[ end trace 3359b414c3a12466 ]---

This is most likely a bug caused by syzkaller setting a ridiculous MTU
on the loopback device, below the minimum IPv4 MTU.

I tried to track it down in August [1], but it seems hard to find all
the issues with this.

commit c780a049f9bf442314335372c9abc4548bfe3e44
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Aug 16 11:09:12 2017 -0700

    ipv4: better IP_MAX_MTU enforcement

    While working on yet another syzkaller report, I found
    that our IP_MAX_MTU enforcements were not properly done.

    gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
    final result can be bigger than IP_MAX_MTU :/

    This is a problem because device mtu can be changed on other cpus or
    threads.

    While this patch does not fix the issue I am working on, it is
    probably worth addressing it.
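
The reload problem described in that commit message can be illustrated with
a small userspace sketch. struct fake_dev and clamped_mtu() are hypothetical
names; the volatile qualifier plays the role that READ_ONCE() plays in the
kernel, forcing exactly one load so the comparison and the returned value
cannot come from two different reads of a field that another CPU may change.

```c
#define IP_MAX_MTU 0xFFFFU

/* Hypothetical stand-in for struct net_device; only mtu is modelled. */
struct fake_dev {
	volatile unsigned int mtu;	/* may change concurrently */
};

unsigned int clamped_mtu(const struct fake_dev *dev)
{
	/* Single load into a local: both the test and the result
	 * below use this one snapshot of the field. */
	unsigned int mtu = dev->mtu;

	return mtu < IP_MAX_MTU ? mtu : IP_MAX_MTU;
}
```

Without the snapshot, a compiler is free to read the field once for the
comparison and again for the result, which is how a value above the cap can
escape.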

^ permalink raw reply

* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
From: James Chapman @ 2017-10-02 13:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171001102110.24184f1b@xeon-e3>

This seems to be a NULL pointer exception caused by tunnel->sock being
NULL at the call to bh_lock_sock() in l2tp_xmit_skb() at
l2tp_core.c:1135.

tunnel->sock is set NULL in l2tp_core's tunnel socket destructor.

At the moment, I don't understand how this happens because
pppol2tp_xmit() does a sock_hold() on the tunnel socket before
l2tp_xmit_skb() is called. I'm still looking at this.

Has this problem only recently started happening?





On 1 October 2017 at 18:21, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
> Begin forwarded message:
>
> Date: Sun, 01 Oct 2017 16:22:33 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=197099
>
>             Bug ID: 197099
>            Summary: Kernel panic in interrupt [l2tp_ppp]
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.8.13-1.el6.elrepo.x86_64
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: svimik@gmail.com
>         Regression: No
>
> Created attachment 258685
>   --> https://bugzilla.kernel.org/attachment.cgi?id=258685&action=edit
> stacktrace screenshot
>
> Hello!
>
> Getting kernel panics on multiple servers. Since it mentions l2tp_core,
> l2tp_ppp and ppp_generic, I decided to report it to Networking (correct me if
> I'm wrong).
>
> Unfortunately I'm still struggling with making kdump work, so the trace
> screenshot is all I have at this moment. The only hope is that this stacktrace
> means something to the guys that wrote the code.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.

^ permalink raw reply

* Re: v4.14-rc2/arm64 misaligned atomic in ip_expire() / skb_clone()
From: Eric Dumazet @ 2017-10-02 13:44 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn, Eric Dumazet
In-Reply-To: <20171002115730.GA21696@leverpostej>

On Mon, 2017-10-02 at 12:57 +0100, Mark Rutland wrote:
> Hi all,
> 
> I'm intermittently hitting splats like below in skb_clone() while
> fuzzing v4.14-rc2 on arm64 with Syzkaller. It looks like the
> atomic_inc() at the end of __skb_clone() is being passed a misaligned
> pointer.
> 
> I've uploaded a number of splats and their associated (full) Syzkaller
> logs, along with my kernel config to my kernel.org webspace [1]. It
> might take a while for that to appear.
> 
> This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
> applied to avoid a userfaultfd bug. The userfaultfd syscall appears in
> all of the Syzkaller logs, so there is the chance that this is related,
> but as I've not seen any other issues I suspect that's unlikely.
> 
> Thanks,
> Mark.
> 
> [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic
> [2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
> 
> Unable to handle kernel paging request at virtual address ffff80002fd714a2
> Mem abort info:
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000033
>   CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff20000eeb2000
> [ffff80002fd714a2] *pgd=000000007eff7003, *pud=000000007eff6003, *pmd=00f800006fc00711
> Internal error: Oops: 96000021 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
> Hardware name: linux,dummy-virt (DT)
> task: ffff80003a901a80 task.stack: ffff80003a908000
> PC is at __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
> LR is at atomic_add arch/arm64/include/asm/atomic_lse.h:45 [inline]
> LR is at __skb_clone+0x4a8/0x6c0 net/core/skbuff.c:873
> pc : [<ffff20000a30ce44>] lr : [<ffff200009dffb58>] pstate: 10000145
> sp : ffff80003efd86e0
> x29: ffff80003efd86e0 x28: 000060003418b000 
> x27: ffff20000ae55360 x26: ffff8000182c1608 
> x25: ffff80002fd7137e x24: ffff8000182c1610 
> x23: ffff20000ae60000 x22: ffff80001577871c 
> x21: 1ffff00007dfb0e8 x20: ffff8000182c1540 
> x19: ffff800015778640 x18: ffff20000da58140 
> x17: 0000000000000000 x16: 0000000000000002 
> x15: ffff20000e1485a0 x14: ffff2000082f912c 
> x13: ffff2000082f8dcc x12: ffff2000082f8980 
> x11: 1ffff00002aef0df x10: ffff100002aef0df 
> x9 : dfff200000000000 x8 : 0082009000a40008 
> x7 : 0000000000000000 x6 : ffff800015778700 
> x5 : ffff100002aef0e0 x4 : 0000000000000000 
> x3 : 1ffff00002aef0e3 x2 : ffff80002fd7147e 
> x1 : ffff80002fd714a2 x0 : 0000000000000001 
> Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
> Call trace:
> Exception stack(0xffff80003efd85a0 to 0xffff80003efd86e0)
> 85a0: 0000000000000001 ffff80002fd714a2 ffff80002fd7147e 1ffff00002aef0e3
> 85c0: 0000000000000000 ffff100002aef0e0 ffff800015778700 0000000000000000
> 85e0: 0082009000a40008 dfff200000000000 ffff100002aef0df 1ffff00002aef0df
> 8600: ffff2000082f8980 ffff2000082f8dcc ffff2000082f912c ffff20000e1485a0
> 8620: 0000000000000002 0000000000000000 ffff20000da58140 ffff800015778640
> 8640: ffff8000182c1540 1ffff00007dfb0e8 ffff80001577871c ffff20000ae60000
> 8660: ffff8000182c1610 ffff80002fd7137e ffff8000182c1608 ffff20000ae55360
> 8680: 000060003418b000 ffff80003efd86e0 ffff200009dffb58 ffff80003efd86e0
> 86a0: ffff20000a30ce44 0000000010000145 ffff800015778640 ffff8000182c1540
> 86c0: 0001000000000000 ffff8000182c15ce ffff80003efd86e0 ffff20000a30ce44
> [<ffff20000a30ce44>] __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
> [<ffff200009e1009c>] skb_clone+0x1c4/0x3b0 net/core/skbuff.c:1286
> [<ffff200009f2ff80>] ip_expire+0x4e8/0x7c0 net/ipv4/ip_fragment.c:239
> [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
> Exception stack(0xffff80003a90bd70 to 0xffff80003a90beb0)
> bd60:                                   ffff80003a90234c 0000000000000007
> bd80: 0000000000000000 1ffff00007520469 1fffe400017ad00c ffffffffffffe540
> bda0: 0000000000000000 0000000000000000 ffff80003a902350 1ffff00007520469
> bdc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
> bde0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000029d44
> be00: ffff20000da58140 ffff80003a901a80 ffff80003a901a80 dfff200000000000
> be20: ffff20000ae60e98 ffff0400015cc1d3 0000000000000000 ffff20000ae60df8
> be40: ffff20000ae60df8 0000000000000000 0000000000000000 ffff80003a90beb0
> be60: ffff200008089b50 ffff80003a90beb0 ffff200008089b54 0000000010000145
> be80: ffff80003a901a80 ffff80003a901a80 ffffffffffffffff 01f6cee936b5bc00
> bea0: ffff80003a90beb0 ffff200008089b54
> [<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
> [<ffff200008089b54>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
> [<ffff200008089b54>] arch_cpu_idle+0x1c/0x28 arch/arm64/kernel/process.c:87
> [<ffff20000a360a94>] default_idle_call+0x34/0x78 kernel/sched/idle.c:98
> [<ffff200008254a34>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
> [<ffff200008254a34>] do_idle+0x20c/0x370 kernel/sched/idle.c:246
> [<ffff20000825513c>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:351
> [<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
> Code: 978b7cfd 17ffff91 00000000 f9800031 (885f7c31) 
> ---[ end trace e4e9a51ab15d3a5f ]---
> 

skb->head is allocated by a kmalloc() call or similar.

This would happen if skb->end is mangled so that it is no longer a
multiple of NET_SKB_PAD (or at least of 4 in your case).
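
As a quick sanity check of the alignment argument, the faulting address from
the splat can be tested with a helper like the one below. addr_is_aligned()
is a hypothetical userspace function, and the 4-byte requirement assumes the
atomic being incremented is a 32-bit atomic_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'addr' is a multiple of 'align'; 'align' must be a power of two. */
bool addr_is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}
```

For the address in x1, ffff80002fd714a2, the low two bits are 0b10, so
addr_is_aligned(0xffff80002fd714a2, 4) is false, which is consistent with
the fault in __ll_sc_atomic_add.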

^ permalink raw reply

* Re: [PATCH 05/18] net: use ARRAY_SIZE
From: Kalle Valo @ 2017-10-02 13:46 UTC (permalink / raw)
  To: Jérémy Lefaure
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Jeff Kirsher, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Larry Finger, Chaoming Li,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel, intel-wired-lan, linux-usb
In-Reply-To: <20171001193101.8898-6-jeremy.lefaure@lse.epita.fr>

Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> writes:

> Using the ARRAY_SIZE macro improves the readability of the code. Also,
> it is not always useful to use a variable to store this constant
> calculated at compile time.
>
> Found with Coccinelle with the following semantic patch:
> @r depends on (org || report)@
> type T;
> T[] E;
> position p;
> @@
> (
>  (sizeof(E)@p /sizeof(*E))
> |
>  (sizeof(E)@p /sizeof(E[...]))
> |
>  (sizeof(E)@p /sizeof(T))
> )
>
> Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
> ---
>  drivers/net/ethernet/emulex/benet/be_cmds.c        |   4 +-
>  drivers/net/ethernet/intel/i40e/i40e_adminq.h      |   3 +-
>  drivers/net/ethernet/intel/i40evf/i40e_adminq.h    |   3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c      |   3 +-
>  drivers/net/ethernet/intel/ixgbevf/vf.c            |  17 +-
>  drivers/net/usb/kalmia.c                           |   9 +-
>  .../broadcom/brcm80211/brcmsmac/phy/phytbl_n.c     | 473 ++++++---------------
>  .../net/wireless/realtek/rtlwifi/rtl8723be/hw.c    |   9 +-
>  .../net/wireless/realtek/rtlwifi/rtl8723be/phy.c   |  12 +-
>  .../net/wireless/realtek/rtlwifi/rtl8723be/table.c |  14 +-
>  .../net/wireless/realtek/rtlwifi/rtl8821ae/table.c |  34 +-
>  include/net/bond_3ad.h                             |   3 +-
>  net/ipv6/seg6_local.c                              |   6 +-
>  13 files changed, 177 insertions(+), 413 deletions(-)

We have a tree for wireless so usually it's better to submit wireless
changes on their own but here I assume Dave will apply this to his tree.
If not, please resubmit the wireless part in a separate patch.

-- 
Kalle Valo

^ permalink raw reply
