Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net v2] sctp: Fix a big endian bug in sctp_diag_dump()
From: Dan Carpenter @ 2017-09-25 10:19 UTC (permalink / raw)
  To: Vlad Yasevich, Xin Long
  Cc: Neil Horman, David S. Miller, linux-sctp, netdev, kernel-janitors
In-Reply-To: <CADvbK_dCT6Z6JwD+VtjNg6UUkQneU2OLeme9JVqrXjnJJ63cGg@mail.gmail.com>

The sctp_for_each_transport() function takes an pointer to int.  The
cb->args[] array holds longs so it's only using the high 32 bits.  It
works on little endian system but will break on big endian 64 bit
machines.

Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
v2: The v1 patch changed the function to take a long pointer, but v2
    just changes the caller.

diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
index 22ed01a76b19..a72a7d925d46 100644
--- a/net/sctp/sctp_diag.c
+++ b/net/sctp/sctp_diag.c
@@ -463,6 +463,7 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		.r = r,
 		.net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN),
 	};
+	int pos = cb->args[2];

 	/* eps hashtable dumps
 	 * args:
@@ -493,7 +494,8 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		goto done;

 	sctp_for_each_transport(sctp_sock_filter, sctp_sock_dump,
-				net, (int *)&cb->args[2], &commp);
+				net, &pos, &commp);
+	cb->args[2] = pos;

 done:
 	cb->args[1] = cb->args[4];

^ permalink raw reply related

* [PATCH net-next 0/7] nfp: flower vxlan tunnel offload
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, oss-drivers, Simon Horman

From: Simon Horman <simon.horman@netronome.com>

John says:

This patch set allows offloading of TC flower match and set tunnel fields
to the NFP. The initial focus is on VXLAN traffic. Due to the current
state of the NFP firmware, only VXLAN traffic on well known port 4789 is
handled. The match and action fields must explicity set this value to be
supported. Tunnel end point information is also offloaded to the NFP for
both encapsulation and decapsulation. The NFP expects 3 separate data sets
to be supplied.

For decapsulation, 2 separate lists exist; a list of MAC addresses
referenced by an index comprised of the port number, and a list of IP
addresses. These IP addresses are not connected to a MAC or port. The MAC
addresses can be written as a block or one at a time (because they have an
index, previous values can be overwritten) while the IP addresses are
always written as a list of all the available IPs. Because the MAC address
used as a tunnel end point may be associated with a physical port or may
be a virtual netdev like an OVS bridge, we do not know which addresses
should be offloaded. For this reason, all MAC addresses of active netdevs
are offloaded to the NFP. A notifier checks for changes to any currently
offloaded MACs or any new netdevs that may occur. For IP addresses, the
tunnel end point used in the rules is known as the destination IP address
must be specified in the flower classifier rule. When a new IP address
appears in a rule, the IP address is offloaded. The IP is removed from the
offloaded list when all rules matching on that IP are deleted.

For encapsulation, a next hop table is updated on the NFP that contains
the source/dest IPs, MACs and egress port. These are written individually
when requested. If the NFP tries to encapsulate a packet but does not know
the next hop, then is sends a request to the host. The host carries out a
route lookup and populates the given entry on the NFP table. A notifier
also exists to check for any links changing or going down in the kernel
next hop table. If an offloaded next hop entry is removed from the kernel
then it is also removed on the NFP.

The NFP periodically sends a message to the host telling it which tunnel
ports have packets egressing the system. The host uses this information to
update the used value in the neighbour entry. This means that, rather than
expire when it times out, the kernel will send an ARP to check if the link
is still live. From an NFP perspective, this means that valid entries will
not be removed from its next hop table.

John Hurley (7):
  nfp: add helper to get flower cmsg length
  nfp: compile flower vxlan tunnel metadata match fields
  nfp: compile flower vxlan tunnel set actions
  nfp: offload flower vxlan endpoint MAC addresses
  nfp: offload vxlan IPv4 endpoints of flower rules
  nfp: flower vxlan neighbour offload
  nfp: flower vxlan neighbour keep-alive

 drivers/net/ethernet/netronome/nfp/Makefile        |   3 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c | 169 ++++-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  16 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  87 ++-
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  35 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  75 +-
 .../net/ethernet/netronome/nfp/flower/metadata.c   |   2 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  74 +-
 .../ethernet/netronome/nfp/flower/tunnel_conf.c    | 811 +++++++++++++++++++++
 10 files changed, 1243 insertions(+), 42 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c

-- 
2.1.4

^ permalink raw reply

* [PATCH net-next 1/7] nfp: add helper to get flower cmsg length
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Add a helper function that returns the length of the cmsg data when given
the cmsg skb

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h     | 5 +++++
 drivers/net/ethernet/netronome/nfp/flower/metadata.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index a2ec60344236..7a5ccf0cc7c2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -323,6 +323,11 @@ static inline void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
 	return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
 }
 
+static inline int nfp_flower_cmsg_get_data_len(struct sk_buff *skb)
+{
+	return skb->len - NFP_FLOWER_CMSG_HLEN;
+}
+
 struct sk_buff *
 nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned int num_ports);
 void
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 3226ddc55f99..193520ef23f0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -140,7 +140,7 @@ nfp_flower_update_stats(struct nfp_app *app, struct nfp_fl_stats_frame *stats)
 
 void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb)
 {
-	unsigned int msg_len = skb->len - NFP_FLOWER_CMSG_HLEN;
+	unsigned int msg_len = nfp_flower_cmsg_get_data_len(skb);
 	struct nfp_fl_stats_frame *stats_frame;
 	unsigned char *msg;
 	int i;
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Compile ovs-tc flower vxlan metadata match fields for offloading. Only
support offload of tunnel data when the VXLAN port specifically matches
well known port 4789.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 38 ++++++++++++
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  2 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 60 +++++++++++++++++--
 .../net/ethernet/netronome/nfp/flower/offload.c    | 70 +++++++++++++++++++---
 4 files changed, 158 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 7a5ccf0cc7c2..af9165b3b652 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -83,6 +83,14 @@
 #define NFP_FL_PUSH_VLAN_CFI		BIT(12)
 #define NFP_FL_PUSH_VLAN_VID		GENMASK(11, 0)
 
+/* Tunnel ports */
+#define NFP_FL_PORT_TYPE_TUN		0x50000000
+
+enum nfp_flower_tun_type {
+	NFP_FL_TUNNEL_NONE =	0,
+	NFP_FL_TUNNEL_VXLAN =	2,
+};
+
 struct nfp_fl_output {
 	__be16 a_op;
 	__be16 flags;
@@ -230,6 +238,36 @@ struct nfp_flower_ipv6 {
 	struct in6_addr ipv6_dst;
 };
 
+/* Flow Frame VXLAN --> Tunnel details (4W/16B)
+ * -----------------------------------------------------------------
+ *    3                   2                   1
+ *  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                         ipv4_addr_src                         |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                         ipv4_addr_dst                         |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |           tun_flags           |       tos     |       ttl     |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   gpe_flags   |            Reserved           | Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                     VNI                       |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+struct nfp_flower_vxlan {
+	__be32 ip_src;
+	__be32 ip_dst;
+	__be16 tun_flags;
+	u8 tos;
+	u8 ttl;
+	u8 gpe_flags;
+	u8 reserved[2];
+	u8 nxt_proto;
+	__be32 tun_id;
+};
+
+#define NFP_FL_TUN_VNI_OFFSET 8
+
 /* The base header for a control message packet.
  * Defines an 8-bit version, and an 8-bit type, padded
  * to a 32-bit word. Rest of the packet is type-specific.
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index c20dd00a1cae..cd695eabce02 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -58,6 +58,8 @@ struct nfp_app;
 #define NFP_FL_MASK_REUSE_TIME_NS	40000
 #define NFP_FL_MASK_ID_LOCATION		1
 
+#define NFP_FL_VXLAN_PORT		4789
+
 struct nfp_fl_mask_id {
 	struct circ_buf mask_id_free_list;
 	struct timespec64 *last_used;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index d25b5038c3a2..1fd1bab0611f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -77,14 +77,17 @@ nfp_flower_compile_meta(struct nfp_flower_meta_one *frame, u8 key_type)
 
 static int
 nfp_flower_compile_port(struct nfp_flower_in_port *frame, u32 cmsg_port,
-			bool mask_version)
+			bool mask_version, enum nfp_flower_tun_type tun_type)
 {
 	if (mask_version) {
 		frame->in_port = cpu_to_be32(~0);
 		return 0;
 	}
 
-	frame->in_port = cpu_to_be32(cmsg_port);
+	if (tun_type)
+		frame->in_port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
+	else
+		frame->in_port = cpu_to_be32(cmsg_port);
 
 	return 0;
 }
@@ -189,15 +192,53 @@ nfp_flower_compile_ipv6(struct nfp_flower_ipv6 *frame,
 	}
 }
 
+static void
+nfp_flower_compile_vxlan(struct nfp_flower_vxlan *frame,
+			 struct tc_cls_flower_offload *flow,
+			 bool mask_version)
+{
+	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
+	struct flow_dissector_key_ipv4_addrs *vxlan_ips;
+	struct flow_dissector_key_keyid *vni;
+
+	/* Wildcard TOS/TTL/GPE_FLAGS/NXT_PROTO for now. */
+	memset(frame, 0, sizeof(struct nfp_flower_vxlan));
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_ENC_KEYID)) {
+		u32 temp_vni;
+
+		vni = skb_flow_dissector_target(flow->dissector,
+						FLOW_DISSECTOR_KEY_ENC_KEYID,
+						target);
+		temp_vni = be32_to_cpu(vni->keyid) << NFP_FL_TUN_VNI_OFFSET;
+		frame->tun_id = cpu_to_be32(temp_vni);
+	}
+
+	if (dissector_uses_key(flow->dissector,
+			       FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS)) {
+		vxlan_ips =
+		   skb_flow_dissector_target(flow->dissector,
+					     FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS,
+					     target);
+		frame->ip_src = vxlan_ips->src;
+		frame->ip_dst = vxlan_ips->dst;
+	}
+}
+
 int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 				  struct nfp_fl_key_ls *key_ls,
 				  struct net_device *netdev,
 				  struct nfp_fl_payload *nfp_flow)
 {
+	enum nfp_flower_tun_type tun_type = NFP_FL_TUNNEL_NONE;
 	int err;
 	u8 *ext;
 	u8 *msk;
 
+	if (key_ls->key_layer & NFP_FLOWER_LAYER_VXLAN)
+		tun_type = NFP_FL_TUNNEL_VXLAN;
+
 	memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
 	memset(nfp_flow->mask_data, 0, key_ls->key_size);
 
@@ -216,14 +257,14 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 		/* Populate Exact Port data. */
 		err = nfp_flower_compile_port((struct nfp_flower_in_port *)ext,
 					      nfp_repr_get_port_id(netdev),
-					      false);
+					      false, tun_type);
 		if (err)
 			return err;
 
 		/* Populate Mask Port Data. */
 		err = nfp_flower_compile_port((struct nfp_flower_in_port *)msk,
 					      nfp_repr_get_port_id(netdev),
-					      true);
+					      true, tun_type);
 		if (err)
 			return err;
 
@@ -291,5 +332,16 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 		msk += sizeof(struct nfp_flower_ipv6);
 	}
 
+	if (key_ls->key_layer & NFP_FLOWER_LAYER_VXLAN) {
+		/* Populate Exact VXLAN Data. */
+		nfp_flower_compile_vxlan((struct nfp_flower_vxlan *)ext,
+					 flow, false);
+		/* Populate Mask VXLAN Data. */
+		nfp_flower_compile_vxlan((struct nfp_flower_vxlan *)msk,
+					 flow, true);
+		ext += sizeof(struct nfp_flower_vxlan);
+		msk += sizeof(struct nfp_flower_vxlan);
+	}
+
 	return 0;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index a18b4d2b1d3e..637372ba8f55 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -52,8 +52,25 @@
 	 BIT(FLOW_DISSECTOR_KEY_PORTS) | \
 	 BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
 	 BIT(FLOW_DISSECTOR_KEY_VLAN) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_PORTS) | \
 	 BIT(FLOW_DISSECTOR_KEY_IP))
 
+#define NFP_FLOWER_WHITELIST_TUN_DISSECTOR \
+	(BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_PORTS))
+
+#define NFP_FLOWER_WHITELIST_TUN_DISSECTOR_R \
+	(BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
+	 BIT(FLOW_DISSECTOR_KEY_ENC_PORTS))
+
 static int
 nfp_flower_xmit_flow(struct net_device *netdev,
 		     struct nfp_fl_payload *nfp_flow, u8 mtype)
@@ -125,15 +142,58 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 	if (flow->dissector->used_keys & ~NFP_FLOWER_WHITELIST_DISSECTOR)
 		return -EOPNOTSUPP;
 
+	/* If any tun dissector is used then the required set must be used. */
+	if (flow->dissector->used_keys & NFP_FLOWER_WHITELIST_TUN_DISSECTOR &&
+	    (flow->dissector->used_keys & NFP_FLOWER_WHITELIST_TUN_DISSECTOR_R)
+	    != NFP_FLOWER_WHITELIST_TUN_DISSECTOR_R)
+		return -EOPNOTSUPP;
+
+	key_layer_two = 0;
+	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
+	key_size = sizeof(struct nfp_flower_meta_one) +
+		   sizeof(struct nfp_flower_in_port) +
+		   sizeof(struct nfp_flower_mac_mpls);
+
 	if (dissector_uses_key(flow->dissector,
 			       FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+		struct flow_dissector_key_ipv4_addrs *mask_ipv4 = NULL;
+		struct flow_dissector_key_ports *mask_enc_ports = NULL;
+		struct flow_dissector_key_ports *enc_ports = NULL;
 		struct flow_dissector_key_control *mask_enc_ctl =
 			skb_flow_dissector_target(flow->dissector,
 						  FLOW_DISSECTOR_KEY_ENC_CONTROL,
 						  flow->mask);
-		/* We are expecting a tunnel. For now we ignore offloading. */
-		if (mask_enc_ctl->addr_type)
+		struct flow_dissector_key_control *enc_ctl =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_CONTROL,
+						  flow->key);
+		if (mask_enc_ctl->addr_type != 0xffff ||
+		    enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
 			return -EOPNOTSUPP;
+
+		/* These fields are already verified as used. */
+		mask_ipv4 =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS,
+						  flow->mask);
+		if (mask_ipv4->dst != cpu_to_be32(~0))
+			return -EOPNOTSUPP;
+
+		mask_enc_ports =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_PORTS,
+						  flow->mask);
+		enc_ports =
+			skb_flow_dissector_target(flow->dissector,
+						  FLOW_DISSECTOR_KEY_ENC_PORTS,
+						  flow->key);
+
+		if (mask_enc_ports->dst != cpu_to_be16(~0) ||
+		    enc_ports->dst != htons(NFP_FL_VXLAN_PORT))
+			return -EOPNOTSUPP;
+
+		key_layer |= NFP_FLOWER_LAYER_VXLAN;
+		key_size += sizeof(struct nfp_flower_vxlan);
 	}
 
 	if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -151,12 +211,6 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
 						    FLOW_DISSECTOR_KEY_IP,
 						    flow->mask);
 
-	key_layer_two = 0;
-	key_layer = NFP_FLOWER_LAYER_PORT | NFP_FLOWER_LAYER_MAC;
-	key_size = sizeof(struct nfp_flower_meta_one) +
-		   sizeof(struct nfp_flower_in_port) +
-		   sizeof(struct nfp_flower_mac_mpls);
-
 	if (mask_basic && mask_basic->n_proto) {
 		/* Ethernet type is present in the key. */
 		switch (key_basic->n_proto) {
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 3/7] nfp: compile flower vxlan tunnel set actions
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Compile set tunnel actions for tc flower. Only support VXLAN and ensure a
tunnel destination port of 4789 is used.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 169 ++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  31 +++-
 2 files changed, 179 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index db9750695dc7..38f3835ae176 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -37,6 +37,7 @@
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
 #include <net/tc_act/tc_vlan.h>
+#include <net/tc_act/tc_tunnel_key.h>
 
 #include "cmsg.h"
 #include "main.h"
@@ -80,14 +81,27 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 	push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
 
+static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
+					 enum nfp_flower_tun_type tun_type)
+{
+	if (!out_dev->rtnl_link_ops)
+		return false;
+
+	if (!strcmp(out_dev->rtnl_link_ops->kind, "vxlan"))
+		return tun_type == NFP_FL_TUNNEL_VXLAN;
+
+	return false;
+}
+
 static int
 nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
 	      struct nfp_fl_payload *nfp_flow, bool last,
-	      struct net_device *in_dev)
+	      struct net_device *in_dev, enum nfp_flower_tun_type tun_type,
+	      int *tun_out_cnt)
 {
 	size_t act_size = sizeof(struct nfp_fl_output);
+	u16 tmp_output_op, tmp_flags;
 	struct net_device *out_dev;
-	u16 tmp_output_op;
 	int ifindex;
 
 	/* Set action opcode to output action. */
@@ -97,25 +111,114 @@ nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
 
 	output->a_op = cpu_to_be16(tmp_output_op);
 
-	/* Set action output parameters. */
-	output->flags = cpu_to_be16(last ? NFP_FL_OUT_FLAGS_LAST : 0);
-
 	ifindex = tcf_mirred_ifindex(action);
 	out_dev = __dev_get_by_index(dev_net(in_dev), ifindex);
 	if (!out_dev)
 		return -EOPNOTSUPP;
 
-	/* Only offload egress ports are on the same device as the ingress
-	 * port.
+	tmp_flags = last ? NFP_FL_OUT_FLAGS_LAST : 0;
+
+	if (tun_type) {
+		/* Verify the egress netdev matches the tunnel type. */
+		if (!nfp_fl_netdev_is_tunnel_type(out_dev, tun_type))
+			return -EOPNOTSUPP;
+
+		if (*tun_out_cnt)
+			return -EOPNOTSUPP;
+		(*tun_out_cnt)++;
+
+		output->flags = cpu_to_be16(tmp_flags |
+					    NFP_FL_OUT_FLAGS_USE_TUN);
+		output->port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
+	} else {
+		/* Set action output parameters. */
+		output->flags = cpu_to_be16(tmp_flags);
+
+		/* Only offload if egress ports are on the same device as the
+		 * ingress port.
+		 */
+		if (!switchdev_port_same_parent_id(in_dev, out_dev))
+			return -EOPNOTSUPP;
+
+		output->port = cpu_to_be32(nfp_repr_get_port_id(out_dev));
+		if (!output->port)
+			return -EOPNOTSUPP;
+	}
+	nfp_flow->meta.shortcut = output->port;
+
+	return 0;
+}
+
+static bool nfp_fl_supported_tun_port(const struct tc_action *action)
+{
+	struct ip_tunnel_info *tun = tcf_tunnel_info(action);
+
+	return tun->key.tp_dst == htons(NFP_FL_VXLAN_PORT);
+}
+
+static struct nfp_fl_pre_tunnel *nfp_fl_pre_tunnel(char *act_data, int act_len)
+{
+	size_t act_size = sizeof(struct nfp_fl_pre_tunnel);
+	struct nfp_fl_pre_tunnel *pre_tun_act;
+	u16 tmp_pre_tun_op;
+
+	/* Pre_tunnel action must be first on action list.
+	 * If other actions already exist they need pushed forward.
 	 */
-	if (!switchdev_port_same_parent_id(in_dev, out_dev))
-		return -EOPNOTSUPP;
+	if (act_len)
+		memmove(act_data + act_size, act_data, act_len);
+
+	pre_tun_act = (struct nfp_fl_pre_tunnel *)act_data;
+
+	memset(pre_tun_act, 0, act_size);
+
+	tmp_pre_tun_op =
+		FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size >> NFP_FL_LW_SIZ) |
+		FIELD_PREP(NFP_FL_ACT_JMP_ID, NFP_FL_ACTION_OPCODE_PRE_TUNNEL);
+
+	pre_tun_act->a_op = cpu_to_be16(tmp_pre_tun_op);
 
-	output->port = cpu_to_be32(nfp_repr_get_port_id(out_dev));
-	if (!output->port)
+	return pre_tun_act;
+}
+
+static int
+nfp_fl_set_vxlan(struct nfp_fl_set_vxlan *set_vxlan,
+		 const struct tc_action *action,
+		 struct nfp_fl_pre_tunnel *pre_tun)
+{
+	struct ip_tunnel_info *vxlan = tcf_tunnel_info(action);
+	size_t act_size = sizeof(struct nfp_fl_set_vxlan);
+	u32 tmp_set_vxlan_type_index = 0;
+	u16 tmp_set_vxlan_op;
+	/* Currently support one pre-tunnel so index is always 0. */
+	int pretun_idx = 0;
+
+	if (vxlan->options_len) {
+		/* Do not support options e.g. vxlan gpe. */
 		return -EOPNOTSUPP;
+	}
 
-	nfp_flow->meta.shortcut = output->port;
+	tmp_set_vxlan_op =
+		FIELD_PREP(NFP_FL_ACT_LEN_LW, act_size >> NFP_FL_LW_SIZ) |
+		FIELD_PREP(NFP_FL_ACT_JMP_ID,
+			   NFP_FL_ACTION_OPCODE_SET_IPV4_TUNNEL);
+
+	set_vxlan->a_op = cpu_to_be16(tmp_set_vxlan_op);
+
+	/* Set tunnel type and pre-tunnel index. */
+	tmp_set_vxlan_type_index |=
+		FIELD_PREP(NFP_FL_IPV4_TUNNEL_TYPE, NFP_FL_TUNNEL_VXLAN) |
+		FIELD_PREP(NFP_FL_IPV4_PRE_TUN_INDEX, pretun_idx);
+
+	set_vxlan->tun_type_index = cpu_to_be32(tmp_set_vxlan_type_index);
+
+	set_vxlan->tun_id = vxlan->key.tun_id;
+	set_vxlan->tun_flags = vxlan->key.tun_flags;
+	set_vxlan->ipv4_ttl = vxlan->key.ttl;
+	set_vxlan->ipv4_tos = vxlan->key.tos;
+
+	/* Complete pre_tunnel action. */
+	pre_tun->ipv4_dst = vxlan->key.u.ipv4.dst;
 
 	return 0;
 }
@@ -123,8 +226,11 @@ nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
 static int
 nfp_flower_loop_action(const struct tc_action *a,
 		       struct nfp_fl_payload *nfp_fl, int *a_len,
-		       struct net_device *netdev)
+		       struct net_device *netdev,
+		       enum nfp_flower_tun_type *tun_type, int *tun_out_cnt)
 {
+	struct nfp_fl_pre_tunnel *pre_tun;
+	struct nfp_fl_set_vxlan *s_vxl;
 	struct nfp_fl_push_vlan *psh_v;
 	struct nfp_fl_pop_vlan *pop_v;
 	struct nfp_fl_output *output;
@@ -137,7 +243,8 @@ nfp_flower_loop_action(const struct tc_action *a,
 			return -EOPNOTSUPP;
 
 		output = (struct nfp_fl_output *)&nfp_fl->action_data[*a_len];
-		err = nfp_fl_output(output, a, nfp_fl, true, netdev);
+		err = nfp_fl_output(output, a, nfp_fl, true, netdev, *tun_type,
+				    tun_out_cnt);
 		if (err)
 			return err;
 
@@ -147,7 +254,8 @@ nfp_flower_loop_action(const struct tc_action *a,
 			return -EOPNOTSUPP;
 
 		output = (struct nfp_fl_output *)&nfp_fl->action_data[*a_len];
-		err = nfp_fl_output(output, a, nfp_fl, false, netdev);
+		err = nfp_fl_output(output, a, nfp_fl, false, netdev, *tun_type,
+				    tun_out_cnt);
 		if (err)
 			return err;
 
@@ -170,6 +278,29 @@ nfp_flower_loop_action(const struct tc_action *a,
 
 		nfp_fl_push_vlan(psh_v, a);
 		*a_len += sizeof(struct nfp_fl_push_vlan);
+	} else if (is_tcf_tunnel_set(a) && nfp_fl_supported_tun_port(a)) {
+		/* Pre-tunnel action is required for tunnel encap.
+		 * This checks for next hop entries on NFP.
+		 * If none, the packet falls back before applying other actions.
+		 */
+		if (*a_len + sizeof(struct nfp_fl_pre_tunnel) +
+		    sizeof(struct nfp_fl_set_vxlan) > NFP_FL_MAX_A_SIZ)
+			return -EOPNOTSUPP;
+
+		*tun_type = NFP_FL_TUNNEL_VXLAN;
+		pre_tun = nfp_fl_pre_tunnel(nfp_fl->action_data, *a_len);
+		nfp_fl->meta.shortcut = cpu_to_be32(NFP_FL_SC_ACT_NULL);
+		*a_len += sizeof(struct nfp_fl_pre_tunnel);
+
+		s_vxl = (struct nfp_fl_set_vxlan *)&nfp_fl->action_data[*a_len];
+		err = nfp_fl_set_vxlan(s_vxl, a, pre_tun);
+		if (err)
+			return err;
+
+		*a_len += sizeof(struct nfp_fl_set_vxlan);
+	} else if (is_tcf_tunnel_release(a)) {
+		/* Tunnel decap is handled by default so accept action. */
+		return 0;
 	} else {
 		/* Currently we do not handle any other actions. */
 		return -EOPNOTSUPP;
@@ -182,18 +313,22 @@ int nfp_flower_compile_action(struct tc_cls_flower_offload *flow,
 			      struct net_device *netdev,
 			      struct nfp_fl_payload *nfp_flow)
 {
-	int act_len, act_cnt, err;
+	int act_len, act_cnt, err, tun_out_cnt;
+	enum nfp_flower_tun_type tun_type;
 	const struct tc_action *a;
 	LIST_HEAD(actions);
 
 	memset(nfp_flow->action_data, 0, NFP_FL_MAX_A_SIZ);
 	nfp_flow->meta.act_len = 0;
+	tun_type = NFP_FL_TUNNEL_NONE;
 	act_len = 0;
 	act_cnt = 0;
+	tun_out_cnt = 0;
 
 	tcf_exts_to_list(flow->exts, &actions);
 	list_for_each_entry(a, &actions, list) {
-		err = nfp_flower_loop_action(a, nfp_flow, &act_len, netdev);
+		err = nfp_flower_loop_action(a, nfp_flow, &act_len, netdev,
+					     &tun_type, &tun_out_cnt);
 		if (err)
 			return err;
 		act_cnt++;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index af9165b3b652..ff42ce8a1e9c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -67,10 +67,12 @@
 #define NFP_FL_LW_SIZ			2
 
 /* Action opcodes */
-#define NFP_FL_ACTION_OPCODE_OUTPUT	0
-#define NFP_FL_ACTION_OPCODE_PUSH_VLAN	1
-#define NFP_FL_ACTION_OPCODE_POP_VLAN	2
-#define NFP_FL_ACTION_OPCODE_NUM	32
+#define NFP_FL_ACTION_OPCODE_OUTPUT		0
+#define NFP_FL_ACTION_OPCODE_PUSH_VLAN		1
+#define NFP_FL_ACTION_OPCODE_POP_VLAN		2
+#define NFP_FL_ACTION_OPCODE_SET_IPV4_TUNNEL	6
+#define NFP_FL_ACTION_OPCODE_PRE_TUNNEL		17
+#define NFP_FL_ACTION_OPCODE_NUM		32
 
 #define NFP_FL_ACT_JMP_ID		GENMASK(15, 8)
 #define NFP_FL_ACT_LEN_LW		GENMASK(7, 0)
@@ -85,6 +87,8 @@
 
 /* Tunnel ports */
 #define NFP_FL_PORT_TYPE_TUN		0x50000000
+#define NFP_FL_IPV4_TUNNEL_TYPE		GENMASK(7, 4)
+#define NFP_FL_IPV4_PRE_TUN_INDEX	GENMASK(2, 0)
 
 enum nfp_flower_tun_type {
 	NFP_FL_TUNNEL_NONE =	0,
@@ -123,6 +127,25 @@ struct nfp_flower_meta_one {
 	u16 reserved;
 };
 
+struct nfp_fl_pre_tunnel {
+	__be16 a_op;
+	__be16 reserved;
+	__be32 ipv4_dst;
+	/* reserved for use with IPv6 addresses */
+	__be32 extra[3];
+};
+
+struct nfp_fl_set_vxlan {
+	__be16 a_op;
+	__be16 reserved;
+	__be64 tun_id;
+	__be32 tun_type_index;
+	__be16 tun_flags;
+	u8 ipv4_ttl;
+	u8 ipv4_tos;
+	__be32 extra[2];
+} __packed;
+
 /* Metadata with L2 (1W/4B)
  * ----------------------------------------------------------------
  *    3                   2                   1
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 4/7] nfp: offload flower vxlan endpoint MAC addresses
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Generate a list of MAC addresses of netdevs that could be used as VXLAN
tunnel end points. Give offloaded MACs an index for storage on the NFP in
the ranges:
0x100-0x1ff physical port representors
0x200-0x2ff VF port representors
0x300-0x3ff other offloads (e.g. vxlan netdevs, ovs bridges)

Assign phys and vf indexes based on unique 8 bit values in the port num.
Maintain list of other netdevs to ensure same netdev is not offloaded
twice and each gets a unique ID without exhausting the entries. Because
the IDs are unique but constant for a netdev, any changes are implemented
by overwriting the index on NFP.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile        |   3 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |   7 -
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   9 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  18 +
 drivers/net/ethernet/netronome/nfp/flower/match.c  |   7 +
 .../ethernet/netronome/nfp/flower/tunnel_conf.c    | 374 +++++++++++++++++++++
 7 files changed, 423 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 96e579a15cbe..becaacf1554d 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -37,7 +37,8 @@ nfp-objs += \
 	    flower/main.o \
 	    flower/match.o \
 	    flower/metadata.o \
-	    flower/offload.o
+	    flower/offload.o \
+	    flower/tunnel_conf.o
 endif
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index c3ca05d10fe1..b756006dba6f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -38,17 +38,10 @@
 #include <net/dst_metadata.h>
 
 #include "main.h"
-#include "../nfpcore/nfp_cpp.h"
 #include "../nfp_net.h"
 #include "../nfp_net_repr.h"
 #include "./cmsg.h"
 
-#define nfp_flower_cmsg_warn(app, fmt, args...)				\
-	do {								\
-		if (net_ratelimit())					\
-			nfp_warn((app)->cpp, fmt, ## args);		\
-	} while (0)
-
 static struct nfp_flower_cmsg_hdr *
 nfp_flower_cmsg_get_hdr(struct sk_buff *skb)
 {
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index ff42ce8a1e9c..dc248193c996 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -39,6 +39,7 @@
 #include <linux/types.h>
 
 #include "../nfp_app.h"
+#include "../nfpcore/nfp_cpp.h"
 
 #define NFP_FLOWER_LAYER_META		BIT(0)
 #define NFP_FLOWER_LAYER_PORT		BIT(1)
@@ -90,6 +91,12 @@
 #define NFP_FL_IPV4_TUNNEL_TYPE		GENMASK(7, 4)
 #define NFP_FL_IPV4_PRE_TUN_INDEX	GENMASK(2, 0)
 
+#define nfp_flower_cmsg_warn(app, fmt, args...)                         \
+	do {                                                            \
+		if (net_ratelimit())                                    \
+			nfp_warn((app)->cpp, fmt, ## args);             \
+	} while (0)
+
 enum nfp_flower_tun_type {
 	NFP_FL_TUNNEL_NONE =	0,
 	NFP_FL_TUNNEL_VXLAN =	2,
@@ -310,6 +317,7 @@ enum nfp_flower_cmsg_type_port {
 	NFP_FLOWER_CMSG_TYPE_FLOW_DEL =		2,
 	NFP_FLOWER_CMSG_TYPE_MAC_REPR =		7,
 	NFP_FLOWER_CMSG_TYPE_PORT_MOD =		8,
+	NFP_FLOWER_CMSG_TYPE_TUN_MAC =		11,
 	NFP_FLOWER_CMSG_TYPE_FLOW_STATS =	15,
 	NFP_FLOWER_CMSG_TYPE_PORT_ECHO =	16,
 	NFP_FLOWER_CMSG_TYPE_MAX =		32,
@@ -343,6 +351,7 @@ enum nfp_flower_cmsg_port_type {
 	NFP_FLOWER_CMSG_PORT_TYPE_UNSPEC =	0x0,
 	NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT =	0x1,
 	NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT =	0x2,
+	NFP_FLOWER_CMSG_PORT_TYPE_OTHER_PORT =  0x3,
 };
 
 enum nfp_flower_cmsg_port_vnic_type {
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 91fe03617106..e46e7c60d491 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -436,6 +436,16 @@ static void nfp_flower_clean(struct nfp_app *app)
 	app->priv = NULL;
 }
 
+static int nfp_flower_start(struct nfp_app *app)
+{
+	return nfp_tunnel_config_start(app);
+}
+
+static void nfp_flower_stop(struct nfp_app *app)
+{
+	nfp_tunnel_config_stop(app);
+}
+
 const struct nfp_app_type app_flower = {
 	.id		= NFP_APP_FLOWER_NIC,
 	.name		= "flower",
@@ -453,6 +463,9 @@ const struct nfp_app_type app_flower = {
 	.repr_open	= nfp_flower_repr_netdev_open,
 	.repr_stop	= nfp_flower_repr_netdev_stop,
 
+	.start		= nfp_flower_start,
+	.stop		= nfp_flower_stop,
+
 	.ctrl_msg_rx	= nfp_flower_cmsg_rx,
 
 	.sriov_enable	= nfp_flower_sriov_enable,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index cd695eabce02..9de375acc254 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -84,6 +84,13 @@ struct nfp_fl_stats_id {
  * @flow_table:		Hash table used to store flower rules
  * @cmsg_work:		Workqueue for control messages processing
  * @cmsg_skbs:		List of skbs for control message processing
+ * @nfp_mac_off_list:	List of MAC addresses to offload
+ * @nfp_mac_index_list:	List of unique 8-bit indexes for non NFP netdevs
+ * @nfp_mac_off_lock:	Lock for the MAC address list
+ * @nfp_mac_index_lock:	Lock for the MAC index list
+ * @nfp_mac_off_ids:	IDA to manage id assignment for offloaded macs
+ * @nfp_mac_off_count:	Number of MACs in address list
+ * @nfp_tun_mac_nb:	Notifier to monitor link state
  */
 struct nfp_flower_priv {
 	struct nfp_app *app;
@@ -96,6 +103,13 @@ struct nfp_flower_priv {
 	DECLARE_HASHTABLE(flow_table, NFP_FLOWER_HASH_BITS);
 	struct work_struct cmsg_work;
 	struct sk_buff_head cmsg_skbs;
+	struct list_head nfp_mac_off_list;
+	struct list_head nfp_mac_index_list;
+	struct mutex nfp_mac_off_lock;
+	struct mutex nfp_mac_index_lock;
+	struct ida nfp_mac_off_ids;
+	int nfp_mac_off_count;
+	struct notifier_block nfp_tun_mac_nb;
 };
 
 struct nfp_fl_key_ls {
@@ -165,4 +179,8 @@ nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie);
 
 void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb);
 
+int nfp_tunnel_config_start(struct nfp_app *app);
+void nfp_tunnel_config_stop(struct nfp_app *app);
+void nfp_tunnel_write_macs(struct nfp_app *app);
+
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 1fd1bab0611f..cb3ff6c126e8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -232,6 +232,7 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 				  struct nfp_fl_payload *nfp_flow)
 {
 	enum nfp_flower_tun_type tun_type = NFP_FL_TUNNEL_NONE;
+	struct nfp_repr *netdev_repr;
 	int err;
 	u8 *ext;
 	u8 *msk;
@@ -341,6 +342,12 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 					 flow, true);
 		ext += sizeof(struct nfp_flower_vxlan);
 		msk += sizeof(struct nfp_flower_vxlan);
+
+		/* Configure tunnel end point MAC. */
+		if (nfp_netdev_is_nfp_repr(netdev)) {
+			netdev_repr = netdev_priv(netdev);
+			nfp_tunnel_write_macs(netdev_repr->app);
+		}
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
new file mode 100644
index 000000000000..34be85803020
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -0,0 +1,374 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/idr.h>
+#include <net/dst_metadata.h>
+
+#include "cmsg.h"
+#include "main.h"
+#include "../nfp_net_repr.h"
+#include "../nfp_net.h"
+
+/**
+ * struct nfp_tun_mac_addr - configure MAC address of tunnel EP on NFP
+ * @reserved:	reserved for future use
+ * @count:	number of MAC addresses in the message
+ * @index:	index of MAC address in the lookup table
+ * @addr:	interface MAC address
+ * @addresses:	series of MACs to offload
+ */
+struct nfp_tun_mac_addr {
+	__be16 reserved;
+	__be16 count;
+	struct index_mac_addr {
+		__be16 index;
+		u8 addr[ETH_ALEN];
+	} addresses[];
+};
+
+/**
+ * struct nfp_tun_mac_offload_entry - list of MACs to offload
+ * @index:	index of MAC address for offloading
+ * @addr:	interface MAC address
+ * @list:	list pointer
+ */
+struct nfp_tun_mac_offload_entry {
+	__be16 index;
+	u8 addr[ETH_ALEN];
+	struct list_head list;
+};
+
+#define NFP_MAX_MAC_INDEX       0xff
+
+/**
+ * struct nfp_tun_mac_non_nfp_idx - converts non NFP netdev ifindex to 8-bit id
+ * @ifindex:	netdev ifindex of the device
+ * @index:	index of netdevs mac on NFP
+ * @list:	list pointer
+ */
+struct nfp_tun_mac_non_nfp_idx {
+	int ifindex;
+	u8 index;
+	struct list_head list;
+};
+
+static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
+{
+	if (!netdev->rtnl_link_ops)
+		return false;
+	if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
+		return true;
+	if (!strcmp(netdev->rtnl_link_ops->kind, "vxlan"))
+		return true;
+
+	return false;
+}
+
+static int
+nfp_flower_xmit_tun_conf(struct nfp_app *app, u8 mtype, u16 plen, void *pdata)
+{
+	struct sk_buff *skb;
+	unsigned char *msg;
+
+	skb = nfp_flower_cmsg_alloc(app, plen, mtype);
+	if (!skb)
+		return -ENOMEM;
+
+	msg = nfp_flower_cmsg_get_data(skb);
+	memcpy(msg, pdata, nfp_flower_cmsg_get_data_len(skb));
+
+	nfp_ctrl_tx(app->ctrl, skb);
+	return 0;
+}
+
+void nfp_tunnel_write_macs(struct nfp_app *app)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_tun_mac_offload_entry *entry;
+	struct nfp_tun_mac_addr *payload;
+	struct list_head *ptr, *storage;
+	int mac_count, err, pay_size;
+
+	mutex_lock(&priv->nfp_mac_off_lock);
+	if (!priv->nfp_mac_off_count) {
+		mutex_unlock(&priv->nfp_mac_off_lock);
+		return;
+	}
+
+	pay_size = sizeof(struct nfp_tun_mac_addr) +
+		   sizeof(struct index_mac_addr) * priv->nfp_mac_off_count;
+
+	payload = kzalloc(pay_size, GFP_KERNEL);
+	if (!payload) {
+		mutex_unlock(&priv->nfp_mac_off_lock);
+		return;
+	}
+
+	payload->count = cpu_to_be16(priv->nfp_mac_off_count);
+
+	mac_count = 0;
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_off_list) {
+		entry = list_entry(ptr, struct nfp_tun_mac_offload_entry,
+				   list);
+		payload->addresses[mac_count].index = entry->index;
+		ether_addr_copy(payload->addresses[mac_count].addr,
+				entry->addr);
+		mac_count++;
+	}
+
+	err = nfp_flower_xmit_tun_conf(app, NFP_FLOWER_CMSG_TYPE_TUN_MAC,
+				       pay_size, payload);
+
+	kfree(payload);
+
+	if (err) {
+		mutex_unlock(&priv->nfp_mac_off_lock);
+		/* Write failed so retain list for future retry. */
+		return;
+	}
+
+	/* If list was successfully offloaded, flush it. */
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_off_list) {
+		entry = list_entry(ptr, struct nfp_tun_mac_offload_entry,
+				   list);
+		list_del(&entry->list);
+		kfree(entry);
+	}
+
+	priv->nfp_mac_off_count = 0;
+	mutex_unlock(&priv->nfp_mac_off_lock);
+}
+
+static int nfp_tun_get_mac_idx(struct nfp_app *app, int ifindex)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_tun_mac_non_nfp_idx *entry;
+	struct list_head *ptr, *storage;
+	int idx;
+
+	mutex_lock(&priv->nfp_mac_index_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_index_list) {
+		entry = list_entry(ptr, struct nfp_tun_mac_non_nfp_idx, list);
+		if (entry->ifindex == ifindex) {
+			idx = entry->index;
+			mutex_unlock(&priv->nfp_mac_index_lock);
+			return idx;
+		}
+	}
+
+	idx = ida_simple_get(&priv->nfp_mac_off_ids, 0,
+			     NFP_MAX_MAC_INDEX, GFP_KERNEL);
+	if (idx < 0) {
+		mutex_unlock(&priv->nfp_mac_index_lock);
+		return idx;
+	}
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		mutex_unlock(&priv->nfp_mac_index_lock);
+		return -ENOMEM;
+	}
+	entry->ifindex = ifindex;
+	entry->index = idx;
+	list_add_tail(&entry->list, &priv->nfp_mac_index_list);
+	mutex_unlock(&priv->nfp_mac_index_lock);
+
+	return idx;
+}
+
+static void nfp_tun_del_mac_idx(struct nfp_app *app, int ifindex)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_tun_mac_non_nfp_idx *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_mac_index_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_index_list) {
+		entry = list_entry(ptr, struct nfp_tun_mac_non_nfp_idx, list);
+		if (entry->ifindex == ifindex) {
+			ida_simple_remove(&priv->nfp_mac_off_ids,
+					  entry->index);
+			list_del(&entry->list);
+			kfree(entry);
+			break;
+		}
+	}
+	mutex_unlock(&priv->nfp_mac_index_lock);
+}
+
+static void nfp_tun_add_to_mac_offload_list(struct net_device *netdev,
+					    struct nfp_app *app)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_tun_mac_offload_entry *entry;
+	u16 nfp_mac_idx;
+	int port = 0;
+
+	/* Check if MAC should be offloaded. */
+	if (!is_valid_ether_addr(netdev->dev_addr))
+		return;
+
+	if (nfp_netdev_is_nfp_repr(netdev))
+		port = nfp_repr_get_port_id(netdev);
+	else if (!nfp_tun_is_netdev_to_offload(netdev))
+		return;
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		nfp_flower_cmsg_warn(app, "Mem fail when offloading MAC.\n");
+		return;
+	}
+
+	if (FIELD_GET(NFP_FLOWER_CMSG_PORT_TYPE, port) ==
+	    NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT) {
+		nfp_mac_idx = port << 8 | NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT;
+	} else if (FIELD_GET(NFP_FLOWER_CMSG_PORT_TYPE, port) ==
+		   NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT) {
+		port = FIELD_GET(NFP_FLOWER_CMSG_PORT_VNIC, port);
+		nfp_mac_idx = port << 8 | NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT;
+	} else {
+		/* Must assign our own unique 8-bit index. */
+		int idx = nfp_tun_get_mac_idx(app, netdev->ifindex);
+
+		if (idx < 0) {
+			nfp_flower_cmsg_warn(app, "Can't assign non-repr MAC index.\n");
+			kfree(entry);
+			return;
+		}
+		nfp_mac_idx = idx << 8 | NFP_FLOWER_CMSG_PORT_TYPE_OTHER_PORT;
+	}
+
+	entry->index = cpu_to_be16(nfp_mac_idx);
+	ether_addr_copy(entry->addr, netdev->dev_addr);
+
+	mutex_lock(&priv->nfp_mac_off_lock);
+	priv->nfp_mac_off_count++;
+	list_add_tail(&entry->list, &priv->nfp_mac_off_list);
+	mutex_unlock(&priv->nfp_mac_off_lock);
+}
+
+static int nfp_tun_mac_event_handler(struct notifier_block *nb,
+				     unsigned long event, void *ptr)
+{
+	struct nfp_flower_priv *app_priv;
+	struct net_device *netdev;
+	struct nfp_app *app;
+
+	if (event == NETDEV_DOWN || event == NETDEV_UNREGISTER) {
+		app_priv = container_of(nb, struct nfp_flower_priv,
+					nfp_tun_mac_nb);
+		app = app_priv->app;
+		netdev = netdev_notifier_info_to_dev(ptr);
+
+		/* If non-nfp netdev then free its offload index. */
+		if (nfp_tun_is_netdev_to_offload(netdev))
+			nfp_tun_del_mac_idx(app, netdev->ifindex);
+	} else if (event == NETDEV_UP || event == NETDEV_CHANGEADDR ||
+		   event == NETDEV_REGISTER) {
+		app_priv = container_of(nb, struct nfp_flower_priv,
+					nfp_tun_mac_nb);
+		app = app_priv->app;
+		netdev = netdev_notifier_info_to_dev(ptr);
+
+		nfp_tun_add_to_mac_offload_list(netdev, app);
+
+		/* Force a list write to keep NFP up to date. */
+		nfp_tunnel_write_macs(app);
+	}
+	return NOTIFY_OK;
+}
+
+int nfp_tunnel_config_start(struct nfp_app *app)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct net_device *netdev;
+	int err;
+
+	/* Initialise priv data for MAC offloading. */
+	priv->nfp_mac_off_count = 0;
+	mutex_init(&priv->nfp_mac_off_lock);
+	INIT_LIST_HEAD(&priv->nfp_mac_off_list);
+	priv->nfp_tun_mac_nb.notifier_call = nfp_tun_mac_event_handler;
+	mutex_init(&priv->nfp_mac_index_lock);
+	INIT_LIST_HEAD(&priv->nfp_mac_index_list);
+	ida_init(&priv->nfp_mac_off_ids);
+
+	err = register_netdevice_notifier(&priv->nfp_tun_mac_nb);
+	if (err)
+		goto err_free_mac_ida;
+
+	/* Parse netdevs already registered for MACs that need offloaded. */
+	rtnl_lock();
+	for_each_netdev(&init_net, netdev)
+		nfp_tun_add_to_mac_offload_list(netdev, app);
+	rtnl_unlock();
+
+	return 0;
+
+err_free_mac_ida:
+	ida_destroy(&priv->nfp_mac_off_ids);
+	return err;
+}
+
+void nfp_tunnel_config_stop(struct nfp_app *app)
+{
+	struct nfp_tun_mac_offload_entry *mac_entry;
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_tun_mac_non_nfp_idx *mac_idx;
+	struct list_head *ptr, *storage;
+
+	unregister_netdevice_notifier(&priv->nfp_tun_mac_nb);
+
+	/* Free any memory that may be occupied by MAC list. */
+	mutex_lock(&priv->nfp_mac_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_off_list) {
+		mac_entry = list_entry(ptr, struct nfp_tun_mac_offload_entry,
+				       list);
+		list_del(&mac_entry->list);
+		kfree(mac_entry);
+	}
+	mutex_unlock(&priv->nfp_mac_off_lock);
+
+	/* Free any memory that may be occupied by MAC index list. */
+	mutex_lock(&priv->nfp_mac_index_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_mac_index_list) {
+		mac_idx = list_entry(ptr, struct nfp_tun_mac_non_nfp_idx,
+				     list);
+		list_del(&mac_idx->list);
+		kfree(mac_idx);
+	}
+	mutex_unlock(&priv->nfp_mac_index_lock);
+
+	ida_destroy(&priv->nfp_mac_off_ids);
+}
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 5/7] nfp: offload vxlan IPv4 endpoints of flower rules
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Maintain a list of IPv4 addresses used as the tunnel destination IP match
fields in currently active flower rules. Offload the entire list of
NFP_FL_IPV4_ADDRS_MAX (even if some are unused) when new IPs are added or
removed. The NFP should only be aware of tunnel end points that are
currently used by rules on the device

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   1 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   7 ++
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  14 ++-
 .../net/ethernet/netronome/nfp/flower/offload.c    |   4 +
 .../ethernet/netronome/nfp/flower/tunnel_conf.c    | 120 +++++++++++++++++++++
 5 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index dc248193c996..6540bb1ceefb 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -318,6 +318,7 @@ enum nfp_flower_cmsg_type_port {
 	NFP_FLOWER_CMSG_TYPE_MAC_REPR =		7,
 	NFP_FLOWER_CMSG_TYPE_PORT_MOD =		8,
 	NFP_FLOWER_CMSG_TYPE_TUN_MAC =		11,
+	NFP_FLOWER_CMSG_TYPE_TUN_IPS =		14,
 	NFP_FLOWER_CMSG_TYPE_FLOW_STATS =	15,
 	NFP_FLOWER_CMSG_TYPE_PORT_ECHO =	16,
 	NFP_FLOWER_CMSG_TYPE_MAX =		32,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 9de375acc254..53306af6cfe8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -86,8 +86,10 @@ struct nfp_fl_stats_id {
  * @cmsg_skbs:		List of skbs for control message processing
  * @nfp_mac_off_list:	List of MAC addresses to offload
  * @nfp_mac_index_list:	List of unique 8-bit indexes for non NFP netdevs
+ * @nfp_ipv4_off_list:	List of IPv4 addresses to offload
  * @nfp_mac_off_lock:	Lock for the MAC address list
  * @nfp_mac_index_lock:	Lock for the MAC index list
+ * @nfp_ipv4_off_lock:	Lock for the IPv4 address list
  * @nfp_mac_off_ids:	IDA to manage id assignment for offloaded macs
  * @nfp_mac_off_count:	Number of MACs in address list
  * @nfp_tun_mac_nb:	Notifier to monitor link state
@@ -105,8 +107,10 @@ struct nfp_flower_priv {
 	struct sk_buff_head cmsg_skbs;
 	struct list_head nfp_mac_off_list;
 	struct list_head nfp_mac_index_list;
+	struct list_head nfp_ipv4_off_list;
 	struct mutex nfp_mac_off_lock;
 	struct mutex nfp_mac_index_lock;
+	struct mutex nfp_ipv4_off_lock;
 	struct ida nfp_mac_off_ids;
 	int nfp_mac_off_count;
 	struct notifier_block nfp_tun_mac_nb;
@@ -142,6 +146,7 @@ struct nfp_fl_payload {
 	struct rcu_head rcu;
 	spinlock_t lock; /* lock stats */
 	struct nfp_fl_stats stats;
+	__be32 nfp_tun_ipv4_addr;
 	char *unmasked_data;
 	char *mask_data;
 	char *action_data;
@@ -182,5 +187,7 @@ void nfp_flower_rx_flow_stats(struct nfp_app *app, struct sk_buff *skb);
 int nfp_tunnel_config_start(struct nfp_app *app);
 void nfp_tunnel_config_stop(struct nfp_app *app);
 void nfp_tunnel_write_macs(struct nfp_app *app);
+void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 ipv4);
+void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index cb3ff6c126e8..865a815ab92a 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -195,7 +195,7 @@ nfp_flower_compile_ipv6(struct nfp_flower_ipv6 *frame,
 static void
 nfp_flower_compile_vxlan(struct nfp_flower_vxlan *frame,
 			 struct tc_cls_flower_offload *flow,
-			 bool mask_version)
+			 bool mask_version, __be32 *tun_dst)
 {
 	struct fl_flow_key *target = mask_version ? flow->mask : flow->key;
 	struct flow_dissector_key_ipv4_addrs *vxlan_ips;
@@ -223,6 +223,7 @@ nfp_flower_compile_vxlan(struct nfp_flower_vxlan *frame,
 					     target);
 		frame->ip_src = vxlan_ips->src;
 		frame->ip_dst = vxlan_ips->dst;
+		*tun_dst = vxlan_ips->dst;
 	}
 }
 
@@ -232,6 +233,7 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 				  struct nfp_fl_payload *nfp_flow)
 {
 	enum nfp_flower_tun_type tun_type = NFP_FL_TUNNEL_NONE;
+	__be32 tun_dst, tun_dst_mask = 0;
 	struct nfp_repr *netdev_repr;
 	int err;
 	u8 *ext;
@@ -336,10 +338,10 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 	if (key_ls->key_layer & NFP_FLOWER_LAYER_VXLAN) {
 		/* Populate Exact VXLAN Data. */
 		nfp_flower_compile_vxlan((struct nfp_flower_vxlan *)ext,
-					 flow, false);
+					 flow, false, &tun_dst);
 		/* Populate Mask VXLAN Data. */
 		nfp_flower_compile_vxlan((struct nfp_flower_vxlan *)msk,
-					 flow, true);
+					 flow, true, &tun_dst_mask);
 		ext += sizeof(struct nfp_flower_vxlan);
 		msk += sizeof(struct nfp_flower_vxlan);
 
@@ -347,6 +349,12 @@ int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
 		if (nfp_netdev_is_nfp_repr(netdev)) {
 			netdev_repr = netdev_priv(netdev);
 			nfp_tunnel_write_macs(netdev_repr->app);
+
+			/* Store the tunnel destination in the rule data.
+			 * This must be present and be an exact match.
+			 */
+			nfp_flow->nfp_tun_ipv4_addr = tun_dst;
+			nfp_tunnel_add_ipv4_off(netdev_repr->app, tun_dst);
 		}
 	}
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 637372ba8f55..3d9537ebdea4 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -306,6 +306,7 @@ nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer)
 	if (!flow_pay->action_data)
 		goto err_free_mask;
 
+	flow_pay->nfp_tun_ipv4_addr = 0;
 	flow_pay->meta.flags = 0;
 	spin_lock_init(&flow_pay->lock);
 
@@ -415,6 +416,9 @@ nfp_flower_del_offload(struct nfp_app *app, struct net_device *netdev,
 	if (err)
 		goto err_free_flow;
 
+	if (nfp_flow->nfp_tun_ipv4_addr)
+		nfp_tunnel_del_ipv4_off(app, nfp_flow->nfp_tun_ipv4_addr);
+
 	err = nfp_flower_xmit_flow(netdev, nfp_flow,
 				   NFP_FLOWER_CMSG_TYPE_FLOW_DEL);
 	if (err)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 34be85803020..185505140f5e 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -32,6 +32,7 @@
  */
 
 #include <linux/etherdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/idr.h>
 #include <net/dst_metadata.h>
 
@@ -40,6 +41,30 @@
 #include "../nfp_net_repr.h"
 #include "../nfp_net.h"
 
+#define NFP_FL_IPV4_ADDRS_MAX        32
+
+/**
+ * struct nfp_tun_ipv4_addr - set the IP address list on the NFP
+ * @count:	number of IPs populated in the array
+ * @ipv4_addr:	array of IPV4_ADDRS_MAX 32 bit IPv4 addresses
+ */
+struct nfp_tun_ipv4_addr {
+	__be32 count;
+	__be32 ipv4_addr[NFP_FL_IPV4_ADDRS_MAX];
+};
+
+/**
+ * struct nfp_ipv4_addr_entry - cached IPv4 addresses
+ * @ipv4_addr:	IP address
+ * @ref_count:	number of rules currently using this IP
+ * @list:	list pointer
+ */
+struct nfp_ipv4_addr_entry {
+	__be32 ipv4_addr;
+	int ref_count;
+	struct list_head list;
+};
+
 /**
  * struct nfp_tun_mac_addr - configure MAC address of tunnel EP on NFP
  * @reserved:	reserved for future use
@@ -112,6 +137,87 @@ nfp_flower_xmit_tun_conf(struct nfp_app *app, u8 mtype, u16 plen, void *pdata)
 	return 0;
 }
 
+static void nfp_tun_write_ipv4_list(struct nfp_app *app)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_addr_entry *entry;
+	struct nfp_tun_ipv4_addr payload;
+	struct list_head *ptr, *storage;
+	int count;
+
+	memset(&payload, 0, sizeof(struct nfp_tun_ipv4_addr));
+	mutex_lock(&priv->nfp_ipv4_off_lock);
+	count = 0;
+	list_for_each_safe(ptr, storage, &priv->nfp_ipv4_off_list) {
+		if (count >= NFP_FL_IPV4_ADDRS_MAX) {
+			mutex_unlock(&priv->nfp_ipv4_off_lock);
+			nfp_flower_cmsg_warn(app, "IPv4 offload exceeds limit.\n");
+			return;
+		}
+		entry = list_entry(ptr, struct nfp_ipv4_addr_entry, list);
+		payload.ipv4_addr[count++] = entry->ipv4_addr;
+	}
+	payload.count = cpu_to_be32(count);
+	mutex_unlock(&priv->nfp_ipv4_off_lock);
+
+	nfp_flower_xmit_tun_conf(app, NFP_FLOWER_CMSG_TYPE_TUN_IPS,
+				 sizeof(struct nfp_tun_ipv4_addr),
+				 &payload);
+}
+
+void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_addr_entry *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_ipv4_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_ipv4_off_list) {
+		entry = list_entry(ptr, struct nfp_ipv4_addr_entry, list);
+		if (entry->ipv4_addr == ipv4) {
+			entry->ref_count++;
+			mutex_unlock(&priv->nfp_ipv4_off_lock);
+			return;
+		}
+	}
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		mutex_unlock(&priv->nfp_ipv4_off_lock);
+		nfp_flower_cmsg_warn(app, "Mem error when offloading IP address.\n");
+		return;
+	}
+	entry->ipv4_addr = ipv4;
+	entry->ref_count = 1;
+	list_add_tail(&entry->list, &priv->nfp_ipv4_off_list);
+	mutex_unlock(&priv->nfp_ipv4_off_lock);
+
+	nfp_tun_write_ipv4_list(app);
+}
+
+void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 ipv4)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_addr_entry *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_ipv4_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_ipv4_off_list) {
+		entry = list_entry(ptr, struct nfp_ipv4_addr_entry, list);
+		if (entry->ipv4_addr == ipv4) {
+			entry->ref_count--;
+			if (!entry->ref_count) {
+				list_del(&entry->list);
+				kfree(entry);
+			}
+			break;
+		}
+	}
+	mutex_unlock(&priv->nfp_ipv4_off_lock);
+
+	nfp_tun_write_ipv4_list(app);
+}
+
 void nfp_tunnel_write_macs(struct nfp_app *app)
 {
 	struct nfp_flower_priv *priv = app->priv;
@@ -324,6 +430,10 @@ int nfp_tunnel_config_start(struct nfp_app *app)
 	INIT_LIST_HEAD(&priv->nfp_mac_index_list);
 	ida_init(&priv->nfp_mac_off_ids);
 
+	/* Initialise priv data for IPv4 offloading. */
+	mutex_init(&priv->nfp_ipv4_off_lock);
+	INIT_LIST_HEAD(&priv->nfp_ipv4_off_list);
+
 	err = register_netdevice_notifier(&priv->nfp_tun_mac_nb);
 	if (err)
 		goto err_free_mac_ida;
@@ -346,6 +456,7 @@ void nfp_tunnel_config_stop(struct nfp_app *app)
 	struct nfp_tun_mac_offload_entry *mac_entry;
 	struct nfp_flower_priv *priv = app->priv;
 	struct nfp_tun_mac_non_nfp_idx *mac_idx;
+	struct nfp_ipv4_addr_entry *ip_entry;
 	struct list_head *ptr, *storage;
 
 	unregister_netdevice_notifier(&priv->nfp_tun_mac_nb);
@@ -371,4 +482,13 @@ void nfp_tunnel_config_stop(struct nfp_app *app)
 	mutex_unlock(&priv->nfp_mac_index_lock);
 
 	ida_destroy(&priv->nfp_mac_off_ids);
+
+	/* Free any memory that may be occupied by ipv4 list. */
+	mutex_lock(&priv->nfp_ipv4_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_ipv4_off_list) {
+		ip_entry = list_entry(ptr, struct nfp_ipv4_addr_entry, list);
+		list_del(&ip_entry->list);
+		kfree(ip_entry);
+	}
+	mutex_unlock(&priv->nfp_ipv4_off_lock);
 }
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 6/7] nfp: flower vxlan neighbour offload
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Receive a request when the NFP does not know the next hop for a packet
that is to be encapsulated in a VXLAN tunnel. Do a route lookup, determine
the next hop entry and update neighbour table on NFP. Monitor the kernel
neighbour table for link changes and update NFP with relevant information.
Overwrite routes with zero values on the NFP when they expire.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |   6 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |   2 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   7 +
 .../ethernet/netronome/nfp/flower/tunnel_conf.c    | 253 +++++++++++++++++++++
 4 files changed, 268 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index b756006dba6f..862787daaa68 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -181,6 +181,12 @@ nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
 	case NFP_FLOWER_CMSG_TYPE_FLOW_STATS:
 		nfp_flower_rx_flow_stats(app, skb);
 		break;
+	case NFP_FLOWER_CMSG_TYPE_NO_NEIGH:
+		nfp_tunnel_request_route(app, skb);
+		break;
+	case NFP_FLOWER_CMSG_TYPE_TUN_NEIGH:
+		/* Acks from the NFP that the route is added - ignore. */
+		break;
 	default:
 		nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control type %u\n",
 				     type);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 6540bb1ceefb..1dc72a1ed577 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -317,7 +317,9 @@ enum nfp_flower_cmsg_type_port {
 	NFP_FLOWER_CMSG_TYPE_FLOW_DEL =		2,
 	NFP_FLOWER_CMSG_TYPE_MAC_REPR =		7,
 	NFP_FLOWER_CMSG_TYPE_PORT_MOD =		8,
+	NFP_FLOWER_CMSG_TYPE_NO_NEIGH =		10,
 	NFP_FLOWER_CMSG_TYPE_TUN_MAC =		11,
+	NFP_FLOWER_CMSG_TYPE_TUN_NEIGH =	13,
 	NFP_FLOWER_CMSG_TYPE_TUN_IPS =		14,
 	NFP_FLOWER_CMSG_TYPE_FLOW_STATS =	15,
 	NFP_FLOWER_CMSG_TYPE_PORT_ECHO =	16,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 53306af6cfe8..93ad969c3653 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -87,12 +87,15 @@ struct nfp_fl_stats_id {
  * @nfp_mac_off_list:	List of MAC addresses to offload
  * @nfp_mac_index_list:	List of unique 8-bit indexes for non NFP netdevs
  * @nfp_ipv4_off_list:	List of IPv4 addresses to offload
+ * @nfp_neigh_off_list:	List of neighbour offloads
  * @nfp_mac_off_lock:	Lock for the MAC address list
  * @nfp_mac_index_lock:	Lock for the MAC index list
  * @nfp_ipv4_off_lock:	Lock for the IPv4 address list
+ * @nfp_neigh_off_lock:	Lock for the neighbour address list
  * @nfp_mac_off_ids:	IDA to manage id assignment for offloaded macs
  * @nfp_mac_off_count:	Number of MACs in address list
  * @nfp_tun_mac_nb:	Notifier to monitor link state
+ * @nfp_tun_neigh_nb:	Notifier to monitor neighbour state
  */
 struct nfp_flower_priv {
 	struct nfp_app *app;
@@ -108,12 +111,15 @@ struct nfp_flower_priv {
 	struct list_head nfp_mac_off_list;
 	struct list_head nfp_mac_index_list;
 	struct list_head nfp_ipv4_off_list;
+	struct list_head nfp_neigh_off_list;
 	struct mutex nfp_mac_off_lock;
 	struct mutex nfp_mac_index_lock;
 	struct mutex nfp_ipv4_off_lock;
+	struct mutex nfp_neigh_off_lock;
 	struct ida nfp_mac_off_ids;
 	int nfp_mac_off_count;
 	struct notifier_block nfp_tun_mac_nb;
+	struct notifier_block nfp_tun_neigh_nb;
 };
 
 struct nfp_fl_key_ls {
@@ -189,5 +195,6 @@ void nfp_tunnel_config_stop(struct nfp_app *app);
 void nfp_tunnel_write_macs(struct nfp_app *app);
 void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
+void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 185505140f5e..8c6b88a1306b 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -33,6 +33,7 @@
 
 #include <linux/etherdevice.h>
 #include <linux/inetdevice.h>
+#include <net/netevent.h>
 #include <linux/idr.h>
 #include <net/dst_metadata.h>
 
@@ -41,6 +42,44 @@
 #include "../nfp_net_repr.h"
 #include "../nfp_net.h"
 
+/**
+ * struct nfp_tun_neigh - neighbour/route entry on the NFP
+ * @dst_ipv4:	destination IPv4 address
+ * @src_ipv4:	source IPv4 address
+ * @dst_addr:	destination MAC address
+ * @src_addr:	source MAC address
+ * @port_id:	NFP port to output packet on - associated with source IPv4
+ */
+struct nfp_tun_neigh {
+	__be32 dst_ipv4;
+	__be32 src_ipv4;
+	u8 dst_addr[ETH_ALEN];
+	u8 src_addr[ETH_ALEN];
+	__be32 port_id;
+};
+
+/**
+ * struct nfp_tun_req_route_ipv4 - NFP requests a route/neighbour lookup
+ * @ingress_port:	ingress port of packet that signalled request
+ * @ipv4_addr:		destination ipv4 address for route
+ * @reserved:		reserved for future use
+ */
+struct nfp_tun_req_route_ipv4 {
+	__be32 ingress_port;
+	__be32 ipv4_addr;
+	__be32 reserved[2];
+};
+
+/**
+ * struct nfp_ipv4_route_entry - routes that are offloaded to the NFP
+ * @ipv4_addr:	destination of route
+ * @list:	list pointer
+ */
+struct nfp_ipv4_route_entry {
+	__be32 ipv4_addr;
+	struct list_head list;
+};
+
 #define NFP_FL_IPV4_ADDRS_MAX        32
 
 /**
@@ -137,6 +176,197 @@ nfp_flower_xmit_tun_conf(struct nfp_app *app, u8 mtype, u16 plen, void *pdata)
 	return 0;
 }
 
+static bool nfp_tun_has_route(struct nfp_app *app, __be32 ipv4_addr)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_route_entry *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_neigh_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_neigh_off_list) {
+		entry = list_entry(ptr, struct nfp_ipv4_route_entry, list);
+		if (entry->ipv4_addr == ipv4_addr) {
+			mutex_unlock(&priv->nfp_neigh_off_lock);
+			return true;
+		}
+	}
+	mutex_unlock(&priv->nfp_neigh_off_lock);
+	return false;
+}
+
+static void nfp_tun_add_route_to_cache(struct nfp_app *app, __be32 ipv4_addr)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_route_entry *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_neigh_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_neigh_off_list) {
+		entry = list_entry(ptr, struct nfp_ipv4_route_entry, list);
+		if (entry->ipv4_addr == ipv4_addr) {
+			mutex_unlock(&priv->nfp_neigh_off_lock);
+			return;
+		}
+	}
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry) {
+		mutex_unlock(&priv->nfp_neigh_off_lock);
+		nfp_flower_cmsg_warn(app, "Mem error when storing new route.\n");
+		return;
+	}
+
+	entry->ipv4_addr = ipv4_addr;
+	list_add_tail(&entry->list, &priv->nfp_neigh_off_list);
+	mutex_unlock(&priv->nfp_neigh_off_lock);
+}
+
+static void nfp_tun_del_route_from_cache(struct nfp_app *app, __be32 ipv4_addr)
+{
+	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_route_entry *entry;
+	struct list_head *ptr, *storage;
+
+	mutex_lock(&priv->nfp_neigh_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_neigh_off_list) {
+		entry = list_entry(ptr, struct nfp_ipv4_route_entry, list);
+		if (entry->ipv4_addr == ipv4_addr) {
+			list_del(&entry->list);
+			kfree(entry);
+			break;
+		}
+	}
+	mutex_unlock(&priv->nfp_neigh_off_lock);
+}
+
+static void
+nfp_tun_write_neigh(struct net_device *netdev, struct nfp_app *app,
+		    struct flowi4 *flow, struct neighbour *neigh)
+{
+	struct nfp_tun_neigh payload;
+
+	/* Only offload representor IPv4s for now. */
+	if (!nfp_netdev_is_nfp_repr(netdev))
+		return;
+
+	memset(&payload, 0, sizeof(struct nfp_tun_neigh));
+	payload.dst_ipv4 = flow->daddr;
+
+	/* If entry has expired send dst IP with all other fields 0. */
+	if (!(neigh->nud_state & NUD_VALID)) {
+		nfp_tun_del_route_from_cache(app, payload.dst_ipv4);
+		/* Trigger ARP to verify invalid neighbour state. */
+		neigh_event_send(neigh, NULL);
+		goto send_msg;
+	}
+
+	/* Have a valid neighbour so populate rest of entry. */
+	payload.src_ipv4 = flow->saddr;
+	ether_addr_copy(payload.src_addr, netdev->dev_addr);
+	neigh_ha_snapshot(payload.dst_addr, neigh, netdev);
+	payload.port_id = cpu_to_be32(nfp_repr_get_port_id(netdev));
+	/* Add destination of new route to NFP cache. */
+	nfp_tun_add_route_to_cache(app, payload.dst_ipv4);
+
+send_msg:
+	nfp_flower_xmit_tun_conf(app, NFP_FLOWER_CMSG_TYPE_TUN_NEIGH,
+				 sizeof(struct nfp_tun_neigh),
+				 (unsigned char *)&payload);
+}
+
+static int
+nfp_tun_neigh_event_handler(struct notifier_block *nb, unsigned long event,
+			    void *ptr)
+{
+	struct nfp_flower_priv *app_priv;
+	struct netevent_redirect *redir;
+	struct flowi4 flow = {};
+	struct neighbour *n;
+	struct nfp_app *app;
+	struct rtable *rt;
+	int err;
+
+	switch (event) {
+	case NETEVENT_REDIRECT:
+		redir = (struct netevent_redirect *)ptr;
+		n = redir->neigh;
+		break;
+	case NETEVENT_NEIGH_UPDATE:
+		n = (struct neighbour *)ptr;
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	flow.daddr = *(__be32 *)n->primary_key;
+
+	/* Only concerned with route changes for representors. */
+	if (!nfp_netdev_is_nfp_repr(n->dev))
+		return NOTIFY_DONE;
+
+	app_priv = container_of(nb, struct nfp_flower_priv, nfp_tun_neigh_nb);
+	app = app_priv->app;
+
+	/* Only concerned with changes to routes already added to NFP. */
+	if (!nfp_tun_has_route(app, flow.daddr))
+		return NOTIFY_DONE;
+
+#if IS_ENABLED(CONFIG_INET)
+	/* Do a route lookup to populate flow data. */
+	rt = ip_route_output_key(dev_net(n->dev), &flow);
+	err = PTR_ERR_OR_ZERO(rt);
+	if (err)
+		return NOTIFY_DONE;
+#else
+	return NOTIFY_DONE;
+#endif
+
+	flow.flowi4_proto = IPPROTO_UDP;
+	nfp_tun_write_neigh(n->dev, app, &flow, n);
+
+	return NOTIFY_OK;
+}
+
+void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb)
+{
+	struct nfp_tun_req_route_ipv4 *payload;
+	struct net_device *netdev;
+	struct flowi4 flow = {};
+	struct neighbour *n;
+	struct rtable *rt;
+	int err;
+
+	payload = nfp_flower_cmsg_get_data(skb);
+
+	netdev = nfp_app_repr_get(app, be32_to_cpu(payload->ingress_port));
+	if (!netdev)
+		goto route_fail_warning;
+
+	flow.daddr = payload->ipv4_addr;
+	flow.flowi4_proto = IPPROTO_UDP;
+
+#if IS_ENABLED(CONFIG_INET)
+	/* Do a route lookup on same namespace as ingress port. */
+	rt = ip_route_output_key(dev_net(netdev), &flow);
+	err = PTR_ERR_OR_ZERO(rt);
+	if (err)
+		goto route_fail_warning;
+#else
+	goto route_fail_warning;
+#endif
+
+	/* Get the neighbour entry for the lookup */
+	n = dst_neigh_lookup(&rt->dst, &flow.daddr);
+	ip_rt_put(rt);
+	if (!n)
+		goto route_fail_warning;
+	nfp_tun_write_neigh(n->dev, app, &flow, n);
+	neigh_release(n);
+	return;
+
+route_fail_warning:
+	nfp_flower_cmsg_warn(app, "Requested route not found.\n");
+}
+
 static void nfp_tun_write_ipv4_list(struct nfp_app *app)
 {
 	struct nfp_flower_priv *priv = app->priv;
@@ -434,10 +664,19 @@ int nfp_tunnel_config_start(struct nfp_app *app)
 	mutex_init(&priv->nfp_ipv4_off_lock);
 	INIT_LIST_HEAD(&priv->nfp_ipv4_off_list);
 
+	/* Initialise priv data for neighbour offloading. */
+	mutex_init(&priv->nfp_neigh_off_lock);
+	INIT_LIST_HEAD(&priv->nfp_neigh_off_list);
+	priv->nfp_tun_neigh_nb.notifier_call = nfp_tun_neigh_event_handler;
+
 	err = register_netdevice_notifier(&priv->nfp_tun_mac_nb);
 	if (err)
 		goto err_free_mac_ida;
 
+	err = register_netevent_notifier(&priv->nfp_tun_neigh_nb);
+	if (err)
+		goto err_unreg_mac_nb;
+
 	/* Parse netdevs already registered for MACs that need offloaded. */
 	rtnl_lock();
 	for_each_netdev(&init_net, netdev)
@@ -446,6 +685,8 @@ int nfp_tunnel_config_start(struct nfp_app *app)
 
 	return 0;
 
+err_unreg_mac_nb:
+	unregister_netdevice_notifier(&priv->nfp_tun_mac_nb);
 err_free_mac_ida:
 	ida_destroy(&priv->nfp_mac_off_ids);
 	return err;
@@ -455,11 +696,13 @@ void nfp_tunnel_config_stop(struct nfp_app *app)
 {
 	struct nfp_tun_mac_offload_entry *mac_entry;
 	struct nfp_flower_priv *priv = app->priv;
+	struct nfp_ipv4_route_entry *route_entry;
 	struct nfp_tun_mac_non_nfp_idx *mac_idx;
 	struct nfp_ipv4_addr_entry *ip_entry;
 	struct list_head *ptr, *storage;
 
 	unregister_netdevice_notifier(&priv->nfp_tun_mac_nb);
+	unregister_netevent_notifier(&priv->nfp_tun_neigh_nb);
 
 	/* Free any memory that may be occupied by MAC list. */
 	mutex_lock(&priv->nfp_mac_off_lock);
@@ -491,4 +734,14 @@ void nfp_tunnel_config_stop(struct nfp_app *app)
 		kfree(ip_entry);
 	}
 	mutex_unlock(&priv->nfp_ipv4_off_lock);
+
+	/* Free any memory that may be occupied by the route list. */
+	mutex_lock(&priv->nfp_neigh_off_lock);
+	list_for_each_safe(ptr, storage, &priv->nfp_neigh_off_list) {
+		route_entry = list_entry(ptr, struct nfp_ipv4_route_entry,
+					 list);
+		list_del(&route_entry->list);
+		kfree(route_entry);
+	}
+	mutex_unlock(&priv->nfp_neigh_off_lock);
 }
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 7/7] nfp: flower vxlan neighbour keep-alive
From: Simon Horman @ 2017-09-25 10:23 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski
  Cc: netdev, oss-drivers, John Hurley, Simon Horman
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Periodically receive messages containing the destination IPs of tunnels
that have recently forwarded traffic. Update the neighbour entries 'used'
value for these IPs next hop.

This prevents the neighbour entry from expiring on timeout but rather
signals an ARP to verify the connection. From an NFP perspective, packets
will not fall back mid-flow unless the link is verified to be down.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   |  3 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  1 +
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  1 +
 .../ethernet/netronome/nfp/flower/tunnel_conf.c    | 64 ++++++++++++++++++++++
 4 files changed, 69 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 862787daaa68..6b71c719deba 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -184,6 +184,9 @@ nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
 	case NFP_FLOWER_CMSG_TYPE_NO_NEIGH:
 		nfp_tunnel_request_route(app, skb);
 		break;
+	case NFP_FLOWER_CMSG_TYPE_ACTIVE_TUNS:
+		nfp_tunnel_keep_alive(app, skb);
+		break;
 	case NFP_FLOWER_CMSG_TYPE_TUN_NEIGH:
 		/* Acks from the NFP that the route is added - ignore. */
 		break;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 1dc72a1ed577..504ddaa21701 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -319,6 +319,7 @@ enum nfp_flower_cmsg_type_port {
 	NFP_FLOWER_CMSG_TYPE_PORT_MOD =		8,
 	NFP_FLOWER_CMSG_TYPE_NO_NEIGH =		10,
 	NFP_FLOWER_CMSG_TYPE_TUN_MAC =		11,
+	NFP_FLOWER_CMSG_TYPE_ACTIVE_TUNS =	12,
 	NFP_FLOWER_CMSG_TYPE_TUN_NEIGH =	13,
 	NFP_FLOWER_CMSG_TYPE_TUN_IPS =		14,
 	NFP_FLOWER_CMSG_TYPE_FLOW_STATS =	15,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 93ad969c3653..12c319a219d8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -196,5 +196,6 @@ void nfp_tunnel_write_macs(struct nfp_app *app);
 void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
+void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 8c6b88a1306b..c495f8f38506 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -36,12 +36,36 @@
 #include <net/netevent.h>
 #include <linux/idr.h>
 #include <net/dst_metadata.h>
+#include <net/arp.h>
 
 #include "cmsg.h"
 #include "main.h"
 #include "../nfp_net_repr.h"
 #include "../nfp_net.h"
 
+#define NFP_FL_MAX_ROUTES               32
+
+/**
+ * struct nfp_tun_active_tuns - periodic message of active tunnels
+ * @seq:		sequence number of the message
+ * @count:		number of tunnels report in message
+ * @flags:		options part of the request
+ * @ipv4:		dest IPv4 address of active route
+ * @egress_port:	port the encapsulated packet egressed
+ * @extra:		reserved for future use
+ * @tun_info:		tunnels that have sent traffic in reported period
+ */
+struct nfp_tun_active_tuns {
+	__be32 seq;
+	__be32 count;
+	__be32 flags;
+	struct route_ip_info {
+		__be32 ipv4;
+		__be32 egress_port;
+		__be32 extra[2];
+	} tun_info[];
+};
+
 /**
  * struct nfp_tun_neigh - neighbour/route entry on the NFP
  * @dst_ipv4:	destination IPv4 address
@@ -147,6 +171,46 @@ struct nfp_tun_mac_non_nfp_idx {
 	struct list_head list;
 };
 
+void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb)
+{
+	struct nfp_tun_active_tuns *payload;
+	struct net_device *netdev;
+	int count, i, pay_len;
+	struct neighbour *n;
+	__be32 ipv4_addr;
+	u32 port;
+
+	payload = nfp_flower_cmsg_get_data(skb);
+	count = be32_to_cpu(payload->count);
+	if (count > NFP_FL_MAX_ROUTES) {
+		nfp_flower_cmsg_warn(app, "Tunnel keep-alive request exceeds max routes.\n");
+		return;
+	}
+
+	pay_len = nfp_flower_cmsg_get_data_len(skb);
+	if (pay_len != sizeof(struct nfp_tun_active_tuns) +
+	    sizeof(struct route_ip_info) * count) {
+		nfp_flower_cmsg_warn(app, "Corruption in tunnel keep-alive message.\n");
+		return;
+	}
+
+	for (i = 0; i < count; i++) {
+		ipv4_addr = payload->tun_info[i].ipv4;
+		port = be32_to_cpu(payload->tun_info[i].egress_port);
+		netdev = nfp_app_repr_get(app, port);
+		if (!netdev)
+			continue;
+
+		n = neigh_lookup(&arp_tbl, &ipv4_addr, netdev);
+		if (!n)
+			continue;
+
+		/* Update the used timestamp of neighbour */
+		neigh_event_send(n, NULL);
+		neigh_release(n);
+	}
+}
+
 static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
 {
 	if (!netdev->rtnl_link_ops)
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next] hv_netvsc: Fix the real number of queues of non-vRSS cases
From: Stephen Hemminger @ 2017-09-25 10:38 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: haiyangz, davem, netdev, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <20170922223138.28348-1-haiyangz@exchange.microsoft.com>

On Fri, 22 Sep 2017 15:31:38 -0700
Haiyang Zhang <haiyangz@exchange.microsoft.com> wrote:

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> For older hosts without multi-channel (vRSS) support, and some error
> cases, we still need to set the real number of queues to one.
> This patch adds this missing setting.
> 
> Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Yes, this should fix issues with WS2012.

Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>

^ permalink raw reply

* Re: [patch net-next v2 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
From: Nikolay Aleksandrov @ 2017-09-25 10:40 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, yotamg, idosch, mlxsw, andrew
In-Reply-To: <20170924172212.10096-8-jiri@resnulli.us>

On 24/09/17 20:22, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> Add the multicast router offloading logic, which is in charge of handling
> the VIF and MFC notifications and translating it to the hardware logic API.
> 
> The offloading logic has to overcome several obstacles in order to safely
> comply with the kernel multicast router user API:
>  - It must keep track of the mapping between VIFs to netdevices. The user
>    can add an MFC cache entry pointing to a VIF, delete the VIF and add
>    re-add it with a different netdevice. The offloading logic has to handle
>    this in order to be compatible with the kernel logic.
>  - It must keep track of the mapping between netdevices to spectrum RIFs,
>    as the current hardware implementation assume having a RIF for every
>    port in a multicast router.
>  - It must handle routes pointing to pimreg device to be trapped to the
>    kernel, as the packet should be delivered to userspace.
>  - It must handle routes pointing tunnel VIFs. The current implementation
>    does not support multicast forwarding to tunnels, thus routes that point
>    to a tunnel should be trapped to the kernel.
>  - It must be aware of proxy multicast routes, which include both (*,*)
>    routes and duplicate routes. Currently proxy routes are not offloaded
>    and trigger the abort mechanism: removal of all routes from hardware and
>    triggering the traffic to go through the kernel.
> 
> The multicast routing offloading logic also updates the counters of the
> offloaded MFC routes in a periodic work.
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> v1->v2:
>  - Update the lastuse MFC entry field too, in addition to packets an bytes.
> ---
>  drivers/net/ethernet/mellanox/mlxsw/Makefile      |    3 +-
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h    |    1 +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 1014 +++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h |  133 +++
>  4 files changed, 1150 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
> 
[snip]
> +static void mlxsw_sp_mr_route_erase(struct mlxsw_sp_mr_table *mr_table,
> +				    struct mlxsw_sp_mr_route *mr_route)
> +{
> +	struct mlxsw_sp *mlxsw_sp = mr_table->mlxsw_sp;
> +	struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
> +
> +	mr->mr_ops->route_destroy(mlxsw_sp, mr->priv, mr_route->route_priv);
> +	kfree(mr_route->route_priv);
> +}
> +
> +static struct mlxsw_sp_mr_route *
> +mlxsw_sp_mr_route4_create(struct mlxsw_sp_mr_table *mr_table,
> +			  struct mfc_cache *mfc)
> +{
> +	struct mlxsw_sp_mr_route_vif_entry *rve, *tmp;
> +	struct mlxsw_sp_mr_route *mr_route;
> +	int err;
> +	int i;
> +
> +	/* Allocate and init a new route and fill it with parameters */
> +	mr_route = kzalloc(sizeof(*mr_table), GFP_KERNEL);

sizeof(*mr_table) ? Shouldn't you allocate sizeof struct mlsw_sp_mr_route (*mr_route) here ?

> +	if (!mr_route)
> +		return ERR_PTR(-ENOMEM);
> +	INIT_LIST_HEAD(&mr_route->evif_list);
> +	mlxsw_sp_mr_route4_key(mr_table, &mr_route->key, mfc);
> +
> +	/* Find min_mtu and link iVIF and eVIFs */
> +	mr_route->min_mtu = ETH_MAX_MTU;
> +	ipmr_cache_hold(mfc);
> +	mr_route->mfc4 = mfc;
> +	mr_route->mr_table = mr_table;
> +	for (i = 0; i < MAXVIFS; i++) {
> +		if (mfc->mfc_un.res.ttls[i] != 255) {
> +			err = mlxsw_sp_mr_route_evif_link(mr_route,
> +							  &mr_table->vifs[i]);
> +			if (err)
> +				goto err;
> +			if (mr_table->vifs[i].dev &&
> +			    mr_table->vifs[i].dev->mtu < mr_route->min_mtu)
> +				mr_route->min_mtu = mr_table->vifs[i].dev->mtu;
> +		}
> +	}
> +	mlxsw_sp_mr_route_ivif_link(mr_route, &mr_table->vifs[mfc->mfc_parent]);
> +	if (err)
> +		goto err;
> +
> +	mr_route->route_action = mlxsw_sp_mr_route_action(mr_route);
> +	return mr_route;
> +err:
> +	ipmr_cache_put(mfc);
> +	list_for_each_entry_safe(rve, tmp, &mr_route->evif_list, route_node)
> +		mlxsw_sp_mr_route_evif_unlink(rve);
> +	kfree(mr_route);
> +	return ERR_PTR(err);
> +}
> +
> +static void mlxsw_sp_mr_route4_destroy(struct mlxsw_sp_mr_table *mr_table,
> +				       struct mlxsw_sp_mr_route *mr_route)
> +{
> +	struct mlxsw_sp_mr_route_vif_entry *rve, *tmp;
> +
> +	mlxsw_sp_mr_route_ivif_unlink(mr_route);
> +	ipmr_cache_put(mr_route->mfc4);
> +	list_for_each_entry_safe(rve, tmp, &mr_route->evif_list, route_node)
> +		mlxsw_sp_mr_route_evif_unlink(rve);
> +	kfree(mr_route);
> +}
[snip]

^ permalink raw reply

* Re: [PATCH net-next] net: mvpp2: phylink support
From: Russell King - ARM Linux @ 2017-09-25 10:45 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, andrew, gregory.clement, thomas.petazzoni, miquel.raynal,
	nadavh, linux-kernel, mw, stefanc, netdev
In-Reply-To: <20170925095514.GA19364@kwain>

On Mon, Sep 25, 2017 at 11:55:14AM +0200, Antoine Tenart wrote:
> Hi Russell,
> 
> On Fri, Sep 22, 2017 at 12:07:31PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote:
> > > Convert the PPv2 driver to use phylink, which models the MAC to PHY
> > > link. The phylink support is made such a way the GoP link IRQ can still
> > > be used: the two modes are incompatible and the GoP link IRQ will be
> > > used if no PHY is described in the device tree. This is the same
> > > behaviour as before.
> > 
> > This makes no sense.  The point of phylink is to be able to support SFP
> > cages, and SFP cages do not have a PHY described in DT.  So, when you
> > want to use phylink because of SFP, you can't, because if you omit
> > the PHY the driver avoids using phylink.
> 
> Yes that's an issue. However we do need to support the GoP link IRQ
> which is also needed in some cases where there is no PHY (and when
> phylink cannot be used). What would you propose to differentiate those
> two cases: no PHY using phylink, and no PHY using the GoP link IRQ?

Can you describe what the GoP link IRQ is doing please?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* Re: [patch net-next v2 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
From: Yotam Gigi @ 2017-09-25 10:53 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Jiri Pirko, netdev; +Cc: davem, idosch, mlxsw, andrew
In-Reply-To: <5ded28ba-0a0c-927c-5a5e-c0fc761ca49e@cumulusnetworks.com>

On 09/25/2017 01:40 PM, Nikolay Aleksandrov wrote:
> On 24/09/17 20:22, Jiri Pirko wrote:
>> From: Yotam Gigi <yotamg@mellanox.com>
>>
>> Add the multicast router offloading logic, which is in charge of handling
>> the VIF and MFC notifications and translating it to the hardware logic API.
>>
>> The offloading logic has to overcome several obstacles in order to safely
>> comply with the kernel multicast router user API:
>>  - It must keep track of the mapping between VIFs to netdevices. The user
>>    can add an MFC cache entry pointing to a VIF, delete the VIF and add
>>    re-add it with a different netdevice. The offloading logic has to handle
>>    this in order to be compatible with the kernel logic.
>>  - It must keep track of the mapping between netdevices to spectrum RIFs,
>>    as the current hardware implementation assume having a RIF for every
>>    port in a multicast router.
>>  - It must handle routes pointing to pimreg device to be trapped to the
>>    kernel, as the packet should be delivered to userspace.
>>  - It must handle routes pointing tunnel VIFs. The current implementation
>>    does not support multicast forwarding to tunnels, thus routes that point
>>    to a tunnel should be trapped to the kernel.
>>  - It must be aware of proxy multicast routes, which include both (*,*)
>>    routes and duplicate routes. Currently proxy routes are not offloaded
>>    and trigger the abort mechanism: removal of all routes from hardware and
>>    triggering the traffic to go through the kernel.
>>
>> The multicast routing offloading logic also updates the counters of the
>> offloaded MFC routes in a periodic work.
>>
>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> v1->v2:
>>  - Update the lastuse MFC entry field too, in addition to packets an bytes.
>> ---
>>  drivers/net/ethernet/mellanox/mlxsw/Makefile      |    3 +-
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h    |    1 +
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 1014 +++++++++++++++++++++
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h |  133 +++
>>  4 files changed, 1150 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
>>
> [snip]
>> +static void mlxsw_sp_mr_route_erase(struct mlxsw_sp_mr_table *mr_table,
>> +				    struct mlxsw_sp_mr_route *mr_route)
>> +{
>> +	struct mlxsw_sp *mlxsw_sp = mr_table->mlxsw_sp;
>> +	struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
>> +
>> +	mr->mr_ops->route_destroy(mlxsw_sp, mr->priv, mr_route->route_priv);
>> +	kfree(mr_route->route_priv);
>> +}
>> +
>> +static struct mlxsw_sp_mr_route *
>> +mlxsw_sp_mr_route4_create(struct mlxsw_sp_mr_table *mr_table,
>> +			  struct mfc_cache *mfc)
>> +{
>> +	struct mlxsw_sp_mr_route_vif_entry *rve, *tmp;
>> +	struct mlxsw_sp_mr_route *mr_route;
>> +	int err;
>> +	int i;
>> +
>> +	/* Allocate and init a new route and fill it with parameters */
>> +	mr_route = kzalloc(sizeof(*mr_table), GFP_KERNEL);
> sizeof(*mr_table) ? Shouldn't you allocate sizeof struct mlsw_sp_mr_route (*mr_route) here ?
>

Seems like you are right. Because of the fact that sizeof(*mr_table) is much
bigger than sizeof(*mr_route), all our tests did not notice it.

Thanks for that!

>> +	if (!mr_route)
>> +		return ERR_PTR(-ENOMEM);
>> +	INIT_LIST_HEAD(&mr_route->evif_list);
>> +	mlxsw_sp_mr_route4_key(mr_table, &mr_route->key, mfc);
>> +
>> +	/* Find min_mtu and link iVIF and eVIFs */
>> +	mr_route->min_mtu = ETH_MAX_MTU;
>> +	ipmr_cache_hold(mfc);
>> +	mr_route->mfc4 = mfc;
>> +	mr_route->mr_table = mr_table;
>> +	for (i = 0; i < MAXVIFS; i++) {
>> +		if (mfc->mfc_un.res.ttls[i] != 255) {
>> +			err = mlxsw_sp_mr_route_evif_link(mr_route,
>> +							  &mr_table->vifs[i]);
>> +			if (err)
>> +				goto err;
>> +			if (mr_table->vifs[i].dev &&
>> +			    mr_table->vifs[i].dev->mtu < mr_route->min_mtu)
>> +				mr_route->min_mtu = mr_table->vifs[i].dev->mtu;
>> +		}
>> +	}
>> +	mlxsw_sp_mr_route_ivif_link(mr_route, &mr_table->vifs[mfc->mfc_parent]);
>> +	if (err)
>> +		goto err;
>> +
>> +	mr_route->route_action = mlxsw_sp_mr_route_action(mr_route);
>> +	return mr_route;
>> +err:
>> +	ipmr_cache_put(mfc);
>> +	list_for_each_entry_safe(rve, tmp, &mr_route->evif_list, route_node)
>> +		mlxsw_sp_mr_route_evif_unlink(rve);
>> +	kfree(mr_route);
>> +	return ERR_PTR(err);
>> +}
>> +
>> +static void mlxsw_sp_mr_route4_destroy(struct mlxsw_sp_mr_table *mr_table,
>> +				       struct mlxsw_sp_mr_route *mr_route)
>> +{
>> +	struct mlxsw_sp_mr_route_vif_entry *rve, *tmp;
>> +
>> +	mlxsw_sp_mr_route_ivif_unlink(mr_route);
>> +	ipmr_cache_put(mr_route->mfc4);
>> +	list_for_each_entry_safe(rve, tmp, &mr_route->evif_list, route_node)
>> +		mlxsw_sp_mr_route_evif_unlink(rve);
>> +	kfree(mr_route);
>> +}
> [snip]
>

^ permalink raw reply

* Re: [PATCH net-next 0/7] nfp: flower vxlan tunnel offload
From: Jakub Kicinski @ 2017-09-25 11:00 UTC (permalink / raw)
  To: Simon Horman; +Cc: David Miller, netdev, oss-drivers
In-Reply-To: <1506335021-32024-1-git-send-email-simon.horman@netronome.com>

On Mon, 25 Sep 2017 12:23:34 +0200, Simon Horman wrote:
> From: Simon Horman <simon.horman@netronome.com>
> 
> John says:
> 
> This patch set allows offloading of TC flower match and set tunnel fields
> to the NFP. The initial focus is on VXLAN traffic. Due to the current
> state of the NFP firmware, only VXLAN traffic on well known port 4789 is
> handled. The match and action fields must explicity set this value to be
> supported. Tunnel end point information is also offloaded to the NFP for
> both encapsulation and decapsulation. The NFP expects 3 separate data sets
> to be supplied.
...

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* [PATCH v2 00/16] Thunderbolt networking
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev

Hi all,

In addition of tunneling PCIe, Display Port and USB traffic, Thunderbolt
allows connecting two hosts (domains) over a Thunderbolt cable. It is
possible to tunnel arbitrary data packets over such connection using
high-speed DMA rings available in the Thunderbolt host controller.

In order to discover Thunderbolt services the other host supports, there is
a software protocol running on top of the automatically configured control
channel (ring 0). This protocol is called XDomain discovery protocol and it
uses XDomain properties to describe the host (domain) and the services it
supports.

Once both sides have agreed what services are supported they can enable
high-speed DMA rings to transfer data over the cable.

This series adds support for the XDomain protocol so that we expose each
remote connection as Thunderbolt XDomain device and each service as
Thunderbolt service device. On top of that we create an API that allows
writing drivers for these services and finally we provide an example
Thunderbolt service driver that creates virtual ethernet inferface that
allows tunneling networking packets over Thunderbolt cable. The API could
be used for creating other Thunderbolt services, such as tunneling SCSI for
example.

The XDomain protocol and networking support is also available in macOS and
Windows so this makes it possible to connect Linux to macOS and Windows as
well.

The patches are based on previous Thunderbolt networking patch series by
Amir Levy and Michael Jamet, that can be found here:

  https://lwn.net/Articles/705998/

The main difference to that patch series is that we have the XDomain
protocol running in the kernel now so there is no need for a separate
userspace daemon.

Note this does not affect the existing functionality, so security levels
and NVM firmware upgrade continue to work as before (with the small
exception that now sysfs also shows the XDomain connections and services in
addition to normal Thunderbolt devices). It is also possible to connect up
to 5 Thunderbolt devices and then another host, and the network driver
works exactly the same.

This is second version of the patch series. The previous version (v1) can
be found here:

  https://lwn.net/Articles/734019/

Changes from the v1:

  * Add include/linux/thunderbolt.h to MAINTAINERS
  * Correct Linux version and date of new sysfs entries in
    Documentation/ABI/testing/sysfs-bus-thunderbolt
  * Move network driver from drivers/thunderbolt/net.c to
    drivers/net/thunderbolt.c and update it to follow coding style in
    drivers/net/*.
  * Add MAINTAINERS entry for the network driver
  * Minor cleanups

In case someone wants to try this out, patch [16/16] adds documentation how
the networking driver can be used. In short, if you connect Linux to a
macOS or Windows, everything is done automatically (as those systems have
the networking service enabled by default). For Linux to Linux connection
one host needs to load the networking driver first (so that the other side
can locate the networking service and load the corresponding driver).

Amir Levy (1):
  net: Add support for networking over Thunderbolt cable

Mika Westerberg (15):
  byteorder: Move {cpu_to_be32,be32_to_cpu}_array() from Thunderbolt to core
  thunderbolt: Add support for XDomain properties
  thunderbolt: Move enum tb_cfg_pkg_type to thunderbolt.h
  thunderbolt: Move thunderbolt domain structure to thunderbolt.h
  thunderbolt: Move tb_switch_phy_port_from_link() to thunderbolt.h
  thunderbolt: Add support for XDomain discovery protocol
  thunderbolt: Configure interrupt throttling for all interrupts
  thunderbolt: Add support for frame mode
  thunderbolt: Export ring handling functions to modules
  thunderbolt: Move ring descriptor flags to thunderbolt.h
  thunderbolt: Use spinlock in ring serialization
  thunderbolt: Use spinlock in NHI serialization
  thunderbolt: Add polling mode for rings
  thunderbolt: Add function to retrieve DMA device for the ring
  thunderbolt: Allocate ring HopID automatically if requested

 Documentation/ABI/testing/sysfs-bus-thunderbolt |   48 +
 Documentation/admin-guide/thunderbolt.rst       |   24 +
 MAINTAINERS                                     |    7 +
 drivers/net/Kconfig                             |   12 +
 drivers/net/Makefile                            |    3 +
 drivers/net/thunderbolt.c                       | 1379 ++++++++++++++++++++
 drivers/thunderbolt/Makefile                    |    2 +-
 drivers/thunderbolt/ctl.c                       |   46 +-
 drivers/thunderbolt/ctl.h                       |    3 +-
 drivers/thunderbolt/domain.c                    |  197 ++-
 drivers/thunderbolt/icm.c                       |  218 +++-
 drivers/thunderbolt/nhi.c                       |  409 ++++--
 drivers/thunderbolt/nhi.h                       |  141 +-
 drivers/thunderbolt/nhi_regs.h                  |   11 +-
 drivers/thunderbolt/property.c                  |  670 ++++++++++
 drivers/thunderbolt/switch.c                    |    7 +-
 drivers/thunderbolt/tb.h                        |   88 +-
 drivers/thunderbolt/tb_msgs.h                   |  140 +-
 drivers/thunderbolt/xdomain.c                   | 1576 +++++++++++++++++++++++
 include/linux/byteorder/generic.h               |   16 +
 include/linux/mod_devicetable.h                 |   26 +
 include/linux/thunderbolt.h                     |  598 +++++++++
 scripts/mod/devicetable-offsets.c               |    7 +
 scripts/mod/file2alias.c                        |   25 +
 24 files changed, 5304 insertions(+), 349 deletions(-)
 create mode 100644 drivers/net/thunderbolt.c
 create mode 100644 drivers/thunderbolt/property.c
 create mode 100644 drivers/thunderbolt/xdomain.c
 create mode 100644 include/linux/thunderbolt.h

-- 
2.14.1

^ permalink raw reply

* [PATCH v2 01/16] byteorder: Move {cpu_to_be32,be32_to_cpu}_array() from Thunderbolt to core
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

We will be using these when communicating XDomain discovery protocol
over Thunderbolt link but they might be useful for other drivers as
well.

Make them available through byteorder/generic.h.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/ctl.c         | 14 --------------
 include/linux/byteorder/generic.h | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index fb40dd0588b9..e6a4c9458c76 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -289,20 +289,6 @@ static void tb_cfg_print_error(struct tb_ctl *ctl,
 	}
 }
 
-static void cpu_to_be32_array(__be32 *dst, const u32 *src, size_t len)
-{
-	int i;
-	for (i = 0; i < len; i++)
-		dst[i] = cpu_to_be32(src[i]);
-}
-
-static void be32_to_cpu_array(u32 *dst, __be32 *src, size_t len)
-{
-	int i;
-	for (i = 0; i < len; i++)
-		dst[i] = be32_to_cpu(src[i]);
-}
-
 static __be32 tb_crc(const void *data, size_t len)
 {
 	return cpu_to_be32(~__crc32c_le(~0, data, len));
diff --git a/include/linux/byteorder/generic.h b/include/linux/byteorder/generic.h
index 89f67c1c3160..805d16654459 100644
--- a/include/linux/byteorder/generic.h
+++ b/include/linux/byteorder/generic.h
@@ -170,4 +170,20 @@ static inline void be64_add_cpu(__be64 *var, u64 val)
 	*var = cpu_to_be64(be64_to_cpu(*var) + val);
 }
 
+static inline void cpu_to_be32_array(__be32 *dst, const u32 *src, size_t len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		dst[i] = cpu_to_be32(src[i]);
+}
+
+static inline void be32_to_cpu_array(u32 *dst, const __be32 *src, size_t len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		dst[i] = be32_to_cpu(src[i]);
+}
+
 #endif /* _LINUX_BYTEORDER_GENERIC_H */
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 03/16] thunderbolt: Move enum tb_cfg_pkg_type to thunderbolt.h
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

These will be needed by Thunderbolt services when sending and receiving
XDomain control messages. While there change TB_CFG_PKG_PREPARE_TO_SLEEP
value to be decimal in order to be consistent with other members.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/ctl.h     |  1 +
 drivers/thunderbolt/tb_msgs.h | 17 -----------------
 include/linux/thunderbolt.h   | 17 +++++++++++++++++
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/thunderbolt/ctl.h b/drivers/thunderbolt/ctl.h
index 36fd28b1c1c5..d0f21e1e0b8b 100644
--- a/drivers/thunderbolt/ctl.h
+++ b/drivers/thunderbolt/ctl.h
@@ -8,6 +8,7 @@
 #define _TB_CFG
 
 #include <linux/kref.h>
+#include <linux/thunderbolt.h>
 
 #include "nhi.h"
 #include "tb_msgs.h"
diff --git a/drivers/thunderbolt/tb_msgs.h b/drivers/thunderbolt/tb_msgs.h
index de6441e4a060..fe3039b05da6 100644
--- a/drivers/thunderbolt/tb_msgs.h
+++ b/drivers/thunderbolt/tb_msgs.h
@@ -15,23 +15,6 @@
 #include <linux/types.h>
 #include <linux/uuid.h>
 
-enum tb_cfg_pkg_type {
-	TB_CFG_PKG_READ = 1,
-	TB_CFG_PKG_WRITE = 2,
-	TB_CFG_PKG_ERROR = 3,
-	TB_CFG_PKG_NOTIFY_ACK = 4,
-	TB_CFG_PKG_EVENT = 5,
-	TB_CFG_PKG_XDOMAIN_REQ = 6,
-	TB_CFG_PKG_XDOMAIN_RESP = 7,
-	TB_CFG_PKG_OVERRIDE = 8,
-	TB_CFG_PKG_RESET = 9,
-	TB_CFG_PKG_ICM_EVENT = 10,
-	TB_CFG_PKG_ICM_CMD = 11,
-	TB_CFG_PKG_ICM_RESP = 12,
-	TB_CFG_PKG_PREPARE_TO_SLEEP = 0xd,
-
-};
-
 enum tb_cfg_space {
 	TB_CFG_HOPS = 0,
 	TB_CFG_PORT = 1,
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 96561c1265ae..b512b1e2b4f2 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -1,6 +1,7 @@
 /*
  * Thunderbolt service API
  *
+ * Copyright (C) 2014 Andreas Noever <andreas.noever@gmail.com>
  * Copyright (C) 2017, Intel Corporation
  * Authors: Michael Jamet <michael.jamet@intel.com>
  *          Mika Westerberg <mika.westerberg@linux.intel.com>
@@ -16,6 +17,22 @@
 #include <linux/list.h>
 #include <linux/uuid.h>
 
+enum tb_cfg_pkg_type {
+	TB_CFG_PKG_READ = 1,
+	TB_CFG_PKG_WRITE = 2,
+	TB_CFG_PKG_ERROR = 3,
+	TB_CFG_PKG_NOTIFY_ACK = 4,
+	TB_CFG_PKG_EVENT = 5,
+	TB_CFG_PKG_XDOMAIN_REQ = 6,
+	TB_CFG_PKG_XDOMAIN_RESP = 7,
+	TB_CFG_PKG_OVERRIDE = 8,
+	TB_CFG_PKG_RESET = 9,
+	TB_CFG_PKG_ICM_EVENT = 10,
+	TB_CFG_PKG_ICM_CMD = 11,
+	TB_CFG_PKG_ICM_RESP = 12,
+	TB_CFG_PKG_PREPARE_TO_SLEEP = 13,
+};
+
 /**
  * struct tb_property_dir - XDomain property directory
  * @uuid: Directory UUID or %NULL if root directory
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 04/16] thunderbolt: Move thunderbolt domain structure to thunderbolt.h
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

These are needed by Thunderbolt services so move them to thunderbolt.h
to make sure they are available outside of drivers/thunderbolt.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/tb.h    | 42 ------------------------------------------
 include/linux/thunderbolt.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index e0deee4f1eb0..2fefe76621ca 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -39,20 +39,6 @@ struct tb_switch_nvm {
 	bool authenticating;
 };
 
-/**
- * enum tb_security_level - Thunderbolt security level
- * @TB_SECURITY_NONE: No security, legacy mode
- * @TB_SECURITY_USER: User approval required at minimum
- * @TB_SECURITY_SECURE: One time saved key required at minimum
- * @TB_SECURITY_DPONLY: Only tunnel Display port (and USB)
- */
-enum tb_security_level {
-	TB_SECURITY_NONE,
-	TB_SECURITY_USER,
-	TB_SECURITY_SECURE,
-	TB_SECURITY_DPONLY,
-};
-
 #define TB_SWITCH_KEY_SIZE		32
 /* Each physical port contains 2 links on modern controllers */
 #define TB_SWITCH_LINKS_PER_PHY_PORT	2
@@ -223,33 +209,6 @@ struct tb_cm_ops {
 	int (*disconnect_pcie_paths)(struct tb *tb);
 };
 
-/**
- * struct tb - main thunderbolt bus structure
- * @dev: Domain device
- * @lock: Big lock. Must be held when accessing any struct
- *	  tb_switch / struct tb_port.
- * @nhi: Pointer to the NHI structure
- * @ctl: Control channel for this domain
- * @wq: Ordered workqueue for all domain specific work
- * @root_switch: Root switch of this domain
- * @cm_ops: Connection manager specific operations vector
- * @index: Linux assigned domain number
- * @security_level: Current security level
- * @privdata: Private connection manager specific data
- */
-struct tb {
-	struct device dev;
-	struct mutex lock;
-	struct tb_nhi *nhi;
-	struct tb_ctl *ctl;
-	struct workqueue_struct *wq;
-	struct tb_switch *root_switch;
-	const struct tb_cm_ops *cm_ops;
-	int index;
-	enum tb_security_level security_level;
-	unsigned long privdata[0];
-};
-
 static inline void *tb_priv(struct tb *tb)
 {
 	return (void *)tb->privdata;
@@ -368,7 +327,6 @@ static inline int tb_port_write(struct tb_port *port, const void *buffer,
 struct tb *icm_probe(struct tb_nhi *nhi);
 struct tb *tb_probe(struct tb_nhi *nhi);
 
-extern struct bus_type tb_bus_type;
 extern struct device_type tb_domain_type;
 extern struct device_type tb_switch_type;
 
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index b512b1e2b4f2..910b1bf92112 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -14,7 +14,9 @@
 #ifndef THUNDERBOLT_H_
 #define THUNDERBOLT_H_
 
+#include <linux/device.h>
 #include <linux/list.h>
+#include <linux/mutex.h>
 #include <linux/uuid.h>
 
 enum tb_cfg_pkg_type {
@@ -33,6 +35,49 @@ enum tb_cfg_pkg_type {
 	TB_CFG_PKG_PREPARE_TO_SLEEP = 13,
 };
 
+/**
+ * enum tb_security_level - Thunderbolt security level
+ * @TB_SECURITY_NONE: No security, legacy mode
+ * @TB_SECURITY_USER: User approval required at minimum
+ * @TB_SECURITY_SECURE: One time saved key required at minimum
+ * @TB_SECURITY_DPONLY: Only tunnel Display port (and USB)
+ */
+enum tb_security_level {
+	TB_SECURITY_NONE,
+	TB_SECURITY_USER,
+	TB_SECURITY_SECURE,
+	TB_SECURITY_DPONLY,
+};
+
+/**
+ * struct tb - main thunderbolt bus structure
+ * @dev: Domain device
+ * @lock: Big lock. Must be held when accessing any struct
+ *	  tb_switch / struct tb_port.
+ * @nhi: Pointer to the NHI structure
+ * @ctl: Control channel for this domain
+ * @wq: Ordered workqueue for all domain specific work
+ * @root_switch: Root switch of this domain
+ * @cm_ops: Connection manager specific operations vector
+ * @index: Linux assigned domain number
+ * @security_level: Current security level
+ * @privdata: Private connection manager specific data
+ */
+struct tb {
+	struct device dev;
+	struct mutex lock;
+	struct tb_nhi *nhi;
+	struct tb_ctl *ctl;
+	struct workqueue_struct *wq;
+	struct tb_switch *root_switch;
+	const struct tb_cm_ops *cm_ops;
+	int index;
+	enum tb_security_level security_level;
+	unsigned long privdata[0];
+};
+
+extern struct bus_type tb_bus_type;
+
 /**
  * struct tb_property_dir - XDomain property directory
  * @uuid: Directory UUID or %NULL if root directory
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 05/16] thunderbolt: Move tb_switch_phy_port_from_link() to thunderbolt.h
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

A Thunderbolt service might need to find the physical port from a link
the cable is connected to. For instance networking driver uses this
information to generate MAC address according the Apple ThunderboltIP
protocol.

Move this function to thunderbolt.h and rename it to
tb_phy_port_from_link() to reflect the fact that it does not take switch
as parameter.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/icm.c   | 2 +-
 drivers/thunderbolt/tb.h    | 7 -------
 include/linux/thunderbolt.h | 7 +++++++
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/thunderbolt/icm.c b/drivers/thunderbolt/icm.c
index 53250fc057e1..8c22b91ed040 100644
--- a/drivers/thunderbolt/icm.c
+++ b/drivers/thunderbolt/icm.c
@@ -89,7 +89,7 @@ static inline struct tb *icm_to_tb(struct icm *icm)
 
 static inline u8 phy_port_from_route(u64 route, u8 depth)
 {
-	return tb_switch_phy_port_from_link(route >> ((depth - 1) * 8));
+	return tb_phy_port_from_link(route >> ((depth - 1) * 8));
 }
 
 static inline u8 dual_link_from_link(u8 link)
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index 2fefe76621ca..ea21d927bd09 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -40,8 +40,6 @@ struct tb_switch_nvm {
 };
 
 #define TB_SWITCH_KEY_SIZE		32
-/* Each physical port contains 2 links on modern controllers */
-#define TB_SWITCH_LINKS_PER_PHY_PORT	2
 
 /**
  * struct tb_switch - a thunderbolt switch
@@ -367,11 +365,6 @@ struct tb_switch *tb_switch_find_by_link_depth(struct tb *tb, u8 link,
 					       u8 depth);
 struct tb_switch *tb_switch_find_by_uuid(struct tb *tb, const uuid_t *uuid);
 
-static inline unsigned int tb_switch_phy_port_from_link(unsigned int link)
-{
-	return (link - 1) / TB_SWITCH_LINKS_PER_PHY_PORT;
-}
-
 static inline void tb_switch_put(struct tb_switch *sw)
 {
 	put_device(&sw->dev);
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 910b1bf92112..43b8d1e09341 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -78,6 +78,13 @@ struct tb {
 
 extern struct bus_type tb_bus_type;
 
+#define TB_LINKS_PER_PHY_PORT	2
+
+static inline unsigned int tb_phy_port_from_link(unsigned int link)
+{
+	return (link - 1) / TB_LINKS_PER_PHY_PORT;
+}
+
 /**
  * struct tb_property_dir - XDomain property directory
  * @uuid: Directory UUID or %NULL if root directory
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 06/16] thunderbolt: Add support for XDomain discovery protocol
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

When two hosts are connected over a Thunderbolt cable, there is a
protocol they can use to communicate capabilities supported by the host.
The discovery protocol uses automatically configured control channel
(ring 0) and is build on top of request/response transactions using
special XDomain primitives provided by the Thunderbolt base protocol.

The capabilities consists of a root directory block of basic properties
used for identification of the host, and then there can be zero or more
directories each describing a Thunderbolt service and its capabilities.

Once both sides have discovered what is supported the two hosts can
setup high-speed DMA paths and transfer data to the other side using
whatever protocol was agreed based on the properties. The software
protocol used to communicate which DMA paths to enable is service
specific.

This patch adds support for the XDomain discovery protocol to the
Thunderbolt bus. We model each remote host connection as a Linux XDomain
device. For each Thunderbolt service found supported on the XDomain
device, we create Linux Thunderbolt service device which Thunderbolt
service drivers can then bind to based on the protocol identification
information retrieved from the property directory describing the
service.

This code is based on the work done by Amir Levy and Michael Jamet.

Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-thunderbolt |   48 +
 drivers/thunderbolt/Makefile                    |    2 +-
 drivers/thunderbolt/ctl.c                       |   11 +-
 drivers/thunderbolt/ctl.h                       |    2 +-
 drivers/thunderbolt/domain.c                    |  197 ++-
 drivers/thunderbolt/icm.c                       |  218 +++-
 drivers/thunderbolt/nhi.h                       |    2 +
 drivers/thunderbolt/switch.c                    |    7 +-
 drivers/thunderbolt/tb.h                        |   39 +-
 drivers/thunderbolt/tb_msgs.h                   |  123 ++
 drivers/thunderbolt/xdomain.c                   | 1576 +++++++++++++++++++++++
 include/linux/mod_devicetable.h                 |   26 +
 include/linux/thunderbolt.h                     |  242 ++++
 scripts/mod/devicetable-offsets.c               |    7 +
 scripts/mod/file2alias.c                        |   25 +
 15 files changed, 2507 insertions(+), 18 deletions(-)
 create mode 100644 drivers/thunderbolt/xdomain.c

diff --git a/Documentation/ABI/testing/sysfs-bus-thunderbolt b/Documentation/ABI/testing/sysfs-bus-thunderbolt
index 392bef5bd399..93798c02e28b 100644
--- a/Documentation/ABI/testing/sysfs-bus-thunderbolt
+++ b/Documentation/ABI/testing/sysfs-bus-thunderbolt
@@ -110,3 +110,51 @@ Description:	When new NVM image is written to the non-active NVM
 		is directly the status value from the DMA configuration
 		based mailbox before the device is power cycled. Writing
 		0 here clears the status.
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/key
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	This contains name of the property directory the XDomain
+		service exposes. This entry describes the protocol in
+		question. Following directories are already reserved by
+		the Apple XDomain specification:
+
+		network:  IP/ethernet over Thunderbolt
+		targetdm: Target disk mode protocol over Thunderbolt
+		extdisp:  External display mode protocol over Thunderbolt
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/modalias
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	Stores the same MODALIAS value emitted by uevent for
+		the XDomain service. Format: tbtsvc:kSpNvNrN
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcid
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	This contains XDomain protocol identifier the XDomain
+		service supports.
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcvers
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	This contains XDomain protocol version the XDomain
+		service supports.
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcrevs
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	This contains XDomain software version the XDomain
+		service supports.
+
+What:		/sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcstns
+Date:		Jan 2018
+KernelVersion:	4.15
+Contact:	thunderbolt-software@lists.01.org
+Description:	This contains XDomain service specific settings as
+		bitmask. Format: %x
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 7afd21f5383a..f2f0de27252b 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
 obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
 thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o eeprom.o
-thunderbolt-objs += domain.o dma_port.o icm.o property.o
+thunderbolt-objs += domain.o dma_port.o icm.o property.o xdomain.o
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index e6a4c9458c76..46e393c5fd1d 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -368,10 +368,10 @@ static int tb_ctl_tx(struct tb_ctl *ctl, const void *data, size_t len,
 /**
  * tb_ctl_handle_event() - acknowledge a plug event, invoke ctl->callback
  */
-static void tb_ctl_handle_event(struct tb_ctl *ctl, enum tb_cfg_pkg_type type,
+static bool tb_ctl_handle_event(struct tb_ctl *ctl, enum tb_cfg_pkg_type type,
 				struct ctl_pkg *pkg, size_t size)
 {
-	ctl->callback(ctl->callback_data, type, pkg->buffer, size);
+	return ctl->callback(ctl->callback_data, type, pkg->buffer, size);
 }
 
 static void tb_ctl_rx_submit(struct ctl_pkg *pkg)
@@ -444,6 +444,8 @@ static void tb_ctl_rx_callback(struct tb_ring *ring, struct ring_frame *frame,
 		break;
 
 	case TB_CFG_PKG_EVENT:
+	case TB_CFG_PKG_XDOMAIN_RESP:
+	case TB_CFG_PKG_XDOMAIN_REQ:
 		if (*(__be32 *)(pkg->buffer + frame->size) != crc32) {
 			tb_ctl_err(pkg->ctl,
 				   "RX: checksum mismatch, dropping packet\n");
@@ -451,8 +453,9 @@ static void tb_ctl_rx_callback(struct tb_ring *ring, struct ring_frame *frame,
 		}
 		/* Fall through */
 	case TB_CFG_PKG_ICM_EVENT:
-		tb_ctl_handle_event(pkg->ctl, frame->eof, pkg, frame->size);
-		goto rx;
+		if (tb_ctl_handle_event(pkg->ctl, frame->eof, pkg, frame->size))
+			goto rx;
+		break;
 
 	default:
 		break;
diff --git a/drivers/thunderbolt/ctl.h b/drivers/thunderbolt/ctl.h
index d0f21e1e0b8b..85c49dd301ea 100644
--- a/drivers/thunderbolt/ctl.h
+++ b/drivers/thunderbolt/ctl.h
@@ -16,7 +16,7 @@
 /* control channel */
 struct tb_ctl;
 
-typedef void (*event_cb)(void *data, enum tb_cfg_pkg_type type,
+typedef bool (*event_cb)(void *data, enum tb_cfg_pkg_type type,
 			 const void *buf, size_t size);
 
 struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, event_cb cb, void *cb_data);
diff --git a/drivers/thunderbolt/domain.c b/drivers/thunderbolt/domain.c
index 9f2dcd48974d..9b90115319ce 100644
--- a/drivers/thunderbolt/domain.c
+++ b/drivers/thunderbolt/domain.c
@@ -20,6 +20,98 @@
 
 static DEFINE_IDA(tb_domain_ida);
 
+static bool match_service_id(const struct tb_service_id *id,
+			     const struct tb_service *svc)
+{
+	if (id->match_flags & TBSVC_MATCH_PROTOCOL_KEY) {
+		if (strcmp(id->protocol_key, svc->key))
+			return false;
+	}
+
+	if (id->match_flags & TBSVC_MATCH_PROTOCOL_ID) {
+		if (id->protocol_id != svc->prtcid)
+			return false;
+	}
+
+	if (id->match_flags & TBSVC_MATCH_PROTOCOL_VERSION) {
+		if (id->protocol_version != svc->prtcvers)
+			return false;
+	}
+
+	if (id->match_flags & TBSVC_MATCH_PROTOCOL_VERSION) {
+		if (id->protocol_revision != svc->prtcrevs)
+			return false;
+	}
+
+	return true;
+}
+
+static const struct tb_service_id *__tb_service_match(struct device *dev,
+						      struct device_driver *drv)
+{
+	struct tb_service_driver *driver;
+	const struct tb_service_id *ids;
+	struct tb_service *svc;
+
+	svc = tb_to_service(dev);
+	if (!svc)
+		return NULL;
+
+	driver = container_of(drv, struct tb_service_driver, driver);
+	if (!driver->id_table)
+		return NULL;
+
+	for (ids = driver->id_table; ids->match_flags != 0; ids++) {
+		if (match_service_id(ids, svc))
+			return ids;
+	}
+
+	return NULL;
+}
+
+static int tb_service_match(struct device *dev, struct device_driver *drv)
+{
+	return !!__tb_service_match(dev, drv);
+}
+
+static int tb_service_probe(struct device *dev)
+{
+	struct tb_service *svc = tb_to_service(dev);
+	struct tb_service_driver *driver;
+	const struct tb_service_id *id;
+
+	driver = container_of(dev->driver, struct tb_service_driver, driver);
+	id = __tb_service_match(dev, &driver->driver);
+
+	return driver->probe(svc, id);
+}
+
+static int tb_service_remove(struct device *dev)
+{
+	struct tb_service *svc = tb_to_service(dev);
+	struct tb_service_driver *driver;
+
+	driver = container_of(dev->driver, struct tb_service_driver, driver);
+	if (driver->remove)
+		driver->remove(svc);
+
+	return 0;
+}
+
+static void tb_service_shutdown(struct device *dev)
+{
+	struct tb_service_driver *driver;
+	struct tb_service *svc;
+
+	svc = tb_to_service(dev);
+	if (!svc || !dev->driver)
+		return;
+
+	driver = container_of(dev->driver, struct tb_service_driver, driver);
+	if (driver->shutdown)
+		driver->shutdown(svc);
+}
+
 static const char * const tb_security_names[] = {
 	[TB_SECURITY_NONE] = "none",
 	[TB_SECURITY_USER] = "user",
@@ -52,6 +144,10 @@ static const struct attribute_group *domain_attr_groups[] = {
 
 struct bus_type tb_bus_type = {
 	.name = "thunderbolt",
+	.match = tb_service_match,
+	.probe = tb_service_probe,
+	.remove = tb_service_remove,
+	.shutdown = tb_service_shutdown,
 };
 
 static void tb_domain_release(struct device *dev)
@@ -128,17 +224,26 @@ struct tb *tb_domain_alloc(struct tb_nhi *nhi, size_t privsize)
 	return NULL;
 }
 
-static void tb_domain_event_cb(void *data, enum tb_cfg_pkg_type type,
+static bool tb_domain_event_cb(void *data, enum tb_cfg_pkg_type type,
 			       const void *buf, size_t size)
 {
 	struct tb *tb = data;
 
 	if (!tb->cm_ops->handle_event) {
 		tb_warn(tb, "domain does not have event handler\n");
-		return;
+		return true;
 	}
 
-	tb->cm_ops->handle_event(tb, type, buf, size);
+	switch (type) {
+	case TB_CFG_PKG_XDOMAIN_REQ:
+	case TB_CFG_PKG_XDOMAIN_RESP:
+		return tb_xdomain_handle_request(tb, type, buf, size);
+
+	default:
+		tb->cm_ops->handle_event(tb, type, buf, size);
+	}
+
+	return true;
 }
 
 /**
@@ -443,9 +548,92 @@ int tb_domain_disconnect_pcie_paths(struct tb *tb)
 	return tb->cm_ops->disconnect_pcie_paths(tb);
 }
 
+/**
+ * tb_domain_approve_xdomain_paths() - Enable DMA paths for XDomain
+ * @tb: Domain enabling the DMA paths
+ * @xd: XDomain DMA paths are created to
+ *
+ * Calls connection manager specific method to enable DMA paths to the
+ * XDomain in question.
+ *
+ * Return: 0% in case of success and negative errno otherwise. In
+ * particular returns %-ENOTSUPP if the connection manager
+ * implementation does not support XDomains.
+ */
+int tb_domain_approve_xdomain_paths(struct tb *tb, struct tb_xdomain *xd)
+{
+	if (!tb->cm_ops->approve_xdomain_paths)
+		return -ENOTSUPP;
+
+	return tb->cm_ops->approve_xdomain_paths(tb, xd);
+}
+
+/**
+ * tb_domain_disconnect_xdomain_paths() - Disable DMA paths for XDomain
+ * @tb: Domain disabling the DMA paths
+ * @xd: XDomain whose DMA paths are disconnected
+ *
+ * Calls connection manager specific method to disconnect DMA paths to
+ * the XDomain in question.
+ *
+ * Return: 0% in case of success and negative errno otherwise. In
+ * particular returns %-ENOTSUPP if the connection manager
+ * implementation does not support XDomains.
+ */
+int tb_domain_disconnect_xdomain_paths(struct tb *tb, struct tb_xdomain *xd)
+{
+	if (!tb->cm_ops->disconnect_xdomain_paths)
+		return -ENOTSUPP;
+
+	return tb->cm_ops->disconnect_xdomain_paths(tb, xd);
+}
+
+static int disconnect_xdomain(struct device *dev, void *data)
+{
+	struct tb_xdomain *xd;
+	struct tb *tb = data;
+	int ret = 0;
+
+	xd = tb_to_xdomain(dev);
+	if (xd && xd->tb == tb)
+		ret = tb_xdomain_disable_paths(xd);
+
+	return ret;
+}
+
+/**
+ * tb_domain_disconnect_all_paths() - Disconnect all paths for the domain
+ * @tb: Domain whose paths are disconnected
+ *
+ * This function can be used to disconnect all paths (PCIe, XDomain) for
+ * example in preparation for host NVM firmware upgrade. After this is
+ * called the paths cannot be established without resetting the switch.
+ *
+ * Return: %0 in case of success and negative errno otherwise.
+ */
+int tb_domain_disconnect_all_paths(struct tb *tb)
+{
+	int ret;
+
+	ret = tb_domain_disconnect_pcie_paths(tb);
+	if (ret)
+		return ret;
+
+	return bus_for_each_dev(&tb_bus_type, NULL, tb, disconnect_xdomain);
+}
+
 int tb_domain_init(void)
 {
-	return bus_register(&tb_bus_type);
+	int ret;
+
+	ret = tb_xdomain_init();
+	if (ret)
+		return ret;
+	ret = bus_register(&tb_bus_type);
+	if (ret)
+		tb_xdomain_exit();
+
+	return ret;
 }
 
 void tb_domain_exit(void)
@@ -453,4 +641,5 @@ void tb_domain_exit(void)
 	bus_unregister(&tb_bus_type);
 	ida_destroy(&tb_domain_ida);
 	tb_switch_exit();
+	tb_xdomain_exit();
 }
diff --git a/drivers/thunderbolt/icm.c b/drivers/thunderbolt/icm.c
index 8c22b91ed040..ab02d13f40b7 100644
--- a/drivers/thunderbolt/icm.c
+++ b/drivers/thunderbolt/icm.c
@@ -60,6 +60,8 @@
  * @get_route: Find a route string for given switch
  * @device_connected: Handle device connected ICM message
  * @device_disconnected: Handle device disconnected ICM message
+ * @xdomain_connected - Handle XDomain connected ICM message
+ * @xdomain_disconnected - Handle XDomain disconnected ICM message
  */
 struct icm {
 	struct mutex request_lock;
@@ -74,6 +76,10 @@ struct icm {
 				 const struct icm_pkg_header *hdr);
 	void (*device_disconnected)(struct tb *tb,
 				    const struct icm_pkg_header *hdr);
+	void (*xdomain_connected)(struct tb *tb,
+				  const struct icm_pkg_header *hdr);
+	void (*xdomain_disconnected)(struct tb *tb,
+				     const struct icm_pkg_header *hdr);
 };
 
 struct icm_notification {
@@ -89,7 +95,10 @@ static inline struct tb *icm_to_tb(struct icm *icm)
 
 static inline u8 phy_port_from_route(u64 route, u8 depth)
 {
-	return tb_phy_port_from_link(route >> ((depth - 1) * 8));
+	u8 link;
+
+	link = depth ? route >> ((depth - 1) * 8) : route;
+	return tb_phy_port_from_link(link);
 }
 
 static inline u8 dual_link_from_link(u8 link)
@@ -320,6 +329,51 @@ static int icm_fr_challenge_switch_key(struct tb *tb, struct tb_switch *sw,
 	return 0;
 }
 
+static int icm_fr_approve_xdomain_paths(struct tb *tb, struct tb_xdomain *xd)
+{
+	struct icm_fr_pkg_approve_xdomain_response reply;
+	struct icm_fr_pkg_approve_xdomain request;
+	int ret;
+
+	memset(&request, 0, sizeof(request));
+	request.hdr.code = ICM_APPROVE_XDOMAIN;
+	request.link_info = xd->depth << ICM_LINK_INFO_DEPTH_SHIFT | xd->link;
+	memcpy(&request.remote_uuid, xd->remote_uuid, sizeof(*xd->remote_uuid));
+
+	request.transmit_path = xd->transmit_path;
+	request.transmit_ring = xd->transmit_ring;
+	request.receive_path = xd->receive_path;
+	request.receive_ring = xd->receive_ring;
+
+	memset(&reply, 0, sizeof(reply));
+	ret = icm_request(tb, &request, sizeof(request), &reply, sizeof(reply),
+			  1, ICM_TIMEOUT);
+	if (ret)
+		return ret;
+
+	if (reply.hdr.flags & ICM_FLAGS_ERROR)
+		return -EIO;
+
+	return 0;
+}
+
+static int icm_fr_disconnect_xdomain_paths(struct tb *tb, struct tb_xdomain *xd)
+{
+	u8 phy_port;
+	u8 cmd;
+
+	phy_port = tb_phy_port_from_link(xd->link);
+	if (phy_port == 0)
+		cmd = NHI_MAILBOX_DISCONNECT_PA;
+	else
+		cmd = NHI_MAILBOX_DISCONNECT_PB;
+
+	nhi_mailbox_cmd(tb->nhi, cmd, 1);
+	usleep_range(10, 50);
+	nhi_mailbox_cmd(tb->nhi, cmd, 2);
+	return 0;
+}
+
 static void remove_switch(struct tb_switch *sw)
 {
 	struct tb_switch *parent_sw;
@@ -475,6 +529,141 @@ icm_fr_device_disconnected(struct tb *tb, const struct icm_pkg_header *hdr)
 	tb_switch_put(sw);
 }
 
+static void remove_xdomain(struct tb_xdomain *xd)
+{
+	struct tb_switch *sw;
+
+	sw = tb_to_switch(xd->dev.parent);
+	tb_port_at(xd->route, sw)->xdomain = NULL;
+	tb_xdomain_remove(xd);
+}
+
+static void
+icm_fr_xdomain_connected(struct tb *tb, const struct icm_pkg_header *hdr)
+{
+	const struct icm_fr_event_xdomain_connected *pkg =
+		(const struct icm_fr_event_xdomain_connected *)hdr;
+	struct tb_xdomain *xd;
+	struct tb_switch *sw;
+	u8 link, depth;
+	bool approved;
+	u64 route;
+
+	/*
+	 * After NVM upgrade adding root switch device fails because we
+	 * initiated reset. During that time ICM might still send
+	 * XDomain connected message which we ignore here.
+	 */
+	if (!tb->root_switch)
+		return;
+
+	link = pkg->link_info & ICM_LINK_INFO_LINK_MASK;
+	depth = (pkg->link_info & ICM_LINK_INFO_DEPTH_MASK) >>
+		ICM_LINK_INFO_DEPTH_SHIFT;
+	approved = pkg->link_info & ICM_LINK_INFO_APPROVED;
+
+	if (link > ICM_MAX_LINK || depth > ICM_MAX_DEPTH) {
+		tb_warn(tb, "invalid topology %u.%u, ignoring\n", link, depth);
+		return;
+	}
+
+	route = get_route(pkg->local_route_hi, pkg->local_route_lo);
+
+	xd = tb_xdomain_find_by_uuid(tb, &pkg->remote_uuid);
+	if (xd) {
+		u8 xd_phy_port, phy_port;
+
+		xd_phy_port = phy_port_from_route(xd->route, xd->depth);
+		phy_port = phy_port_from_route(route, depth);
+
+		if (xd->depth == depth && xd_phy_port == phy_port) {
+			xd->link = link;
+			xd->route = route;
+			xd->is_unplugged = false;
+			tb_xdomain_put(xd);
+			return;
+		}
+
+		/*
+		 * If we find an existing XDomain connection remove it
+		 * now. We need to go through login handshake and
+		 * everything anyway to be able to re-establish the
+		 * connection.
+		 */
+		remove_xdomain(xd);
+		tb_xdomain_put(xd);
+	}
+
+	/*
+	 * Look if there already exists an XDomain in the same place
+	 * than the new one and in that case remove it because it is
+	 * most likely another host that got disconnected.
+	 */
+	xd = tb_xdomain_find_by_link_depth(tb, link, depth);
+	if (!xd) {
+		u8 dual_link;
+
+		dual_link = dual_link_from_link(link);
+		if (dual_link)
+			xd = tb_xdomain_find_by_link_depth(tb, dual_link,
+							   depth);
+	}
+	if (xd) {
+		remove_xdomain(xd);
+		tb_xdomain_put(xd);
+	}
+
+	/*
+	 * If the user disconnected a switch during suspend and
+	 * connected another host to the same port, remove the switch
+	 * first.
+	 */
+	sw = get_switch_at_route(tb->root_switch, route);
+	if (sw)
+		remove_switch(sw);
+
+	sw = tb_switch_find_by_link_depth(tb, link, depth);
+	if (!sw) {
+		tb_warn(tb, "no switch exists at %u.%u, ignoring\n", link,
+			depth);
+		return;
+	}
+
+	xd = tb_xdomain_alloc(sw->tb, &sw->dev, route,
+			      &pkg->local_uuid, &pkg->remote_uuid);
+	if (!xd) {
+		tb_switch_put(sw);
+		return;
+	}
+
+	xd->link = link;
+	xd->depth = depth;
+
+	tb_port_at(route, sw)->xdomain = xd;
+
+	tb_xdomain_add(xd);
+	tb_switch_put(sw);
+}
+
+static void
+icm_fr_xdomain_disconnected(struct tb *tb, const struct icm_pkg_header *hdr)
+{
+	const struct icm_fr_event_xdomain_disconnected *pkg =
+		(const struct icm_fr_event_xdomain_disconnected *)hdr;
+	struct tb_xdomain *xd;
+
+	/*
+	 * If the connection is through one or multiple devices, the
+	 * XDomain device is removed along with them so it is fine if we
+	 * cannot find it here.
+	 */
+	xd = tb_xdomain_find_by_uuid(tb, &pkg->remote_uuid);
+	if (xd) {
+		remove_xdomain(xd);
+		tb_xdomain_put(xd);
+	}
+}
+
 static struct pci_dev *get_upstream_port(struct pci_dev *pdev)
 {
 	struct pci_dev *parent;
@@ -594,6 +783,12 @@ static void icm_handle_notification(struct work_struct *work)
 	case ICM_EVENT_DEVICE_DISCONNECTED:
 		icm->device_disconnected(tb, n->pkg);
 		break;
+	case ICM_EVENT_XDOMAIN_CONNECTED:
+		icm->xdomain_connected(tb, n->pkg);
+		break;
+	case ICM_EVENT_XDOMAIN_DISCONNECTED:
+		icm->xdomain_disconnected(tb, n->pkg);
+		break;
 	}
 
 	mutex_unlock(&tb->lock);
@@ -927,6 +1122,10 @@ static void icm_unplug_children(struct tb_switch *sw)
 
 		if (tb_is_upstream_port(port))
 			continue;
+		if (port->xdomain) {
+			port->xdomain->is_unplugged = true;
+			continue;
+		}
 		if (!port->remote)
 			continue;
 
@@ -943,6 +1142,13 @@ static void icm_free_unplugged_children(struct tb_switch *sw)
 
 		if (tb_is_upstream_port(port))
 			continue;
+
+		if (port->xdomain && port->xdomain->is_unplugged) {
+			tb_xdomain_remove(port->xdomain);
+			port->xdomain = NULL;
+			continue;
+		}
+
 		if (!port->remote)
 			continue;
 
@@ -1009,8 +1215,10 @@ static int icm_start(struct tb *tb)
 	tb->root_switch->no_nvm_upgrade = x86_apple_machine;
 
 	ret = tb_switch_add(tb->root_switch);
-	if (ret)
+	if (ret) {
 		tb_switch_put(tb->root_switch);
+		tb->root_switch = NULL;
+	}
 
 	return ret;
 }
@@ -1042,6 +1250,8 @@ static const struct tb_cm_ops icm_fr_ops = {
 	.add_switch_key = icm_fr_add_switch_key,
 	.challenge_switch_key = icm_fr_challenge_switch_key,
 	.disconnect_pcie_paths = icm_disconnect_pcie_paths,
+	.approve_xdomain_paths = icm_fr_approve_xdomain_paths,
+	.disconnect_xdomain_paths = icm_fr_disconnect_xdomain_paths,
 };
 
 struct tb *icm_probe(struct tb_nhi *nhi)
@@ -1064,6 +1274,8 @@ struct tb *icm_probe(struct tb_nhi *nhi)
 		icm->get_route = icm_fr_get_route;
 		icm->device_connected = icm_fr_device_connected;
 		icm->device_disconnected = icm_fr_device_disconnected;
+		icm->xdomain_connected = icm_fr_xdomain_connected;
+		icm->xdomain_disconnected = icm_fr_xdomain_disconnected;
 		tb->cm_ops = &icm_fr_ops;
 		break;
 
@@ -1077,6 +1289,8 @@ struct tb *icm_probe(struct tb_nhi *nhi)
 		icm->get_route = icm_ar_get_route;
 		icm->device_connected = icm_fr_device_connected;
 		icm->device_disconnected = icm_fr_device_disconnected;
+		icm->xdomain_connected = icm_fr_xdomain_connected;
+		icm->xdomain_disconnected = icm_fr_xdomain_disconnected;
 		tb->cm_ops = &icm_fr_ops;
 		break;
 	}
diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
index 5b5bb2c436be..0e05828983db 100644
--- a/drivers/thunderbolt/nhi.h
+++ b/drivers/thunderbolt/nhi.h
@@ -157,6 +157,8 @@ enum nhi_mailbox_cmd {
 	NHI_MAILBOX_SAVE_DEVS = 0x05,
 	NHI_MAILBOX_DISCONNECT_PCIE_PATHS = 0x06,
 	NHI_MAILBOX_DRV_UNLOADS = 0x07,
+	NHI_MAILBOX_DISCONNECT_PA = 0x10,
+	NHI_MAILBOX_DISCONNECT_PB = 0x11,
 	NHI_MAILBOX_ALLOW_ALL_DEVS = 0x23,
 };
 
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 53f40c57df59..dfc357d33e1e 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -171,11 +171,11 @@ static int nvm_authenticate_host(struct tb_switch *sw)
 
 	/*
 	 * Root switch NVM upgrade requires that we disconnect the
-	 * existing PCIe paths first (in case it is not in safe mode
+	 * existing paths first (in case it is not in safe mode
 	 * already).
 	 */
 	if (!sw->safe_mode) {
-		ret = tb_domain_disconnect_pcie_paths(sw->tb);
+		ret = tb_domain_disconnect_all_paths(sw->tb);
 		if (ret)
 			return ret;
 		/*
@@ -1363,6 +1363,9 @@ void tb_switch_remove(struct tb_switch *sw)
 		if (sw->ports[i].remote)
 			tb_switch_remove(sw->ports[i].remote->sw);
 		sw->ports[i].remote = NULL;
+		if (sw->ports[i].xdomain)
+			tb_xdomain_remove(sw->ports[i].xdomain);
+		sw->ports[i].xdomain = NULL;
 	}
 
 	if (!sw->is_unplugged)
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index ea21d927bd09..74af9d4929ab 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -9,6 +9,7 @@
 
 #include <linux/nvmem-provider.h>
 #include <linux/pci.h>
+#include <linux/thunderbolt.h>
 #include <linux/uuid.h>
 
 #include "tb_regs.h"
@@ -109,14 +110,25 @@ struct tb_switch {
 
 /**
  * struct tb_port - a thunderbolt port, part of a tb_switch
+ * @config: Cached port configuration read from registers
+ * @sw: Switch the port belongs to
+ * @remote: Remote port (%NULL if not connected)
+ * @xdomain: Remote host (%NULL if not connected)
+ * @cap_phy: Offset, zero if not found
+ * @port: Port number on switch
+ * @disabled: Disabled by eeprom
+ * @dual_link_port: If the switch is connected using two ports, points
+ *		    to the other port.
+ * @link_nr: Is this primary or secondary port on the dual_link.
  */
 struct tb_port {
 	struct tb_regs_port_header config;
 	struct tb_switch *sw;
-	struct tb_port *remote; /* remote port, NULL if not connected */
-	int cap_phy; /* offset, zero if not found */
-	u8 port; /* port number on switch */
-	bool disabled; /* disabled by eeprom */
+	struct tb_port *remote;
+	struct tb_xdomain *xdomain;
+	int cap_phy;
+	u8 port;
+	bool disabled;
 	struct tb_port *dual_link_port;
 	u8 link_nr:1;
 };
@@ -189,6 +201,8 @@ struct tb_path {
  * @add_switch_key: Add key to switch
  * @challenge_switch_key: Challenge switch using key
  * @disconnect_pcie_paths: Disconnects PCIe paths before NVM update
+ * @approve_xdomain_paths: Approve (establish) XDomain DMA paths
+ * @disconnect_xdomain_paths: Disconnect XDomain DMA paths
  */
 struct tb_cm_ops {
 	int (*driver_ready)(struct tb *tb);
@@ -205,6 +219,8 @@ struct tb_cm_ops {
 	int (*challenge_switch_key)(struct tb *tb, struct tb_switch *sw,
 				    const u8 *challenge, u8 *response);
 	int (*disconnect_pcie_paths)(struct tb *tb);
+	int (*approve_xdomain_paths)(struct tb *tb, struct tb_xdomain *xd);
+	int (*disconnect_xdomain_paths)(struct tb *tb, struct tb_xdomain *xd);
 };
 
 static inline void *tb_priv(struct tb *tb)
@@ -331,6 +347,8 @@ extern struct device_type tb_switch_type;
 int tb_domain_init(void);
 void tb_domain_exit(void);
 void tb_switch_exit(void);
+int tb_xdomain_init(void);
+void tb_xdomain_exit(void);
 
 struct tb *tb_domain_alloc(struct tb_nhi *nhi, size_t privsize);
 int tb_domain_add(struct tb *tb);
@@ -343,6 +361,9 @@ int tb_domain_approve_switch(struct tb *tb, struct tb_switch *sw);
 int tb_domain_approve_switch_key(struct tb *tb, struct tb_switch *sw);
 int tb_domain_challenge_switch_key(struct tb *tb, struct tb_switch *sw);
 int tb_domain_disconnect_pcie_paths(struct tb *tb);
+int tb_domain_approve_xdomain_paths(struct tb *tb, struct tb_xdomain *xd);
+int tb_domain_disconnect_xdomain_paths(struct tb *tb, struct tb_xdomain *xd);
+int tb_domain_disconnect_all_paths(struct tb *tb);
 
 static inline void tb_domain_put(struct tb *tb)
 {
@@ -422,4 +443,14 @@ static inline u64 tb_downstream_route(struct tb_port *port)
 	       | ((u64) port->port << (port->sw->config.depth * 8));
 }
 
+bool tb_xdomain_handle_request(struct tb *tb, enum tb_cfg_pkg_type type,
+			       const void *buf, size_t size);
+struct tb_xdomain *tb_xdomain_alloc(struct tb *tb, struct device *parent,
+				    u64 route, const uuid_t *local_uuid,
+				    const uuid_t *remote_uuid);
+void tb_xdomain_add(struct tb_xdomain *xd);
+void tb_xdomain_remove(struct tb_xdomain *xd);
+struct tb_xdomain *tb_xdomain_find_by_link_depth(struct tb *tb, u8 link,
+						 u8 depth);
+
 #endif
diff --git a/drivers/thunderbolt/tb_msgs.h b/drivers/thunderbolt/tb_msgs.h
index fe3039b05da6..2a76908537a6 100644
--- a/drivers/thunderbolt/tb_msgs.h
+++ b/drivers/thunderbolt/tb_msgs.h
@@ -101,11 +101,14 @@ enum icm_pkg_code {
 	ICM_CHALLENGE_DEVICE = 0x5,
 	ICM_ADD_DEVICE_KEY = 0x6,
 	ICM_GET_ROUTE = 0xa,
+	ICM_APPROVE_XDOMAIN = 0x10,
 };
 
 enum icm_event_code {
 	ICM_EVENT_DEVICE_CONNECTED = 3,
 	ICM_EVENT_DEVICE_DISCONNECTED = 4,
+	ICM_EVENT_XDOMAIN_CONNECTED = 6,
+	ICM_EVENT_XDOMAIN_DISCONNECTED = 7,
 };
 
 struct icm_pkg_header {
@@ -188,6 +191,25 @@ struct icm_fr_event_device_disconnected {
 	u16 link_info;
 } __packed;
 
+struct icm_fr_event_xdomain_connected {
+	struct icm_pkg_header hdr;
+	u16 reserved;
+	u16 link_info;
+	uuid_t remote_uuid;
+	uuid_t local_uuid;
+	u32 local_route_hi;
+	u32 local_route_lo;
+	u32 remote_route_hi;
+	u32 remote_route_lo;
+} __packed;
+
+struct icm_fr_event_xdomain_disconnected {
+	struct icm_pkg_header hdr;
+	u16 reserved;
+	u16 link_info;
+	uuid_t remote_uuid;
+} __packed;
+
 struct icm_fr_pkg_add_device_key {
 	struct icm_pkg_header hdr;
 	uuid_t ep_uuid;
@@ -224,6 +246,28 @@ struct icm_fr_pkg_challenge_device_response {
 	u32 response[8];
 } __packed;
 
+struct icm_fr_pkg_approve_xdomain {
+	struct icm_pkg_header hdr;
+	u16 reserved;
+	u16 link_info;
+	uuid_t remote_uuid;
+	u16 transmit_path;
+	u16 transmit_ring;
+	u16 receive_path;
+	u16 receive_ring;
+} __packed;
+
+struct icm_fr_pkg_approve_xdomain_response {
+	struct icm_pkg_header hdr;
+	u16 reserved;
+	u16 link_info;
+	uuid_t remote_uuid;
+	u16 transmit_path;
+	u16 transmit_ring;
+	u16 receive_path;
+	u16 receive_ring;
+} __packed;
+
 /* Alpine Ridge only messages */
 
 struct icm_ar_pkg_get_route {
@@ -240,4 +284,83 @@ struct icm_ar_pkg_get_route_response {
 	u32 route_lo;
 } __packed;
 
+/* XDomain messages */
+
+struct tb_xdomain_header {
+	u32 route_hi;
+	u32 route_lo;
+	u32 length_sn;
+} __packed;
+
+#define TB_XDOMAIN_LENGTH_MASK	GENMASK(5, 0)
+#define TB_XDOMAIN_SN_MASK	GENMASK(28, 27)
+#define TB_XDOMAIN_SN_SHIFT	27
+
+enum tb_xdp_type {
+	UUID_REQUEST_OLD = 1,
+	UUID_RESPONSE = 2,
+	PROPERTIES_REQUEST,
+	PROPERTIES_RESPONSE,
+	PROPERTIES_CHANGED_REQUEST,
+	PROPERTIES_CHANGED_RESPONSE,
+	ERROR_RESPONSE,
+	UUID_REQUEST = 12,
+};
+
+struct tb_xdp_header {
+	struct tb_xdomain_header xd_hdr;
+	uuid_t uuid;
+	u32 type;
+} __packed;
+
+struct tb_xdp_properties {
+	struct tb_xdp_header hdr;
+	uuid_t src_uuid;
+	uuid_t dst_uuid;
+	u16 offset;
+	u16 reserved;
+} __packed;
+
+struct tb_xdp_properties_response {
+	struct tb_xdp_header hdr;
+	uuid_t src_uuid;
+	uuid_t dst_uuid;
+	u16 offset;
+	u16 data_length;
+	u32 generation;
+	u32 data[0];
+} __packed;
+
+/*
+ * Max length of data array single XDomain property response is allowed
+ * to carry.
+ */
+#define TB_XDP_PROPERTIES_MAX_DATA_LENGTH	\
+	(((256 - 4 - sizeof(struct tb_xdp_properties_response))) / 4)
+
+/* Maximum size of the total property block in dwords we allow */
+#define TB_XDP_PROPERTIES_MAX_LENGTH		500
+
+struct tb_xdp_properties_changed {
+	struct tb_xdp_header hdr;
+	uuid_t src_uuid;
+} __packed;
+
+struct tb_xdp_properties_changed_response {
+	struct tb_xdp_header hdr;
+} __packed;
+
+enum tb_xdp_error {
+	ERROR_SUCCESS,
+	ERROR_UNKNOWN_PACKET,
+	ERROR_UNKNOWN_DOMAIN,
+	ERROR_NOT_SUPPORTED,
+	ERROR_NOT_READY,
+};
+
+struct tb_xdp_error_response {
+	struct tb_xdp_header hdr;
+	u32 error;
+} __packed;
+
 #endif
diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
new file mode 100644
index 000000000000..1b929be8fdd6
--- /dev/null
+++ b/drivers/thunderbolt/xdomain.c
@@ -0,0 +1,1576 @@
+/*
+ * Thunderbolt XDomain discovery protocol support
+ *
+ * Copyright (C) 2017, Intel Corporation
+ * Authors: Michael Jamet <michael.jamet@intel.com>
+ *          Mika Westerberg <mika.westerberg@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/device.h>
+#include <linux/kmod.h>
+#include <linux/module.h>
+#include <linux/utsname.h>
+#include <linux/uuid.h>
+#include <linux/workqueue.h>
+
+#include "tb.h"
+
+#define XDOMAIN_DEFAULT_TIMEOUT			5000 /* ms */
+#define XDOMAIN_PROPERTIES_RETRIES		60
+#define XDOMAIN_PROPERTIES_CHANGED_RETRIES	10
+
+struct xdomain_request_work {
+	struct work_struct work;
+	struct tb_xdp_header *pkg;
+	struct tb *tb;
+};
+
+/* Serializes access to the properties and protocol handlers below */
+static DEFINE_MUTEX(xdomain_lock);
+
+/* Properties exposed to the remote domains */
+static struct tb_property_dir *xdomain_property_dir;
+static u32 *xdomain_property_block;
+static u32 xdomain_property_block_len;
+static u32 xdomain_property_block_gen;
+
+/* Additional protocol handlers */
+static LIST_HEAD(protocol_handlers);
+
+/* UUID for XDomain discovery protocol */
+static const uuid_t tb_xdp_uuid =
+	UUID_INIT(0xb638d70e, 0x42ff, 0x40bb,
+		  0x97, 0xc2, 0x90, 0xe2, 0xc0, 0xb2, 0xff, 0x07);
+
+static bool tb_xdomain_match(const struct tb_cfg_request *req,
+			     const struct ctl_pkg *pkg)
+{
+	switch (pkg->frame.eof) {
+	case TB_CFG_PKG_ERROR:
+		return true;
+
+	case TB_CFG_PKG_XDOMAIN_RESP: {
+		const struct tb_xdp_header *res_hdr = pkg->buffer;
+		const struct tb_xdp_header *req_hdr = req->request;
+		u8 req_seq, res_seq;
+
+		if (pkg->frame.size < req->response_size / 4)
+			return false;
+
+		/* Make sure route matches */
+		if ((res_hdr->xd_hdr.route_hi & ~BIT(31)) !=
+		     req_hdr->xd_hdr.route_hi)
+			return false;
+		if ((res_hdr->xd_hdr.route_lo) != req_hdr->xd_hdr.route_lo)
+			return false;
+
+		/* Then check that the sequence number matches */
+		res_seq = res_hdr->xd_hdr.length_sn & TB_XDOMAIN_SN_MASK;
+		res_seq >>= TB_XDOMAIN_SN_SHIFT;
+		req_seq = req_hdr->xd_hdr.length_sn & TB_XDOMAIN_SN_MASK;
+		req_seq >>= TB_XDOMAIN_SN_SHIFT;
+		if (res_seq != req_seq)
+			return false;
+
+		/* Check that the XDomain protocol matches */
+		if (!uuid_equal(&res_hdr->uuid, &req_hdr->uuid))
+			return false;
+
+		return true;
+	}
+
+	default:
+		return false;
+	}
+}
+
+static bool tb_xdomain_copy(struct tb_cfg_request *req,
+			    const struct ctl_pkg *pkg)
+{
+	memcpy(req->response, pkg->buffer, req->response_size);
+	req->result.err = 0;
+	return true;
+}
+
+static void response_ready(void *data)
+{
+	tb_cfg_request_put(data);
+}
+
+static int __tb_xdomain_response(struct tb_ctl *ctl, const void *response,
+				 size_t size, enum tb_cfg_pkg_type type)
+{
+	struct tb_cfg_request *req;
+
+	req = tb_cfg_request_alloc();
+	if (!req)
+		return -ENOMEM;
+
+	req->match = tb_xdomain_match;
+	req->copy = tb_xdomain_copy;
+	req->request = response;
+	req->request_size = size;
+	req->request_type = type;
+
+	return tb_cfg_request(ctl, req, response_ready, req);
+}
+
+/**
+ * tb_xdomain_response() - Send a XDomain response message
+ * @xd: XDomain to send the message
+ * @response: Response to send
+ * @size: Size of the response
+ * @type: PDF type of the response
+ *
+ * This can be used to send a XDomain response message to the other
+ * domain. No response for the message is expected.
+ *
+ * Return: %0 in case of success and negative errno in case of failure
+ */
+int tb_xdomain_response(struct tb_xdomain *xd, const void *response,
+			size_t size, enum tb_cfg_pkg_type type)
+{
+	return __tb_xdomain_response(xd->tb->ctl, response, size, type);
+}
+EXPORT_SYMBOL_GPL(tb_xdomain_response);
+
+static int __tb_xdomain_request(struct tb_ctl *ctl, const void *request,
+	size_t request_size, enum tb_cfg_pkg_type request_type, void *response,
+	size_t response_size, enum tb_cfg_pkg_type response_type,
+	unsigned int timeout_msec)
+{
+	struct tb_cfg_request *req;
+	struct tb_cfg_result res;
+
+	req = tb_cfg_request_alloc();
+	if (!req)
+		return -ENOMEM;
+
+	req->match = tb_xdomain_match;
+	req->copy = tb_xdomain_copy;
+	req->request = request;
+	req->request_size = request_size;
+	req->request_type = request_type;
+	req->response = response;
+	req->response_size = response_size;
+	req->response_type = response_type;
+
+	res = tb_cfg_request_sync(ctl, req, timeout_msec);
+
+	tb_cfg_request_put(req);
+
+	return res.err == 1 ? -EIO : res.err;
+}
+
+/**
+ * tb_xdomain_request() - Send a XDomain request
+ * @xd: XDomain to send the request
+ * @request: Request to send
+ * @request_size: Size of the request in bytes
+ * @request_type: PDF type of the request
+ * @response: Response is copied here
+ * @response_size: Expected size of the response in bytes
+ * @response_type: Expected PDF type of the response
+ * @timeout_msec: Timeout in milliseconds to wait for the response
+ *
+ * This function can be used to send XDomain control channel messages to
+ * the other domain. The function waits until the response is received
+ * or when timeout triggers. Whichever comes first.
+ *
+ * Return: %0 in case of success and negative errno in case of failure
+ */
+int tb_xdomain_request(struct tb_xdomain *xd, const void *request,
+	size_t request_size, enum tb_cfg_pkg_type request_type,
+	void *response, size_t response_size,
+	enum tb_cfg_pkg_type response_type, unsigned int timeout_msec)
+{
+	return __tb_xdomain_request(xd->tb->ctl, request, request_size,
+				    request_type, response, response_size,
+				    response_type, timeout_msec);
+}
+EXPORT_SYMBOL_GPL(tb_xdomain_request);
+
+static inline void tb_xdp_fill_header(struct tb_xdp_header *hdr, u64 route,
+	u8 sequence, enum tb_xdp_type type, size_t size)
+{
+	u32 length_sn;
+
+	length_sn = (size - sizeof(hdr->xd_hdr)) / 4;
+	length_sn |= (sequence << TB_XDOMAIN_SN_SHIFT) & TB_XDOMAIN_SN_MASK;
+
+	hdr->xd_hdr.route_hi = upper_32_bits(route);
+	hdr->xd_hdr.route_lo = lower_32_bits(route);
+	hdr->xd_hdr.length_sn = length_sn;
+	hdr->type = type;
+	memcpy(&hdr->uuid, &tb_xdp_uuid, sizeof(tb_xdp_uuid));
+}
+
+static int tb_xdp_handle_error(const struct tb_xdp_header *hdr)
+{
+	const struct tb_xdp_error_response *error;
+
+	if (hdr->type != ERROR_RESPONSE)
+		return 0;
+
+	error = (const struct tb_xdp_error_response *)hdr;
+
+	switch (error->error) {
+	case ERROR_UNKNOWN_PACKET:
+	case ERROR_UNKNOWN_DOMAIN:
+		return -EIO;
+	case ERROR_NOT_SUPPORTED:
+		return -ENOTSUPP;
+	case ERROR_NOT_READY:
+		return -EAGAIN;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int tb_xdp_error_response(struct tb_ctl *ctl, u64 route, u8 sequence,
+				 enum tb_xdp_error error)
+{
+	struct tb_xdp_error_response res;
+
+	memset(&res, 0, sizeof(res));
+	tb_xdp_fill_header(&res.hdr, route, sequence, ERROR_RESPONSE,
+			   sizeof(res));
+	res.error = error;
+
+	return __tb_xdomain_response(ctl, &res, sizeof(res),
+				     TB_CFG_PKG_XDOMAIN_RESP);
+}
+
+static int tb_xdp_properties_request(struct tb_ctl *ctl, u64 route,
+	const uuid_t *src_uuid, const uuid_t *dst_uuid, int retry,
+	u32 **block, u32 *generation)
+{
+	struct tb_xdp_properties_response *res;
+	struct tb_xdp_properties req;
+	u16 data_len, len;
+	size_t total_size;
+	u32 *data = NULL;
+	int ret;
+
+	total_size = sizeof(*res) + TB_XDP_PROPERTIES_MAX_DATA_LENGTH * 4;
+	res = kzalloc(total_size, GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
+
+	memset(&req, 0, sizeof(req));
+	tb_xdp_fill_header(&req.hdr, route, retry % 4, PROPERTIES_REQUEST,
+			   sizeof(req));
+	memcpy(&req.src_uuid, src_uuid, sizeof(*src_uuid));
+	memcpy(&req.dst_uuid, dst_uuid, sizeof(*dst_uuid));
+
+	len = 0;
+	data_len = 0;
+
+	do {
+		ret = __tb_xdomain_request(ctl, &req, sizeof(req),
+					   TB_CFG_PKG_XDOMAIN_REQ, res,
+					   total_size, TB_CFG_PKG_XDOMAIN_RESP,
+					   XDOMAIN_DEFAULT_TIMEOUT);
+		if (ret)
+			goto err;
+
+		ret = tb_xdp_handle_error(&res->hdr);
+		if (ret)
+			goto err;
+
+		/*
+		 * Package length includes the whole payload without the
+		 * XDomain header. Validate first that the package is at
+		 * least size of the response structure.
+		 */
+		len = res->hdr.xd_hdr.length_sn & TB_XDOMAIN_LENGTH_MASK;
+		if (len < sizeof(*res) / 4) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		len += sizeof(res->hdr.xd_hdr) / 4;
+		len -= sizeof(*res) / 4;
+
+		if (res->offset != req.offset) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		/*
+		 * First time allocate block that has enough space for
+		 * the whole properties block.
+		 */
+		if (!data) {
+			data_len = res->data_length;
+			if (data_len > TB_XDP_PROPERTIES_MAX_LENGTH) {
+				ret = -E2BIG;
+				goto err;
+			}
+
+			data = kcalloc(data_len, sizeof(u32), GFP_KERNEL);
+			if (!data) {
+				ret = -ENOMEM;
+				goto err;
+			}
+		}
+
+		memcpy(data + req.offset, res->data, len * 4);
+		req.offset += len;
+	} while (!data_len || req.offset < data_len);
+
+	*block = data;
+	*generation = res->generation;
+
+	kfree(res);
+
+	return data_len;
+
+err:
+	kfree(data);
+	kfree(res);
+
+	return ret;
+}
+
+static int tb_xdp_properties_response(struct tb *tb, struct tb_ctl *ctl,
+	u64 route, u8 sequence, const uuid_t *src_uuid,
+	const struct tb_xdp_properties *req)
+{
+	struct tb_xdp_properties_response *res;
+	size_t total_size;
+	u16 len;
+	int ret;
+
+	/*
+	 * Currently we expect all requests to be directed to us. The
+	 * protocol supports forwarding, though which we might add
+	 * support later on.
+	 */
+	if (!uuid_equal(src_uuid, &req->dst_uuid)) {
+		tb_xdp_error_response(ctl, route, sequence,
+				      ERROR_UNKNOWN_DOMAIN);
+		return 0;
+	}
+
+	mutex_lock(&xdomain_lock);
+
+	if (req->offset >= xdomain_property_block_len) {
+		mutex_unlock(&xdomain_lock);
+		return -EINVAL;
+	}
+
+	len = xdomain_property_block_len - req->offset;
+	len = min_t(u16, len, TB_XDP_PROPERTIES_MAX_DATA_LENGTH);
+	total_size = sizeof(*res) + len * 4;
+
+	res = kzalloc(total_size, GFP_KERNEL);
+	if (!res) {
+		mutex_unlock(&xdomain_lock);
+		return -ENOMEM;
+	}
+
+	tb_xdp_fill_header(&res->hdr, route, sequence, PROPERTIES_RESPONSE,
+			   total_size);
+	res->generation = xdomain_property_block_gen;
+	res->data_length = xdomain_property_block_len;
+	res->offset = req->offset;
+	uuid_copy(&res->src_uuid, src_uuid);
+	uuid_copy(&res->dst_uuid, &req->src_uuid);
+	memcpy(res->data, &xdomain_property_block[req->offset], len * 4);
+
+	mutex_unlock(&xdomain_lock);
+
+	ret = __tb_xdomain_response(ctl, res, total_size,
+				    TB_CFG_PKG_XDOMAIN_RESP);
+
+	kfree(res);
+	return ret;
+}
+
+static int tb_xdp_properties_changed_request(struct tb_ctl *ctl, u64 route,
+					     int retry, const uuid_t *uuid)
+{
+	struct tb_xdp_properties_changed_response res;
+	struct tb_xdp_properties_changed req;
+	int ret;
+
+	memset(&req, 0, sizeof(req));
+	tb_xdp_fill_header(&req.hdr, route, retry % 4,
+			   PROPERTIES_CHANGED_REQUEST, sizeof(req));
+	uuid_copy(&req.src_uuid, uuid);
+
+	memset(&res, 0, sizeof(res));
+	ret = __tb_xdomain_request(ctl, &req, sizeof(req),
+				   TB_CFG_PKG_XDOMAIN_REQ, &res, sizeof(res),
+				   TB_CFG_PKG_XDOMAIN_RESP,
+				   XDOMAIN_DEFAULT_TIMEOUT);
+	if (ret)
+		return ret;
+
+	return tb_xdp_handle_error(&res.hdr);
+}
+
+static int
+tb_xdp_properties_changed_response(struct tb_ctl *ctl, u64 route, u8 sequence)
+{
+	struct tb_xdp_properties_changed_response res;
+
+	memset(&res, 0, sizeof(res));
+	tb_xdp_fill_header(&res.hdr, route, sequence,
+			   PROPERTIES_CHANGED_RESPONSE, sizeof(res));
+	return __tb_xdomain_response(ctl, &res, sizeof(res),
+				     TB_CFG_PKG_XDOMAIN_RESP);
+}
+
+/**
+ * tb_register_protocol_handler() - Register protocol handler
+ * @handler: Handler to register
+ *
+ * This allows XDomain service drivers to hook into incoming XDomain
+ * messages. After this function is called the service driver needs to
+ * be able to handle calls to callback whenever a package with the
+ * registered protocol is received.
+ */
+int tb_register_protocol_handler(struct tb_protocol_handler *handler)
+{
+	if (!handler->uuid || !handler->callback)
+		return -EINVAL;
+	if (uuid_equal(handler->uuid, &tb_xdp_uuid))
+		return -EINVAL;
+
+	mutex_lock(&xdomain_lock);
+	list_add_tail(&handler->list, &protocol_handlers);
+	mutex_unlock(&xdomain_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tb_register_protocol_handler);
+
+/**
+ * tb_unregister_protocol_handler() - Unregister protocol handler
+ * @handler: Handler to unregister
+ *
+ * Removes the previously registered protocol handler.
+ */
+void tb_unregister_protocol_handler(struct tb_protocol_handler *handler)
+{
+	mutex_lock(&xdomain_lock);
+	list_del_init(&handler->list);
+	mutex_unlock(&xdomain_lock);
+}
+EXPORT_SYMBOL_GPL(tb_unregister_protocol_handler);
+
+static void tb_xdp_handle_request(struct work_struct *work)
+{
+	struct xdomain_request_work *xw = container_of(work, typeof(*xw), work);
+	const struct tb_xdp_header *pkg = xw->pkg;
+	const struct tb_xdomain_header *xhdr = &pkg->xd_hdr;
+	struct tb *tb = xw->tb;
+	struct tb_ctl *ctl = tb->ctl;
+	const uuid_t *uuid;
+	int ret = 0;
+	u8 sequence;
+	u64 route;
+
+	route = ((u64)xhdr->route_hi << 32 | xhdr->route_lo) & ~BIT_ULL(63);
+	sequence = xhdr->length_sn & TB_XDOMAIN_SN_MASK;
+	sequence >>= TB_XDOMAIN_SN_SHIFT;
+
+	mutex_lock(&tb->lock);
+	if (tb->root_switch)
+		uuid = tb->root_switch->uuid;
+	else
+		uuid = NULL;
+	mutex_unlock(&tb->lock);
+
+	if (!uuid) {
+		tb_xdp_error_response(ctl, route, sequence, ERROR_NOT_READY);
+		goto out;
+	}
+
+	switch (pkg->type) {
+	case PROPERTIES_REQUEST:
+		ret = tb_xdp_properties_response(tb, ctl, route, sequence, uuid,
+			(const struct tb_xdp_properties *)pkg);
+		break;
+
+	case PROPERTIES_CHANGED_REQUEST: {
+		const struct tb_xdp_properties_changed *xchg =
+			(const struct tb_xdp_properties_changed *)pkg;
+		struct tb_xdomain *xd;
+
+		ret = tb_xdp_properties_changed_response(ctl, route, sequence);
+
+		/*
+		 * Since the properties have been changed, let's update
+		 * the xdomain related to this connection as well in
+		 * case there is a change in services it offers.
+		 */
+		xd = tb_xdomain_find_by_uuid_locked(tb, &xchg->src_uuid);
+		if (xd) {
+			queue_delayed_work(tb->wq, &xd->get_properties_work,
+					   msecs_to_jiffies(50));
+			tb_xdomain_put(xd);
+		}
+
+		break;
+	}
+
+	default:
+		break;
+	}
+
+	if (ret) {
+		tb_warn(tb, "failed to send XDomain response for %#x\n",
+			pkg->type);
+	}
+
+out:
+	kfree(xw->pkg);
+	kfree(xw);
+}
+
+static void
+tb_xdp_schedule_request(struct tb *tb, const struct tb_xdp_header *hdr,
+			size_t size)
+{
+	struct xdomain_request_work *xw;
+
+	xw = kmalloc(sizeof(*xw), GFP_KERNEL);
+	if (!xw)
+		return;
+
+	INIT_WORK(&xw->work, tb_xdp_handle_request);
+	xw->pkg = kmemdup(hdr, size, GFP_KERNEL);
+	xw->tb = tb;
+
+	queue_work(tb->wq, &xw->work);
+}
+
+/**
+ * tb_register_service_driver() - Register XDomain service driver
+ * @drv: Driver to register
+ *
+ * Registers new service driver from @drv to the bus.
+ */
+int tb_register_service_driver(struct tb_service_driver *drv)
+{
+	drv->driver.bus = &tb_bus_type;
+	return driver_register(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(tb_register_service_driver);
+
+/**
+ * tb_unregister_service_driver() - Unregister XDomain service driver
+ * @xdrv: Driver to unregister
+ *
+ * Unregisters XDomain service driver from the bus.
+ */
+void tb_unregister_service_driver(struct tb_service_driver *drv)
+{
+	driver_unregister(&drv->driver);
+}
+EXPORT_SYMBOL_GPL(tb_unregister_service_driver);
+
+static ssize_t key_show(struct device *dev, struct device_attribute *attr,
+			char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	/*
+	 * It should be null terminated but anything else is pretty much
+	 * allowed.
+	 */
+	return sprintf(buf, "%*pEp\n", (int)strlen(svc->key), svc->key);
+}
+static DEVICE_ATTR_RO(key);
+
+static int get_modalias(struct tb_service *svc, char *buf, size_t size)
+{
+	return snprintf(buf, size, "tbsvc:k%sp%08Xv%08Xr%08X", svc->key,
+			svc->prtcid, svc->prtcvers, svc->prtcrevs);
+}
+
+static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	/* Full buffer size except new line and null termination */
+	get_modalias(svc, buf, PAGE_SIZE - 2);
+	return sprintf(buf, "%s\n", buf);
+}
+static DEVICE_ATTR_RO(modalias);
+
+static ssize_t prtcid_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	return sprintf(buf, "%u\n", svc->prtcid);
+}
+static DEVICE_ATTR_RO(prtcid);
+
+static ssize_t prtcvers_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	return sprintf(buf, "%u\n", svc->prtcvers);
+}
+static DEVICE_ATTR_RO(prtcvers);
+
+static ssize_t prtcrevs_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	return sprintf(buf, "%u\n", svc->prtcrevs);
+}
+static DEVICE_ATTR_RO(prtcrevs);
+
+static ssize_t prtcstns_show(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+
+	return sprintf(buf, "0x%08x\n", svc->prtcstns);
+}
+static DEVICE_ATTR_RO(prtcstns);
+
+static struct attribute *tb_service_attrs[] = {
+	&dev_attr_key.attr,
+	&dev_attr_modalias.attr,
+	&dev_attr_prtcid.attr,
+	&dev_attr_prtcvers.attr,
+	&dev_attr_prtcrevs.attr,
+	&dev_attr_prtcstns.attr,
+	NULL,
+};
+
+static struct attribute_group tb_service_attr_group = {
+	.attrs = tb_service_attrs,
+};
+
+static const struct attribute_group *tb_service_attr_groups[] = {
+	&tb_service_attr_group,
+	NULL,
+};
+
+static int tb_service_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+	char modalias[64];
+
+	get_modalias(svc, modalias, sizeof(modalias));
+	return add_uevent_var(env, "MODALIAS=%s", modalias);
+}
+
+static void tb_service_release(struct device *dev)
+{
+	struct tb_service *svc = container_of(dev, struct tb_service, dev);
+	struct tb_xdomain *xd = tb_service_parent(svc);
+
+	ida_simple_remove(&xd->service_ids, svc->id);
+	kfree(svc->key);
+	kfree(svc);
+}
+
+struct device_type tb_service_type = {
+	.name = "thunderbolt_service",
+	.groups = tb_service_attr_groups,
+	.uevent = tb_service_uevent,
+	.release = tb_service_release,
+};
+EXPORT_SYMBOL_GPL(tb_service_type);
+
+static int remove_missing_service(struct device *dev, void *data)
+{
+	struct tb_xdomain *xd = data;
+	struct tb_service *svc;
+
+	svc = tb_to_service(dev);
+	if (!svc)
+		return 0;
+
+	if (!tb_property_find(xd->properties, svc->key,
+			      TB_PROPERTY_TYPE_DIRECTORY))
+		device_unregister(dev);
+
+	return 0;
+}
+
+static int find_service(struct device *dev, void *data)
+{
+	const struct tb_property *p = data;
+	struct tb_service *svc;
+
+	svc = tb_to_service(dev);
+	if (!svc)
+		return 0;
+
+	return !strcmp(svc->key, p->key);
+}
+
+static int populate_service(struct tb_service *svc,
+			    struct tb_property *property)
+{
+	struct tb_property_dir *dir = property->value.dir;
+	struct tb_property *p;
+
+	/* Fill in standard properties */
+	p = tb_property_find(dir, "prtcid", TB_PROPERTY_TYPE_VALUE);
+	if (p)
+		svc->prtcid = p->value.immediate;
+	p = tb_property_find(dir, "prtcvers", TB_PROPERTY_TYPE_VALUE);
+	if (p)
+		svc->prtcvers = p->value.immediate;
+	p = tb_property_find(dir, "prtcrevs", TB_PROPERTY_TYPE_VALUE);
+	if (p)
+		svc->prtcrevs = p->value.immediate;
+	p = tb_property_find(dir, "prtcstns", TB_PROPERTY_TYPE_VALUE);
+	if (p)
+		svc->prtcstns = p->value.immediate;
+
+	svc->key = kstrdup(property->key, GFP_KERNEL);
+	if (!svc->key)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void enumerate_services(struct tb_xdomain *xd)
+{
+	struct tb_service *svc;
+	struct tb_property *p;
+	struct device *dev;
+
+	/*
+	 * First remove all services that are not available anymore in
+	 * the updated property block.
+	 */
+	device_for_each_child_reverse(&xd->dev, xd, remove_missing_service);
+
+	/* Then re-enumerate properties creating new services as we go */
+	tb_property_for_each(xd->properties, p) {
+		if (p->type != TB_PROPERTY_TYPE_DIRECTORY)
+			continue;
+
+		/* If the service exists already we are fine */
+		dev = device_find_child(&xd->dev, p, find_service);
+		if (dev) {
+			put_device(dev);
+			continue;
+		}
+
+		svc = kzalloc(sizeof(*svc), GFP_KERNEL);
+		if (!svc)
+			break;
+
+		if (populate_service(svc, p)) {
+			kfree(svc);
+			break;
+		}
+
+		svc->id = ida_simple_get(&xd->service_ids, 0, 0, GFP_KERNEL);
+		svc->dev.bus = &tb_bus_type;
+		svc->dev.type = &tb_service_type;
+		svc->dev.parent = &xd->dev;
+		dev_set_name(&svc->dev, "%s.%d", dev_name(&xd->dev), svc->id);
+
+		if (device_register(&svc->dev)) {
+			put_device(&svc->dev);
+			break;
+		}
+	}
+}
+
+static int populate_properties(struct tb_xdomain *xd,
+			       struct tb_property_dir *dir)
+{
+	const struct tb_property *p;
+
+	/* Required properties */
+	p = tb_property_find(dir, "deviceid", TB_PROPERTY_TYPE_VALUE);
+	if (!p)
+		return -EINVAL;
+	xd->device = p->value.immediate;
+
+	p = tb_property_find(dir, "vendorid", TB_PROPERTY_TYPE_VALUE);
+	if (!p)
+		return -EINVAL;
+	xd->vendor = p->value.immediate;
+
+	kfree(xd->device_name);
+	xd->device_name = NULL;
+	kfree(xd->vendor_name);
+	xd->vendor_name = NULL;
+
+	/* Optional properties */
+	p = tb_property_find(dir, "deviceid", TB_PROPERTY_TYPE_TEXT);
+	if (p)
+		xd->device_name = kstrdup(p->value.text, GFP_KERNEL);
+	p = tb_property_find(dir, "vendorid", TB_PROPERTY_TYPE_TEXT);
+	if (p)
+		xd->vendor_name = kstrdup(p->value.text, GFP_KERNEL);
+
+	return 0;
+}
+
+/* Called with @xd->lock held */
+static void tb_xdomain_restore_paths(struct tb_xdomain *xd)
+{
+	if (!xd->resume)
+		return;
+
+	xd->resume = false;
+	if (xd->transmit_path) {
+		dev_dbg(&xd->dev, "re-establishing DMA path\n");
+		tb_domain_approve_xdomain_paths(xd->tb, xd);
+	}
+}
+
+static void tb_xdomain_get_properties(struct work_struct *work)
+{
+	struct tb_xdomain *xd = container_of(work, typeof(*xd),
+					     get_properties_work.work);
+	struct tb_property_dir *dir;
+	struct tb *tb = xd->tb;
+	bool update = false;
+	u32 *block = NULL;
+	u32 gen = 0;
+	int ret;
+
+	ret = tb_xdp_properties_request(tb->ctl, xd->route, xd->local_uuid,
+					xd->remote_uuid, xd->properties_retries,
+					&block, &gen);
+	if (ret < 0) {
+		if (xd->properties_retries-- > 0) {
+			queue_delayed_work(xd->tb->wq, &xd->get_properties_work,
+					   msecs_to_jiffies(1000));
+		} else {
+			/* Give up now */
+			dev_err(&xd->dev,
+				"failed read XDomain properties from %pUb\n",
+				xd->remote_uuid);
+		}
+		return;
+	}
+
+	xd->properties_retries = XDOMAIN_PROPERTIES_RETRIES;
+
+	mutex_lock(&xd->lock);
+
+	/* Only accept newer generation properties */
+	if (xd->properties && gen <= xd->property_block_gen) {
+		/*
+		 * On resume it is likely that the properties block is
+		 * not changed (unless the other end added or removed
+		 * services). However, we need to make sure the existing
+		 * DMA paths are restored properly.
+		 */
+		tb_xdomain_restore_paths(xd);
+		goto err_free_block;
+	}
+
+	dir = tb_property_parse_dir(block, ret);
+	if (!dir) {
+		dev_err(&xd->dev, "failed to parse XDomain properties\n");
+		goto err_free_block;
+	}
+
+	ret = populate_properties(xd, dir);
+	if (ret) {
+		dev_err(&xd->dev, "missing XDomain properties in response\n");
+		goto err_free_dir;
+	}
+
+	/* Release the existing one */
+	if (xd->properties) {
+		tb_property_free_dir(xd->properties);
+		update = true;
+	}
+
+	xd->properties = dir;
+	xd->property_block_gen = gen;
+
+	tb_xdomain_restore_paths(xd);
+
+	mutex_unlock(&xd->lock);
+
+	kfree(block);
+
+	/*
+	 * Now the device should be ready enough so we can add it to the
+	 * bus and let userspace know about it. If the device is already
+	 * registered, we notify the userspace that it has changed.
+	 */
+	if (!update) {
+		if (device_add(&xd->dev)) {
+			dev_err(&xd->dev, "failed to add XDomain device\n");
+			return;
+		}
+	} else {
+		kobject_uevent(&xd->dev.kobj, KOBJ_CHANGE);
+	}
+
+	enumerate_services(xd);
+	return;
+
+err_free_dir:
+	tb_property_free_dir(dir);
+err_free_block:
+	kfree(block);
+	mutex_unlock(&xd->lock);
+}
+
+static void tb_xdomain_properties_changed(struct work_struct *work)
+{
+	struct tb_xdomain *xd = container_of(work, typeof(*xd),
+					     properties_changed_work.work);
+	int ret;
+
+	ret = tb_xdp_properties_changed_request(xd->tb->ctl, xd->route,
+				xd->properties_changed_retries, xd->local_uuid);
+	if (ret) {
+		if (xd->properties_changed_retries-- > 0)
+			queue_delayed_work(xd->tb->wq,
+					   &xd->properties_changed_work,
+					   msecs_to_jiffies(1000));
+		return;
+	}
+
+	xd->properties_changed_retries = XDOMAIN_PROPERTIES_CHANGED_RETRIES;
+}
+
+static ssize_t device_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+
+	return sprintf(buf, "%#x\n", xd->device);
+}
+static DEVICE_ATTR_RO(device);
+
+static ssize_t
+device_name_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+	int ret;
+
+	if (mutex_lock_interruptible(&xd->lock))
+		return -ERESTARTSYS;
+	ret = sprintf(buf, "%s\n", xd->device_name ? xd->device_name : "");
+	mutex_unlock(&xd->lock);
+
+	return ret;
+}
+static DEVICE_ATTR_RO(device_name);
+
+static ssize_t vendor_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+
+	return sprintf(buf, "%#x\n", xd->vendor);
+}
+static DEVICE_ATTR_RO(vendor);
+
+static ssize_t
+vendor_name_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+	int ret;
+
+	if (mutex_lock_interruptible(&xd->lock))
+		return -ERESTARTSYS;
+	ret = sprintf(buf, "%s\n", xd->vendor_name ? xd->vendor_name : "");
+	mutex_unlock(&xd->lock);
+
+	return ret;
+}
+static DEVICE_ATTR_RO(vendor_name);
+
+static ssize_t unique_id_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+
+	return sprintf(buf, "%pUb\n", xd->remote_uuid);
+}
+static DEVICE_ATTR_RO(unique_id);
+
+static struct attribute *xdomain_attrs[] = {
+	&dev_attr_device.attr,
+	&dev_attr_device_name.attr,
+	&dev_attr_unique_id.attr,
+	&dev_attr_vendor.attr,
+	&dev_attr_vendor_name.attr,
+	NULL,
+};
+
+static struct attribute_group xdomain_attr_group = {
+	.attrs = xdomain_attrs,
+};
+
+static const struct attribute_group *xdomain_attr_groups[] = {
+	&xdomain_attr_group,
+	NULL,
+};
+
+static void tb_xdomain_release(struct device *dev)
+{
+	struct tb_xdomain *xd = container_of(dev, struct tb_xdomain, dev);
+
+	put_device(xd->dev.parent);
+
+	tb_property_free_dir(xd->properties);
+	ida_destroy(&xd->service_ids);
+
+	kfree(xd->local_uuid);
+	kfree(xd->remote_uuid);
+	kfree(xd->device_name);
+	kfree(xd->vendor_name);
+	kfree(xd);
+}
+
+static void start_handshake(struct tb_xdomain *xd)
+{
+	xd->properties_retries = XDOMAIN_PROPERTIES_RETRIES;
+	xd->properties_changed_retries = XDOMAIN_PROPERTIES_CHANGED_RETRIES;
+
+	/* Start exchanging properties with the other host */
+	queue_delayed_work(xd->tb->wq, &xd->properties_changed_work,
+			   msecs_to_jiffies(100));
+	queue_delayed_work(xd->tb->wq, &xd->get_properties_work,
+			   msecs_to_jiffies(1000));
+}
+
+static void stop_handshake(struct tb_xdomain *xd)
+{
+	xd->properties_retries = 0;
+	xd->properties_changed_retries = 0;
+
+	cancel_delayed_work_sync(&xd->get_properties_work);
+	cancel_delayed_work_sync(&xd->properties_changed_work);
+}
+
+static int __maybe_unused tb_xdomain_suspend(struct device *dev)
+{
+	stop_handshake(tb_to_xdomain(dev));
+	return 0;
+}
+
+static int __maybe_unused tb_xdomain_resume(struct device *dev)
+{
+	struct tb_xdomain *xd = tb_to_xdomain(dev);
+
+	/*
+	 * Ask tb_xdomain_get_properties() restore any existing DMA
+	 * paths after properties are re-read.
+	 */
+	xd->resume = true;
+	start_handshake(xd);
+
+	return 0;
+}
+
+static const struct dev_pm_ops tb_xdomain_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(tb_xdomain_suspend, tb_xdomain_resume)
+};
+
+struct device_type tb_xdomain_type = {
+	.name = "thunderbolt_xdomain",
+	.release = tb_xdomain_release,
+	.pm = &tb_xdomain_pm_ops,
+};
+EXPORT_SYMBOL_GPL(tb_xdomain_type);
+
+/**
+ * tb_xdomain_alloc() - Allocate new XDomain object
+ * @tb: Domain where the XDomain belongs
+ * @parent: Parent device (the switch through the connection to the
+ *	    other domain is reached).
+ * @route: Route string used to reach the other domain
+ * @local_uuid: Our local domain UUID
+ * @remote_uuid: UUID of the other domain
+ *
+ * Allocates new XDomain structure and returns pointer to that. The
+ * object must be released by calling tb_xdomain_put().
+ */
+struct tb_xdomain *tb_xdomain_alloc(struct tb *tb, struct device *parent,
+				    u64 route, const uuid_t *local_uuid,
+				    const uuid_t *remote_uuid)
+{
+	struct tb_xdomain *xd;
+
+	xd = kzalloc(sizeof(*xd), GFP_KERNEL);
+	if (!xd)
+		return NULL;
+
+	xd->tb = tb;
+	xd->route = route;
+	ida_init(&xd->service_ids);
+	mutex_init(&xd->lock);
+	INIT_DELAYED_WORK(&xd->get_properties_work, tb_xdomain_get_properties);
+	INIT_DELAYED_WORK(&xd->properties_changed_work,
+			  tb_xdomain_properties_changed);
+
+	xd->local_uuid = kmemdup(local_uuid, sizeof(uuid_t), GFP_KERNEL);
+	if (!xd->local_uuid)
+		goto err_free;
+
+	xd->remote_uuid = kmemdup(remote_uuid, sizeof(uuid_t), GFP_KERNEL);
+	if (!xd->remote_uuid)
+		goto err_free_local_uuid;
+
+	device_initialize(&xd->dev);
+	xd->dev.parent = get_device(parent);
+	xd->dev.bus = &tb_bus_type;
+	xd->dev.type = &tb_xdomain_type;
+	xd->dev.groups = xdomain_attr_groups;
+	dev_set_name(&xd->dev, "%u-%llx", tb->index, route);
+
+	return xd;
+
+err_free_local_uuid:
+	kfree(xd->local_uuid);
+err_free:
+	kfree(xd);
+
+	return NULL;
+}
+
+/**
+ * tb_xdomain_add() - Add XDomain to the bus
+ * @xd: XDomain to add
+ *
+ * This function starts XDomain discovery protocol handshake and
+ * eventually adds the XDomain to the bus. After calling this function
+ * the caller needs to call tb_xdomain_remove() in order to remove and
+ * release the object regardless whether the handshake succeeded or not.
+ */
+void tb_xdomain_add(struct tb_xdomain *xd)
+{
+	/* Start exchanging properties with the other host */
+	start_handshake(xd);
+}
+
+static int unregister_service(struct device *dev, void *data)
+{
+	device_unregister(dev);
+	return 0;
+}
+
+/**
+ * tb_xdomain_remove() - Remove XDomain from the bus
+ * @xd: XDomain to remove
+ *
+ * This will stop all ongoing configuration work and remove the XDomain
+ * along with any services from the bus. When the last reference to @xd
+ * is released the object will be released as well.
+ */
+void tb_xdomain_remove(struct tb_xdomain *xd)
+{
+	stop_handshake(xd);
+
+	device_for_each_child_reverse(&xd->dev, xd, unregister_service);
+
+	if (!device_is_registered(&xd->dev))
+		put_device(&xd->dev);
+	else
+		device_unregister(&xd->dev);
+}
+
+/**
+ * tb_xdomain_enable_paths() - Enable DMA paths for XDomain connection
+ * @xd: XDomain connection
+ * @transmit_path: HopID of the transmit path the other end is using to
+ *		   send packets
+ * @transmit_ring: DMA ring used to receive packets from the other end
+ * @receive_path: HopID of the receive path the other end is using to
+ *		  receive packets
+ * @receive_ring: DMA ring used to send packets to the other end
+ *
+ * The function enables DMA paths accordingly so that after successful
+ * return the caller can send and receive packets using high-speed DMA
+ * path.
+ *
+ * Return: %0 in case of success and negative errno in case of error
+ */
+int tb_xdomain_enable_paths(struct tb_xdomain *xd, u16 transmit_path,
+			    u16 transmit_ring, u16 receive_path,
+			    u16 receive_ring)
+{
+	int ret;
+
+	mutex_lock(&xd->lock);
+
+	if (xd->transmit_path) {
+		ret = xd->transmit_path == transmit_path ? 0 : -EBUSY;
+		goto exit_unlock;
+	}
+
+	xd->transmit_path = transmit_path;
+	xd->transmit_ring = transmit_ring;
+	xd->receive_path = receive_path;
+	xd->receive_ring = receive_ring;
+
+	ret = tb_domain_approve_xdomain_paths(xd->tb, xd);
+
+exit_unlock:
+	mutex_unlock(&xd->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tb_xdomain_enable_paths);
+
+/**
+ * tb_xdomain_disable_paths() - Disable DMA paths for XDomain connection
+ * @xd: XDomain connection
+ *
+ * This does the opposite of tb_xdomain_enable_paths(). After call to
+ * this the caller is not expected to use the rings anymore.
+ *
+ * Return: %0 in case of success and negative errno in case of error
+ */
+int tb_xdomain_disable_paths(struct tb_xdomain *xd)
+{
+	int ret = 0;
+
+	mutex_lock(&xd->lock);
+	if (xd->transmit_path) {
+		xd->transmit_path = 0;
+		xd->transmit_ring = 0;
+		xd->receive_path = 0;
+		xd->receive_ring = 0;
+
+		ret = tb_domain_disconnect_xdomain_paths(xd->tb, xd);
+	}
+	mutex_unlock(&xd->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tb_xdomain_disable_paths);
+
+struct tb_xdomain_lookup {
+	const uuid_t *uuid;
+	u8 link;
+	u8 depth;
+};
+
+static struct tb_xdomain *switch_find_xdomain(struct tb_switch *sw,
+	const struct tb_xdomain_lookup *lookup)
+{
+	int i;
+
+	for (i = 1; i <= sw->config.max_port_number; i++) {
+		struct tb_port *port = &sw->ports[i];
+		struct tb_xdomain *xd;
+
+		if (tb_is_upstream_port(port))
+			continue;
+
+		if (port->xdomain) {
+			xd = port->xdomain;
+
+			if (lookup->uuid) {
+				if (uuid_equal(xd->remote_uuid, lookup->uuid))
+					return xd;
+			} else if (lookup->link == xd->link &&
+				   lookup->depth == xd->depth) {
+				return xd;
+			}
+		} else if (port->remote) {
+			xd = switch_find_xdomain(port->remote->sw, lookup);
+			if (xd)
+				return xd;
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * tb_xdomain_find_by_uuid() - Find an XDomain by UUID
+ * @tb: Domain where the XDomain belongs to
+ * @uuid: UUID to look for
+ *
+ * Finds XDomain by walking through the Thunderbolt topology below @tb.
+ * The returned XDomain will have its reference count increased so the
+ * caller needs to call tb_xdomain_put() when it is done with the
+ * object.
+ *
+ * This will find all XDomains including the ones that are not yet added
+ * to the bus (handshake is still in progress).
+ *
+ * The caller needs to hold @tb->lock.
+ */
+struct tb_xdomain *tb_xdomain_find_by_uuid(struct tb *tb, const uuid_t *uuid)
+{
+	struct tb_xdomain_lookup lookup;
+	struct tb_xdomain *xd;
+
+	memset(&lookup, 0, sizeof(lookup));
+	lookup.uuid = uuid;
+
+	xd = switch_find_xdomain(tb->root_switch, &lookup);
+	if (xd) {
+		get_device(&xd->dev);
+		return xd;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(tb_xdomain_find_by_uuid);
+
+/**
+ * tb_xdomain_find_by_link_depth() - Find an XDomain by link and depth
+ * @tb: Domain where the XDomain belongs to
+ * @link: Root switch link number
+ * @depth: Depth in the link
+ *
+ * Finds XDomain by walking through the Thunderbolt topology below @tb.
+ * The returned XDomain will have its reference count increased so the
+ * caller needs to call tb_xdomain_put() when it is done with the
+ * object.
+ *
+ * This will find all XDomains including the ones that are not yet added
+ * to the bus (handshake is still in progress).
+ *
+ * The caller needs to hold @tb->lock.
+ */
+struct tb_xdomain *tb_xdomain_find_by_link_depth(struct tb *tb, u8 link,
+						 u8 depth)
+{
+	struct tb_xdomain_lookup lookup;
+	struct tb_xdomain *xd;
+
+	memset(&lookup, 0, sizeof(lookup));
+	lookup.link = link;
+	lookup.depth = depth;
+
+	xd = switch_find_xdomain(tb->root_switch, &lookup);
+	if (xd) {
+		get_device(&xd->dev);
+		return xd;
+	}
+
+	return NULL;
+}
+
+bool tb_xdomain_handle_request(struct tb *tb, enum tb_cfg_pkg_type type,
+			       const void *buf, size_t size)
+{
+	const struct tb_protocol_handler *handler, *tmp;
+	const struct tb_xdp_header *hdr = buf;
+	unsigned int length;
+	int ret = 0;
+
+	/* We expect the packet is at least size of the header */
+	length = hdr->xd_hdr.length_sn & TB_XDOMAIN_LENGTH_MASK;
+	if (length != size / 4 - sizeof(hdr->xd_hdr) / 4)
+		return true;
+	if (length < sizeof(*hdr) / 4 - sizeof(hdr->xd_hdr) / 4)
+		return true;
+
+	/*
+	 * Handle XDomain discovery protocol packets directly here. For
+	 * other protocols (based on their UUID) we call registered
+	 * handlers in turn.
+	 */
+	if (uuid_equal(&hdr->uuid, &tb_xdp_uuid)) {
+		if (type == TB_CFG_PKG_XDOMAIN_REQ) {
+			tb_xdp_schedule_request(tb, hdr, size);
+			return true;
+		}
+		return false;
+	}
+
+	mutex_lock(&xdomain_lock);
+	list_for_each_entry_safe(handler, tmp, &protocol_handlers, list) {
+		if (!uuid_equal(&hdr->uuid, handler->uuid))
+			continue;
+
+		mutex_unlock(&xdomain_lock);
+		ret = handler->callback(buf, size, handler->data);
+		mutex_lock(&xdomain_lock);
+
+		if (ret)
+			break;
+	}
+	mutex_unlock(&xdomain_lock);
+
+	return ret > 0;
+}
+
+static int rebuild_property_block(void)
+{
+	u32 *block, len;
+	int ret;
+
+	ret = tb_property_format_dir(xdomain_property_dir, NULL, 0);
+	if (ret < 0)
+		return ret;
+
+	len = ret;
+
+	block = kcalloc(len, sizeof(u32), GFP_KERNEL);
+	if (!block)
+		return -ENOMEM;
+
+	ret = tb_property_format_dir(xdomain_property_dir, block, len);
+	if (ret) {
+		kfree(block);
+		return ret;
+	}
+
+	kfree(xdomain_property_block);
+	xdomain_property_block = block;
+	xdomain_property_block_len = len;
+	xdomain_property_block_gen++;
+
+	return 0;
+}
+
+static int update_xdomain(struct device *dev, void *data)
+{
+	struct tb_xdomain *xd;
+
+	xd = tb_to_xdomain(dev);
+	if (xd) {
+		queue_delayed_work(xd->tb->wq, &xd->properties_changed_work,
+				   msecs_to_jiffies(50));
+	}
+
+	return 0;
+}
+
+static void update_all_xdomains(void)
+{
+	bus_for_each_dev(&tb_bus_type, NULL, NULL, update_xdomain);
+}
+
+static bool remove_directory(const char *key, const struct tb_property_dir *dir)
+{
+	struct tb_property *p;
+
+	p = tb_property_find(xdomain_property_dir, key,
+			     TB_PROPERTY_TYPE_DIRECTORY);
+	if (p && p->value.dir == dir) {
+		tb_property_remove(p);
+		return true;
+	}
+	return false;
+}
+
+/**
+ * tb_register_property_dir() - Register property directory to the host
+ * @key: Key (name) of the directory to add
+ * @dir: Directory to add
+ *
+ * Service drivers can use this function to add new property directory
+ * to the host available properties. The other connected hosts are
+ * notified so they can re-read properties of this host if they are
+ * interested.
+ *
+ * Return: %0 on success and negative errno on failure
+ */
+int tb_register_property_dir(const char *key, struct tb_property_dir *dir)
+{
+	int ret;
+
+	if (!key || strlen(key) > 8)
+		return -EINVAL;
+
+	mutex_lock(&xdomain_lock);
+	if (tb_property_find(xdomain_property_dir, key,
+			     TB_PROPERTY_TYPE_DIRECTORY)) {
+		ret = -EEXIST;
+		goto err_unlock;
+	}
+
+	ret = tb_property_add_dir(xdomain_property_dir, key, dir);
+	if (ret)
+		goto err_unlock;
+
+	ret = rebuild_property_block();
+	if (ret) {
+		remove_directory(key, dir);
+		goto err_unlock;
+	}
+
+	mutex_unlock(&xdomain_lock);
+	update_all_xdomains();
+	return 0;
+
+err_unlock:
+	mutex_unlock(&xdomain_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tb_register_property_dir);
+
+/**
+ * tb_unregister_property_dir() - Removes property directory from host
+ * @key: Key (name) of the directory
+ * @dir: Directory to remove
+ *
+ * This will remove the existing directory from this host and notify the
+ * connected hosts about the change.
+ */
+void tb_unregister_property_dir(const char *key, struct tb_property_dir *dir)
+{
+	int ret = 0;
+
+	mutex_lock(&xdomain_lock);
+	if (remove_directory(key, dir))
+		ret = rebuild_property_block();
+	mutex_unlock(&xdomain_lock);
+
+	if (!ret)
+		update_all_xdomains();
+}
+EXPORT_SYMBOL_GPL(tb_unregister_property_dir);
+
+int tb_xdomain_init(void)
+{
+	int ret;
+
+	xdomain_property_dir = tb_property_create_dir(NULL);
+	if (!xdomain_property_dir)
+		return -ENOMEM;
+
+	/*
+	 * Initialize standard set of properties without any service
+	 * directories. Those will be added by service drivers
+	 * themselves when they are loaded.
+	 */
+	tb_property_add_immediate(xdomain_property_dir, "vendorid",
+				  PCI_VENDOR_ID_INTEL);
+	tb_property_add_text(xdomain_property_dir, "vendorid", "Intel Corp.");
+	tb_property_add_immediate(xdomain_property_dir, "deviceid", 0x1);
+	tb_property_add_text(xdomain_property_dir, "deviceid",
+			     utsname()->nodename);
+	tb_property_add_immediate(xdomain_property_dir, "devicerv", 0x80000100);
+
+	ret = rebuild_property_block();
+	if (ret) {
+		tb_property_free_dir(xdomain_property_dir);
+		xdomain_property_dir = NULL;
+	}
+
+	return ret;
+}
+
+void tb_xdomain_exit(void)
+{
+	kfree(xdomain_property_block);
+	tb_property_free_dir(xdomain_property_dir);
+}
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 694cebb50f72..7625c3b81f84 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -683,5 +683,31 @@ struct fsl_mc_device_id {
 	const char obj_type[16];
 };
 
+/**
+ * struct tb_service_id - Thunderbolt service identifiers
+ * @match_flags: Flags used to match the structure
+ * @protocol_key: Protocol key the service supports
+ * @protocol_id: Protocol id the service supports
+ * @protocol_version: Version of the protocol
+ * @protocol_revision: Revision of the protocol software
+ * @driver_data: Driver specific data
+ *
+ * Thunderbolt XDomain services are exposed as devices where each device
+ * carries the protocol information the service supports. Thunderbolt
+ * XDomain service drivers match against that information.
+ */
+struct tb_service_id {
+	__u32 match_flags;
+	char protocol_key[8 + 1];
+	__u32 protocol_id;
+	__u32 protocol_version;
+	__u32 protocol_revision;
+	kernel_ulong_t driver_data;
+};
+
+#define TBSVC_MATCH_PROTOCOL_KEY	0x0001
+#define TBSVC_MATCH_PROTOCOL_ID		0x0002
+#define TBSVC_MATCH_PROTOCOL_VERSION	0x0004
+#define TBSVC_MATCH_PROTOCOL_REVISION	0x0008
 
 #endif /* LINUX_MOD_DEVICETABLE_H */
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 43b8d1e09341..18c0e3d5e85c 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -17,6 +17,7 @@
 #include <linux/device.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
+#include <linux/mod_devicetable.h>
 #include <linux/uuid.h>
 
 enum tb_cfg_pkg_type {
@@ -77,6 +78,8 @@ struct tb {
 };
 
 extern struct bus_type tb_bus_type;
+extern struct device_type tb_service_type;
+extern struct device_type tb_xdomain_type;
 
 #define TB_LINKS_PER_PHY_PORT	2
 
@@ -155,4 +158,243 @@ struct tb_property *tb_property_get_next(struct tb_property_dir *dir,
 	     property;						\
 	     property = tb_property_get_next(dir, property))
 
+int tb_register_property_dir(const char *key, struct tb_property_dir *dir);
+void tb_unregister_property_dir(const char *key, struct tb_property_dir *dir);
+
+/**
+ * struct tb_xdomain - Cross-domain (XDomain) connection
+ * @dev: XDomain device
+ * @tb: Pointer to the domain
+ * @remote_uuid: UUID of the remote domain (host)
+ * @local_uuid: Cached local UUID
+ * @route: Route string the other domain can be reached
+ * @vendor: Vendor ID of the remote domain
+ * @device: Device ID of the demote domain
+ * @lock: Lock to serialize access to the following fields of this structure
+ * @vendor_name: Name of the vendor (or %NULL if not known)
+ * @device_name: Name of the device (or %NULL if not known)
+ * @is_unplugged: The XDomain is unplugged
+ * @resume: The XDomain is being resumed
+ * @transmit_path: HopID which the remote end expects us to transmit
+ * @transmit_ring: Local ring (hop) where outgoing packets are pushed
+ * @receive_path: HopID which we expect the remote end to transmit
+ * @receive_ring: Local ring (hop) where incoming packets arrive
+ * @service_ids: Used to generate IDs for the services
+ * @properties: Properties exported by the remote domain
+ * @property_block_gen: Generation of @properties
+ * @properties_lock: Lock protecting @properties.
+ * @get_properties_work: Work used to get remote domain properties
+ * @properties_retries: Number of times left to read properties
+ * @properties_changed_work: Work used to notify the remote domain that
+ *			     our properties have changed
+ * @properties_changed_retries: Number of times left to send properties
+ *				changed notification
+ * @link: Root switch link the remote domain is connected (ICM only)
+ * @depth: Depth in the chain the remote domain is connected (ICM only)
+ *
+ * This structure represents connection across two domains (hosts).
+ * Each XDomain contains zero or more services which are exposed as
+ * &struct tb_service objects.
+ *
+ * Service drivers may access this structure if they need to enumerate
+ * non-standard properties but they need hold @lock when doing so
+ * because properties can be changed asynchronously in response to
+ * changes in the remote domain.
+ */
+struct tb_xdomain {
+	struct device dev;
+	struct tb *tb;
+	uuid_t *remote_uuid;
+	const uuid_t *local_uuid;
+	u64 route;
+	u16 vendor;
+	u16 device;
+	struct mutex lock;
+	const char *vendor_name;
+	const char *device_name;
+	bool is_unplugged;
+	bool resume;
+	u16 transmit_path;
+	u16 transmit_ring;
+	u16 receive_path;
+	u16 receive_ring;
+	struct ida service_ids;
+	struct tb_property_dir *properties;
+	u32 property_block_gen;
+	struct delayed_work get_properties_work;
+	int properties_retries;
+	struct delayed_work properties_changed_work;
+	int properties_changed_retries;
+	u8 link;
+	u8 depth;
+};
+
+int tb_xdomain_enable_paths(struct tb_xdomain *xd, u16 transmit_path,
+			    u16 transmit_ring, u16 receive_path,
+			    u16 receive_ring);
+int tb_xdomain_disable_paths(struct tb_xdomain *xd);
+struct tb_xdomain *tb_xdomain_find_by_uuid(struct tb *tb, const uuid_t *uuid);
+
+static inline struct tb_xdomain *
+tb_xdomain_find_by_uuid_locked(struct tb *tb, const uuid_t *uuid)
+{
+	struct tb_xdomain *xd;
+
+	mutex_lock(&tb->lock);
+	xd = tb_xdomain_find_by_uuid(tb, uuid);
+	mutex_unlock(&tb->lock);
+
+	return xd;
+}
+
+static inline struct tb_xdomain *tb_xdomain_get(struct tb_xdomain *xd)
+{
+	if (xd)
+		get_device(&xd->dev);
+	return xd;
+}
+
+static inline void tb_xdomain_put(struct tb_xdomain *xd)
+{
+	if (xd)
+		put_device(&xd->dev);
+}
+
+static inline bool tb_is_xdomain(const struct device *dev)
+{
+	return dev->type == &tb_xdomain_type;
+}
+
+static inline struct tb_xdomain *tb_to_xdomain(struct device *dev)
+{
+	if (tb_is_xdomain(dev))
+		return container_of(dev, struct tb_xdomain, dev);
+	return NULL;
+}
+
+int tb_xdomain_response(struct tb_xdomain *xd, const void *response,
+			size_t size, enum tb_cfg_pkg_type type);
+int tb_xdomain_request(struct tb_xdomain *xd, const void *request,
+		       size_t request_size, enum tb_cfg_pkg_type request_type,
+		       void *response, size_t response_size,
+		       enum tb_cfg_pkg_type response_type,
+		       unsigned int timeout_msec);
+
+/**
+ * tb_protocol_handler - Protocol specific handler
+ * @uuid: XDomain messages with this UUID are dispatched to this handler
+ * @callback: Callback called with the XDomain message. Returning %1
+ *	      here tells the XDomain core that the message was handled
+ *	      by this handler and should not be forwared to other
+ *	      handlers.
+ * @data: Data passed with the callback
+ * @list: Handlers are linked using this
+ *
+ * Thunderbolt services can hook into incoming XDomain requests by
+ * registering protocol handler. Only limitation is that the XDomain
+ * discovery protocol UUID cannot be registered since it is handled by
+ * the core XDomain code.
+ *
+ * The @callback must check that the message is really directed to the
+ * service the driver implements.
+ */
+struct tb_protocol_handler {
+	const uuid_t *uuid;
+	int (*callback)(const void *buf, size_t size, void *data);
+	void *data;
+	struct list_head list;
+};
+
+int tb_register_protocol_handler(struct tb_protocol_handler *handler);
+void tb_unregister_protocol_handler(struct tb_protocol_handler *handler);
+
+/**
+ * struct tb_service - Thunderbolt service
+ * @dev: XDomain device
+ * @id: ID of the service (shown in sysfs)
+ * @key: Protocol key from the properties directory
+ * @prtcid: Protocol ID from the properties directory
+ * @prtcvers: Protocol version from the properties directory
+ * @prtcrevs: Protocol software revision from the properties directory
+ * @prtcstns: Protocol settings mask from the properties directory
+ *
+ * Each domain exposes set of services it supports as collection of
+ * properties. For each service there will be one corresponding
+ * &struct tb_service. Service drivers are bound to these.
+ */
+struct tb_service {
+	struct device dev;
+	int id;
+	const char *key;
+	u32 prtcid;
+	u32 prtcvers;
+	u32 prtcrevs;
+	u32 prtcstns;
+};
+
+static inline struct tb_service *tb_service_get(struct tb_service *svc)
+{
+	if (svc)
+		get_device(&svc->dev);
+	return svc;
+}
+
+static inline void tb_service_put(struct tb_service *svc)
+{
+	if (svc)
+		put_device(&svc->dev);
+}
+
+static inline bool tb_is_service(const struct device *dev)
+{
+	return dev->type == &tb_service_type;
+}
+
+static inline struct tb_service *tb_to_service(struct device *dev)
+{
+	if (tb_is_service(dev))
+		return container_of(dev, struct tb_service, dev);
+	return NULL;
+}
+
+/**
+ * tb_service_driver - Thunderbolt service driver
+ * @driver: Driver structure
+ * @probe: Called when the driver is probed
+ * @remove: Called when the driver is removed (optional)
+ * @shutdown: Called at shutdown time to stop the service (optional)
+ * @id_table: Table of service identifiers the driver supports
+ */
+struct tb_service_driver {
+	struct device_driver driver;
+	int (*probe)(struct tb_service *svc, const struct tb_service_id *id);
+	void (*remove)(struct tb_service *svc);
+	void (*shutdown)(struct tb_service *svc);
+	const struct tb_service_id *id_table;
+};
+
+#define TB_SERVICE(key, id)				\
+	.match_flags = TBSVC_MATCH_PROTOCOL_KEY |	\
+		       TBSVC_MATCH_PROTOCOL_ID,		\
+	.protocol_key = (key),				\
+	.protocol_id = (id)
+
+int tb_register_service_driver(struct tb_service_driver *drv);
+void tb_unregister_service_driver(struct tb_service_driver *drv);
+
+static inline void *tb_service_get_drvdata(const struct tb_service *svc)
+{
+	return dev_get_drvdata(&svc->dev);
+}
+
+static inline void tb_service_set_drvdata(struct tb_service *svc, void *data)
+{
+	dev_set_drvdata(&svc->dev, data);
+}
+
+static inline struct tb_xdomain *tb_service_parent(struct tb_service *svc)
+{
+	return tb_to_xdomain(svc->dev.parent);
+}
+
 #endif /* THUNDERBOLT_H_ */
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index e4d90e50f6fe..57263f2f8f2f 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -206,5 +206,12 @@ int main(void)
 	DEVID_FIELD(fsl_mc_device_id, vendor);
 	DEVID_FIELD(fsl_mc_device_id, obj_type);
 
+	DEVID(tb_service_id);
+	DEVID_FIELD(tb_service_id, match_flags);
+	DEVID_FIELD(tb_service_id, protocol_key);
+	DEVID_FIELD(tb_service_id, protocol_id);
+	DEVID_FIELD(tb_service_id, protocol_version);
+	DEVID_FIELD(tb_service_id, protocol_revision);
+
 	return 0;
 }
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 29d6699d5a06..6ef6e63f96fd 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1301,6 +1301,31 @@ static int do_fsl_mc_entry(const char *filename, void *symval,
 }
 ADD_TO_DEVTABLE("fslmc", fsl_mc_device_id, do_fsl_mc_entry);
 
+/* Looks like: tbsvc:kSpNvNrN */
+static int do_tbsvc_entry(const char *filename, void *symval, char *alias)
+{
+	DEF_FIELD(symval, tb_service_id, match_flags);
+	DEF_FIELD_ADDR(symval, tb_service_id, protocol_key);
+	DEF_FIELD(symval, tb_service_id, protocol_id);
+	DEF_FIELD(symval, tb_service_id, protocol_version);
+	DEF_FIELD(symval, tb_service_id, protocol_revision);
+
+	strcpy(alias, "tbsvc:");
+	if (match_flags & TBSVC_MATCH_PROTOCOL_KEY)
+		sprintf(alias + strlen(alias), "k%s", *protocol_key);
+	else
+		strcat(alias + strlen(alias), "k*");
+	ADD(alias, "p", match_flags & TBSVC_MATCH_PROTOCOL_ID, protocol_id);
+	ADD(alias, "v", match_flags & TBSVC_MATCH_PROTOCOL_VERSION,
+	    protocol_version);
+	ADD(alias, "r", match_flags & TBSVC_MATCH_PROTOCOL_REVISION,
+	    protocol_revision);
+
+	add_wildcard(alias);
+	return 1;
+}
+ADD_TO_DEVTABLE("tbsvc", tb_service_id, do_tbsvc_entry);
+
 /* Does namelen bytes of name exactly match the symbol? */
 static bool sym_is(const char *name, unsigned namelen, const char *symbol)
 {
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 07/16] thunderbolt: Configure interrupt throttling for all interrupts
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

This will keep the interrupt delivery rate reasonable. The value used
here (128 us) is a recommendation from the hardware people.

This code is based on the work done by Amir Levy and Michael Jamet.

Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/nhi.c      | 22 +++++++++++++++++++---
 drivers/thunderbolt/nhi_regs.h |  2 ++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 05af126a2435..bc44cbaaf05a 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -651,6 +651,21 @@ static int nhi_suspend_noirq(struct device *dev)
 	return tb_domain_suspend_noirq(tb);
 }
 
+static void nhi_enable_int_throttling(struct tb_nhi *nhi)
+{
+	u32 throttle = DIV_ROUND_UP(128 * NSEC_PER_USEC, 256);
+	unsigned int i;
+
+	/*
+	 * Configure interrupt throttling for all vectors even if we
+	 * only use few.
+	 */
+	for (i = 0; i < MSIX_MAX_VECS; i++) {
+		u32 reg = REG_INT_THROTTLING_RATE + i * 4;
+		iowrite32(throttle, nhi->iobase + reg);
+	}
+}
+
 static int nhi_resume_noirq(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -663,6 +678,8 @@ static int nhi_resume_noirq(struct device *dev)
 	 */
 	if (!pci_device_is_present(pdev))
 		tb->nhi->going_away = true;
+	else
+		nhi_enable_int_throttling(tb->nhi);
 
 	return tb_domain_resume_noirq(tb);
 }
@@ -717,6 +734,8 @@ static int nhi_init_msi(struct tb_nhi *nhi)
 	/* In case someone left them on. */
 	nhi_disable_interrupts(nhi);
 
+	nhi_enable_int_throttling(nhi);
+
 	ida_init(&nhi->msix_ida);
 
 	/*
@@ -796,9 +815,6 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	pci_set_master(pdev);
 
-	/* magic value - clock related? */
-	iowrite32(3906250 / 10000, nhi->iobase + 0x38c00);
-
 	tb = icm_probe(nhi);
 	if (!tb)
 		tb = tb_probe(nhi);
diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 09ed574e92ff..46eff69b19ad 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -95,6 +95,8 @@ struct ring_desc {
 #define REG_RING_INTERRUPT_BASE	0x38200
 #define RING_INTERRUPT_REG_COUNT(nhi) ((31 + 2 * nhi->hop_count) / 32)
 
+#define REG_INT_THROTTLING_RATE	0x38c00
+
 /* Interrupt Vector Allocation */
 #define REG_INT_VEC_ALLOC_BASE	0x38c40
 #define REG_INT_VEC_ALLOC_BITS	4
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 08/16] thunderbolt: Add support for frame mode
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

When high-speed DMA paths are used to transfer arbitrary data over a
Thunderbolt link, DMA rings should be in frame mode instead of raw mode.
The latter is used by the control channel (ring 0). In frame mode each
data frame can hold up to 4kB payload.

This patch modifies the DMA ring code to allow configuring a ring to be
in frame mode by passing a new flag (RING_FLAG_FRAME) to the ring when
it is allocated. In addition there might be need to enable end-to-end
(E2E) workaround for the ring to prevent losing Rx frames in certain
situations. We add another flag (RING_FLAG_E2E) that can be used for
this purpose.

This code is based on the work done by Amir Levy and Michael Jamet.

Signed-off-by: Michael Jamet <michael.jamet@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/ctl.c      |  3 +-
 drivers/thunderbolt/nhi.c      | 76 ++++++++++++++++++++++++++----------------
 drivers/thunderbolt/nhi.h      | 10 +++++-
 drivers/thunderbolt/nhi_regs.h |  2 ++
 4 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index 46e393c5fd1d..05400b77dcd7 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -618,7 +618,8 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, event_cb cb, void *cb_data)
 	if (!ctl->tx)
 		goto err;
 
-	ctl->rx = ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
+	ctl->rx = ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0xffff,
+				0xffff);
 	if (!ctl->rx)
 		goto err;
 
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index bc44cbaaf05a..cde8b6d46147 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -21,6 +21,12 @@
 
 #define RING_TYPE(ring) ((ring)->is_tx ? "TX ring" : "RX ring")
 
+/*
+ * Used to enable end-to-end workaround for missing RX packets. Do not
+ * use this ring for anything else.
+ */
+#define RING_E2E_UNUSED_HOPID	2
+
 /*
  * Minimal number of vectors when we use MSI-X. Two for control channel
  * Rx/Tx and the rest four are for cross domain DMA paths.
@@ -229,23 +235,6 @@ static void ring_work(struct work_struct *work)
 			frame->eof = ring->descriptors[ring->tail].eof;
 			frame->sof = ring->descriptors[ring->tail].sof;
 			frame->flags = ring->descriptors[ring->tail].flags;
-			if (frame->sof != 0)
-				dev_WARN(&ring->nhi->pdev->dev,
-					 "%s %d got unexpected SOF: %#x\n",
-					 RING_TYPE(ring), ring->hop,
-					 frame->sof);
-			/*
-			 * known flags:
-			 * raw not enabled, interupt not set: 0x2=0010
-			 * raw enabled: 0xa=1010
-			 * raw not enabled: 0xb=1011
-			 * partial frame (>MAX_FRAME_SIZE): 0xe=1110
-			 */
-			if (frame->flags != 0xa)
-				dev_WARN(&ring->nhi->pdev->dev,
-					 "%s %d got unexpected flags: %#x\n",
-					 RING_TYPE(ring), ring->hop,
-					 frame->flags);
 		}
 		ring->tail = (ring->tail + 1) % ring->size;
 	}
@@ -321,12 +310,17 @@ static void ring_release_msix(struct tb_ring *ring)
 }
 
 static struct tb_ring *ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
-				  bool transmit, unsigned int flags)
+				  bool transmit, unsigned int flags,
+				  u16 sof_mask, u16 eof_mask)
 {
 	struct tb_ring *ring = NULL;
 	dev_info(&nhi->pdev->dev, "allocating %s ring %d of size %d\n",
 		 transmit ? "TX" : "RX", hop, size);
 
+	/* Tx Ring 2 is reserved for E2E workaround */
+	if (transmit && hop == RING_E2E_UNUSED_HOPID)
+		return NULL;
+
 	mutex_lock(&nhi->lock);
 	if (hop >= nhi->hop_count) {
 		dev_WARN(&nhi->pdev->dev, "invalid hop: %d\n", hop);
@@ -353,6 +347,8 @@ static struct tb_ring *ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 	ring->is_tx = transmit;
 	ring->size = size;
 	ring->flags = flags;
+	ring->sof_mask = sof_mask;
+	ring->eof_mask = eof_mask;
 	ring->head = 0;
 	ring->tail = 0;
 	ring->running = false;
@@ -384,13 +380,13 @@ static struct tb_ring *ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 struct tb_ring *ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
 			      unsigned int flags)
 {
-	return ring_alloc(nhi, hop, size, true, flags);
+	return ring_alloc(nhi, hop, size, true, flags, 0, 0);
 }
 
 struct tb_ring *ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags)
+			      unsigned int flags, u16 sof_mask, u16 eof_mask)
 {
-	return ring_alloc(nhi, hop, size, false, flags);
+	return ring_alloc(nhi, hop, size, false, flags, sof_mask, eof_mask);
 }
 
 /**
@@ -400,6 +396,9 @@ struct tb_ring *ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
  */
 void ring_start(struct tb_ring *ring)
 {
+	u16 frame_size;
+	u32 flags;
+
 	mutex_lock(&ring->nhi->lock);
 	mutex_lock(&ring->lock);
 	if (ring->nhi->going_away)
@@ -411,18 +410,39 @@ void ring_start(struct tb_ring *ring)
 	dev_info(&ring->nhi->pdev->dev, "starting %s %d\n",
 		 RING_TYPE(ring), ring->hop);
 
+	if (ring->flags & RING_FLAG_FRAME) {
+		/* Means 4096 */
+		frame_size = 0;
+		flags = RING_FLAG_ENABLE;
+	} else {
+		frame_size = TB_FRAME_SIZE;
+		flags = RING_FLAG_ENABLE | RING_FLAG_RAW;
+	}
+
+	if (ring->flags & RING_FLAG_E2E && !ring->is_tx) {
+		u32 hop;
+
+		/*
+		 * In order not to lose Rx packets we enable end-to-end
+		 * workaround which transfers Rx credits to an unused Tx
+		 * HopID.
+		 */
+		hop = RING_E2E_UNUSED_HOPID << REG_RX_OPTIONS_E2E_HOP_SHIFT;
+		hop &= REG_RX_OPTIONS_E2E_HOP_MASK;
+		flags |= hop | RING_FLAG_E2E_FLOW_CONTROL;
+	}
+
 	ring_iowrite64desc(ring, ring->descriptors_dma, 0);
 	if (ring->is_tx) {
 		ring_iowrite32desc(ring, ring->size, 12);
 		ring_iowrite32options(ring, 0, 4); /* time releated ? */
-		ring_iowrite32options(ring,
-				      RING_FLAG_ENABLE | RING_FLAG_RAW, 0);
+		ring_iowrite32options(ring, flags, 0);
 	} else {
-		ring_iowrite32desc(ring,
-				   (TB_FRAME_SIZE << 16) | ring->size, 12);
-		ring_iowrite32options(ring, 0xffffffff, 4); /* SOF EOF mask */
-		ring_iowrite32options(ring,
-				      RING_FLAG_ENABLE | RING_FLAG_RAW, 0);
+		u32 sof_eof_mask = ring->sof_mask << 16 | ring->eof_mask;
+
+		ring_iowrite32desc(ring, (frame_size << 16) | ring->size, 12);
+		ring_iowrite32options(ring, sof_eof_mask, 4);
+		ring_iowrite32options(ring, flags, 0);
 	}
 	ring_interrupt_active(ring, true);
 	ring->running = true;
diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
index 0e05828983db..4503ddbeccb3 100644
--- a/drivers/thunderbolt/nhi.h
+++ b/drivers/thunderbolt/nhi.h
@@ -56,6 +56,8 @@ struct tb_nhi {
  * @irq: MSI-X irq number if the ring uses MSI-X. %0 otherwise.
  * @vector: MSI-X vector number the ring uses (only set if @irq is > 0)
  * @flags: Ring specific flags
+ * @sof_mask: Bit mask used to detect start of frame PDF
+ * @eof_mask: Bit mask used to detect end of frame PDF
  */
 struct tb_ring {
 	struct mutex lock;
@@ -74,10 +76,16 @@ struct tb_ring {
 	int irq;
 	u8 vector;
 	unsigned int flags;
+	u16 sof_mask;
+	u16 eof_mask;
 };
 
 /* Leave ring interrupt enabled on suspend */
 #define RING_FLAG_NO_SUSPEND	BIT(0)
+/* Configure the ring to be in frame mode */
+#define RING_FLAG_FRAME		BIT(1)
+/* Enable end-to-end flow control */
+#define RING_FLAG_E2E		BIT(2)
 
 struct ring_frame;
 typedef void (*ring_cb)(struct tb_ring*, struct ring_frame*, bool canceled);
@@ -100,7 +108,7 @@ struct ring_frame {
 struct tb_ring *ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
 			      unsigned int flags);
 struct tb_ring *ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags);
+			      unsigned int flags, u16 sof_mask, u16 eof_mask);
 void ring_start(struct tb_ring *ring);
 void ring_stop(struct tb_ring *ring);
 void ring_free(struct tb_ring *ring);
diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 46eff69b19ad..491a4c0c18fc 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -77,6 +77,8 @@ struct ring_desc {
  * ..: unknown
  */
 #define REG_RX_OPTIONS_BASE	0x29800
+#define REG_RX_OPTIONS_E2E_HOP_MASK	GENMASK(22, 12)
+#define REG_RX_OPTIONS_E2E_HOP_SHIFT	12
 
 /*
  * three bitfields: tx, rx, rx overflow
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 09/16] thunderbolt: Export ring handling functions to modules
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

These are used by Thunderbolt services to send and receive frames over
the high-speed DMA rings.

We also put the functions to tb_ namespace to make sure we do not
collide with others and add missing kernel-doc comments for the exported
functions.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/ctl.c   |  20 +++---
 drivers/thunderbolt/nhi.c   |  62 +++++++++++------
 drivers/thunderbolt/nhi.h   | 147 +----------------------------------------
 include/linux/thunderbolt.h | 158 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+), 176 deletions(-)

diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index 05400b77dcd7..dd10789e1dbb 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -359,7 +359,7 @@ static int tb_ctl_tx(struct tb_ctl *ctl, const void *data, size_t len,
 	cpu_to_be32_array(pkg->buffer, data, len / 4);
 	*(__be32 *) (pkg->buffer + len) = tb_crc(pkg->buffer, len);
 
-	res = ring_tx(ctl->tx, &pkg->frame);
+	res = tb_ring_tx(ctl->tx, &pkg->frame);
 	if (res) /* ring is stopped */
 		tb_ctl_pkg_free(pkg);
 	return res;
@@ -376,7 +376,7 @@ static bool tb_ctl_handle_event(struct tb_ctl *ctl, enum tb_cfg_pkg_type type,
 
 static void tb_ctl_rx_submit(struct ctl_pkg *pkg)
 {
-	ring_rx(pkg->ctl->rx, &pkg->frame); /*
+	tb_ring_rx(pkg->ctl->rx, &pkg->frame); /*
 					     * We ignore failures during stop.
 					     * All rx packets are referenced
 					     * from ctl->rx_packets, so we do
@@ -614,11 +614,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, event_cb cb, void *cb_data)
 	if (!ctl->frame_pool)
 		goto err;
 
-	ctl->tx = ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
+	ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
 	if (!ctl->tx)
 		goto err;
 
-	ctl->rx = ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0xffff,
+	ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0xffff,
 				0xffff);
 	if (!ctl->rx)
 		goto err;
@@ -652,9 +652,9 @@ void tb_ctl_free(struct tb_ctl *ctl)
 		return;
 
 	if (ctl->rx)
-		ring_free(ctl->rx);
+		tb_ring_free(ctl->rx);
 	if (ctl->tx)
-		ring_free(ctl->tx);
+		tb_ring_free(ctl->tx);
 
 	/* free RX packets */
 	for (i = 0; i < TB_CTL_RX_PKG_COUNT; i++)
@@ -673,8 +673,8 @@ void tb_ctl_start(struct tb_ctl *ctl)
 {
 	int i;
 	tb_ctl_info(ctl, "control channel starting...\n");
-	ring_start(ctl->tx); /* is used to ack hotplug packets, start first */
-	ring_start(ctl->rx);
+	tb_ring_start(ctl->tx); /* is used to ack hotplug packets, start first */
+	tb_ring_start(ctl->rx);
 	for (i = 0; i < TB_CTL_RX_PKG_COUNT; i++)
 		tb_ctl_rx_submit(ctl->rx_packets[i]);
 
@@ -695,8 +695,8 @@ void tb_ctl_stop(struct tb_ctl *ctl)
 	ctl->running = false;
 	mutex_unlock(&ctl->request_queue_lock);
 
-	ring_stop(ctl->rx);
-	ring_stop(ctl->tx);
+	tb_ring_stop(ctl->rx);
+	tb_ring_stop(ctl->tx);
 
 	if (!list_empty(&ctl->request_queue))
 		tb_ctl_WARN(ctl, "dangling request in request_queue\n");
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index cde8b6d46147..4704ccd20c84 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -253,7 +253,7 @@ static void ring_work(struct work_struct *work)
 	}
 }
 
-int __ring_enqueue(struct tb_ring *ring, struct ring_frame *frame)
+int __tb_ring_enqueue(struct tb_ring *ring, struct ring_frame *frame)
 {
 	int ret = 0;
 	mutex_lock(&ring->lock);
@@ -266,6 +266,7 @@ int __ring_enqueue(struct tb_ring *ring, struct ring_frame *frame)
 	mutex_unlock(&ring->lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(__tb_ring_enqueue);
 
 static irqreturn_t ring_msix(int irq, void *data)
 {
@@ -309,9 +310,9 @@ static void ring_release_msix(struct tb_ring *ring)
 	ring->irq = 0;
 }
 
-static struct tb_ring *ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
-				  bool transmit, unsigned int flags,
-				  u16 sof_mask, u16 eof_mask)
+static struct tb_ring *tb_ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
+				     bool transmit, unsigned int flags,
+				     u16 sof_mask, u16 eof_mask)
 {
 	struct tb_ring *ring = NULL;
 	dev_info(&nhi->pdev->dev, "allocating %s ring %d of size %d\n",
@@ -377,24 +378,42 @@ static struct tb_ring *ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 	return NULL;
 }
 
-struct tb_ring *ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags)
+/**
+ * tb_ring_alloc_tx() - Allocate DMA ring for transmit
+ * @nhi: Pointer to the NHI the ring is to be allocated
+ * @hop: HopID (ring) to allocate
+ * @size: Number of entries in the ring
+ * @flags: Flags for the ring
+ */
+struct tb_ring *tb_ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
+				 unsigned int flags)
 {
-	return ring_alloc(nhi, hop, size, true, flags, 0, 0);
+	return tb_ring_alloc(nhi, hop, size, true, flags, 0, 0);
 }
+EXPORT_SYMBOL_GPL(tb_ring_alloc_tx);
 
-struct tb_ring *ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags, u16 sof_mask, u16 eof_mask)
+/**
+ * tb_ring_alloc_rx() - Allocate DMA ring for receive
+ * @nhi: Pointer to the NHI the ring is to be allocated
+ * @hop: HopID (ring) to allocate
+ * @size: Number of entries in the ring
+ * @flags: Flags for the ring
+ * @sof_mask: Mask of PDF values that start a frame
+ * @eof_mask: Mask of PDF values that end a frame
+ */
+struct tb_ring *tb_ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
+				 unsigned int flags, u16 sof_mask, u16 eof_mask)
 {
-	return ring_alloc(nhi, hop, size, false, flags, sof_mask, eof_mask);
+	return tb_ring_alloc(nhi, hop, size, false, flags, sof_mask, eof_mask);
 }
+EXPORT_SYMBOL_GPL(tb_ring_alloc_rx);
 
 /**
- * ring_start() - enable a ring
+ * tb_ring_start() - enable a ring
  *
- * Must not be invoked in parallel with ring_stop().
+ * Must not be invoked in parallel with tb_ring_stop().
  */
-void ring_start(struct tb_ring *ring)
+void tb_ring_start(struct tb_ring *ring)
 {
 	u16 frame_size;
 	u32 flags;
@@ -450,21 +469,22 @@ void ring_start(struct tb_ring *ring)
 	mutex_unlock(&ring->lock);
 	mutex_unlock(&ring->nhi->lock);
 }
-
+EXPORT_SYMBOL_GPL(tb_ring_start);
 
 /**
- * ring_stop() - shutdown a ring
+ * tb_ring_stop() - shutdown a ring
  *
  * Must not be invoked from a callback.
  *
- * This method will disable the ring. Further calls to ring_tx/ring_rx will
- * return -ESHUTDOWN until ring_stop has been called.
+ * This method will disable the ring. Further calls to
+ * tb_ring_tx/tb_ring_rx will return -ESHUTDOWN until ring_stop has been
+ * called.
  *
  * All enqueued frames will be canceled and their callbacks will be executed
  * with frame->canceled set to true (on the callback thread). This method
  * returns only after all callback invocations have finished.
  */
-void ring_stop(struct tb_ring *ring)
+void tb_ring_stop(struct tb_ring *ring)
 {
 	mutex_lock(&ring->nhi->lock);
 	mutex_lock(&ring->lock);
@@ -497,9 +517,10 @@ void ring_stop(struct tb_ring *ring)
 	schedule_work(&ring->work);
 	flush_work(&ring->work);
 }
+EXPORT_SYMBOL_GPL(tb_ring_stop);
 
 /*
- * ring_free() - free ring
+ * tb_ring_free() - free ring
  *
  * When this method returns all invocations of ring->callback will have
  * finished.
@@ -508,7 +529,7 @@ void ring_stop(struct tb_ring *ring)
  *
  * Must NOT be called from ring_frame->callback!
  */
-void ring_free(struct tb_ring *ring)
+void tb_ring_free(struct tb_ring *ring)
 {
 	mutex_lock(&ring->nhi->lock);
 	/*
@@ -550,6 +571,7 @@ void ring_free(struct tb_ring *ring)
 	mutex_destroy(&ring->lock);
 	kfree(ring);
 }
+EXPORT_SYMBOL_GPL(tb_ring_free);
 
 /**
  * nhi_mailbox_cmd() - Send a command through NHI mailbox
diff --git a/drivers/thunderbolt/nhi.h b/drivers/thunderbolt/nhi.h
index 4503ddbeccb3..771d09ca5dc5 100644
--- a/drivers/thunderbolt/nhi.h
+++ b/drivers/thunderbolt/nhi.h
@@ -7,152 +7,7 @@
 #ifndef DSL3510_H_
 #define DSL3510_H_
 
-#include <linux/idr.h>
-#include <linux/mutex.h>
-#include <linux/workqueue.h>
-
-/**
- * struct tb_nhi - thunderbolt native host interface
- * @lock: Must be held during ring creation/destruction. Is acquired by
- *	  interrupt_work when dispatching interrupts to individual rings.
- * @pdev: Pointer to the PCI device
- * @iobase: MMIO space of the NHI
- * @tx_rings: All Tx rings available on this host controller
- * @rx_rings: All Rx rings available on this host controller
- * @msix_ida: Used to allocate MSI-X vectors for rings
- * @going_away: The host controller device is about to disappear so when
- *		this flag is set, avoid touching the hardware anymore.
- * @interrupt_work: Work scheduled to handle ring interrupt when no
- *		    MSI-X is used.
- * @hop_count: Number of rings (end point hops) supported by NHI.
- */
-struct tb_nhi {
-	struct mutex lock;
-	struct pci_dev *pdev;
-	void __iomem *iobase;
-	struct tb_ring **tx_rings;
-	struct tb_ring **rx_rings;
-	struct ida msix_ida;
-	bool going_away;
-	struct work_struct interrupt_work;
-	u32 hop_count;
-};
-
-/**
- * struct tb_ring - thunderbolt TX or RX ring associated with a NHI
- * @lock: Lock serializing actions to this ring. Must be acquired after
- *	  nhi->lock.
- * @nhi: Pointer to the native host controller interface
- * @size: Size of the ring
- * @hop: Hop (DMA channel) associated with this ring
- * @head: Head of the ring (write next descriptor here)
- * @tail: Tail of the ring (complete next descriptor here)
- * @descriptors: Allocated descriptors for this ring
- * @queue: Queue holding frames to be transferred over this ring
- * @in_flight: Queue holding frames that are currently in flight
- * @work: Interrupt work structure
- * @is_tx: Is the ring Tx or Rx
- * @running: Is the ring running
- * @irq: MSI-X irq number if the ring uses MSI-X. %0 otherwise.
- * @vector: MSI-X vector number the ring uses (only set if @irq is > 0)
- * @flags: Ring specific flags
- * @sof_mask: Bit mask used to detect start of frame PDF
- * @eof_mask: Bit mask used to detect end of frame PDF
- */
-struct tb_ring {
-	struct mutex lock;
-	struct tb_nhi *nhi;
-	int size;
-	int hop;
-	int head;
-	int tail;
-	struct ring_desc *descriptors;
-	dma_addr_t descriptors_dma;
-	struct list_head queue;
-	struct list_head in_flight;
-	struct work_struct work;
-	bool is_tx:1;
-	bool running:1;
-	int irq;
-	u8 vector;
-	unsigned int flags;
-	u16 sof_mask;
-	u16 eof_mask;
-};
-
-/* Leave ring interrupt enabled on suspend */
-#define RING_FLAG_NO_SUSPEND	BIT(0)
-/* Configure the ring to be in frame mode */
-#define RING_FLAG_FRAME		BIT(1)
-/* Enable end-to-end flow control */
-#define RING_FLAG_E2E		BIT(2)
-
-struct ring_frame;
-typedef void (*ring_cb)(struct tb_ring*, struct ring_frame*, bool canceled);
-
-/**
- * struct ring_frame - for use with ring_rx/ring_tx
- */
-struct ring_frame {
-	dma_addr_t buffer_phy;
-	ring_cb callback;
-	struct list_head list;
-	u32 size:12; /* TX: in, RX: out*/
-	u32 flags:12; /* RX: out */
-	u32 eof:4; /* TX:in, RX: out */
-	u32 sof:4; /* TX:in, RX: out */
-};
-
-#define TB_FRAME_SIZE 0x100    /* minimum size for ring_rx */
-
-struct tb_ring *ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags);
-struct tb_ring *ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
-			      unsigned int flags, u16 sof_mask, u16 eof_mask);
-void ring_start(struct tb_ring *ring);
-void ring_stop(struct tb_ring *ring);
-void ring_free(struct tb_ring *ring);
-
-int __ring_enqueue(struct tb_ring *ring, struct ring_frame *frame);
-
-/**
- * ring_rx() - enqueue a frame on an RX ring
- *
- * frame->buffer, frame->buffer_phy and frame->callback have to be set. The
- * buffer must contain at least TB_FRAME_SIZE bytes.
- *
- * frame->callback will be invoked with frame->size, frame->flags, frame->eof,
- * frame->sof set once the frame has been received.
- *
- * If ring_stop is called after the packet has been enqueued frame->callback
- * will be called with canceled set to true.
- *
- * Return: Returns ESHUTDOWN if ring_stop has been called. Zero otherwise.
- */
-static inline int ring_rx(struct tb_ring *ring, struct ring_frame *frame)
-{
-	WARN_ON(ring->is_tx);
-	return __ring_enqueue(ring, frame);
-}
-
-/**
- * ring_tx() - enqueue a frame on an TX ring
- *
- * frame->buffer, frame->buffer_phy, frame->callback, frame->size, frame->eof
- * and frame->sof have to be set.
- *
- * frame->callback will be invoked with once the frame has been transmitted.
- *
- * If ring_stop is called after the packet has been enqueued frame->callback
- * will be called with canceled set to true.
- *
- * Return: Returns ESHUTDOWN if ring_stop has been called. Zero otherwise.
- */
-static inline int ring_tx(struct tb_ring *ring, struct ring_frame *frame)
-{
-	WARN_ON(!ring->is_tx);
-	return __ring_enqueue(ring, frame);
-}
+#include <linux/thunderbolt.h>
 
 enum nhi_fw_mode {
 	NHI_FW_SAFE_MODE,
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 18c0e3d5e85c..9ddb83ad890f 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -15,10 +15,12 @@
 #define THUNDERBOLT_H_
 
 #include <linux/device.h>
+#include <linux/idr.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/mod_devicetable.h>
 #include <linux/uuid.h>
+#include <linux/workqueue.h>
 
 enum tb_cfg_pkg_type {
 	TB_CFG_PKG_READ = 1,
@@ -397,4 +399,160 @@ static inline struct tb_xdomain *tb_service_parent(struct tb_service *svc)
 	return tb_to_xdomain(svc->dev.parent);
 }
 
+/**
+ * struct tb_nhi - thunderbolt native host interface
+ * @lock: Must be held during ring creation/destruction. Is acquired by
+ *	  interrupt_work when dispatching interrupts to individual rings.
+ * @pdev: Pointer to the PCI device
+ * @iobase: MMIO space of the NHI
+ * @tx_rings: All Tx rings available on this host controller
+ * @rx_rings: All Rx rings available on this host controller
+ * @msix_ida: Used to allocate MSI-X vectors for rings
+ * @going_away: The host controller device is about to disappear so when
+ *		this flag is set, avoid touching the hardware anymore.
+ * @interrupt_work: Work scheduled to handle ring interrupt when no
+ *		    MSI-X is used.
+ * @hop_count: Number of rings (end point hops) supported by NHI.
+ */
+struct tb_nhi {
+	struct mutex lock;
+	struct pci_dev *pdev;
+	void __iomem *iobase;
+	struct tb_ring **tx_rings;
+	struct tb_ring **rx_rings;
+	struct ida msix_ida;
+	bool going_away;
+	struct work_struct interrupt_work;
+	u32 hop_count;
+};
+
+/**
+ * struct tb_ring - thunderbolt TX or RX ring associated with a NHI
+ * @lock: Lock serializing actions to this ring. Must be acquired after
+ *	  nhi->lock.
+ * @nhi: Pointer to the native host controller interface
+ * @size: Size of the ring
+ * @hop: Hop (DMA channel) associated with this ring
+ * @head: Head of the ring (write next descriptor here)
+ * @tail: Tail of the ring (complete next descriptor here)
+ * @descriptors: Allocated descriptors for this ring
+ * @queue: Queue holding frames to be transferred over this ring
+ * @in_flight: Queue holding frames that are currently in flight
+ * @work: Interrupt work structure
+ * @is_tx: Is the ring Tx or Rx
+ * @running: Is the ring running
+ * @irq: MSI-X irq number if the ring uses MSI-X. %0 otherwise.
+ * @vector: MSI-X vector number the ring uses (only set if @irq is > 0)
+ * @flags: Ring specific flags
+ * @sof_mask: Bit mask used to detect start of frame PDF
+ * @eof_mask: Bit mask used to detect end of frame PDF
+ */
+struct tb_ring {
+	struct mutex lock;
+	struct tb_nhi *nhi;
+	int size;
+	int hop;
+	int head;
+	int tail;
+	struct ring_desc *descriptors;
+	dma_addr_t descriptors_dma;
+	struct list_head queue;
+	struct list_head in_flight;
+	struct work_struct work;
+	bool is_tx:1;
+	bool running:1;
+	int irq;
+	u8 vector;
+	unsigned int flags;
+	u16 sof_mask;
+	u16 eof_mask;
+};
+
+/* Leave ring interrupt enabled on suspend */
+#define RING_FLAG_NO_SUSPEND	BIT(0)
+/* Configure the ring to be in frame mode */
+#define RING_FLAG_FRAME		BIT(1)
+/* Enable end-to-end flow control */
+#define RING_FLAG_E2E		BIT(2)
+
+struct ring_frame;
+typedef void (*ring_cb)(struct tb_ring *, struct ring_frame *, bool canceled);
+
+/**
+ * struct ring_frame - For use with ring_rx/ring_tx
+ * @buffer_phy: DMA mapped address of the frame
+ * @callback: Callback called when the frame is finished
+ * @list: Frame is linked to a queue using this
+ * @size: Size of the frame in bytes (%0 means %4096)
+ * @flags: Flags for the frame (see &enum ring_desc_flags)
+ * @eof: End of frame protocol defined field
+ * @sof: Start of frame protocol defined field
+ */
+struct ring_frame {
+	dma_addr_t buffer_phy;
+	ring_cb callback;
+	struct list_head list;
+	u32 size:12;
+	u32 flags:12;
+	u32 eof:4;
+	u32 sof:4;
+};
+
+/* Minimum size for ring_rx */
+#define TB_FRAME_SIZE		0x100
+
+struct tb_ring *tb_ring_alloc_tx(struct tb_nhi *nhi, int hop, int size,
+				 unsigned int flags);
+struct tb_ring *tb_ring_alloc_rx(struct tb_nhi *nhi, int hop, int size,
+				 unsigned int flags, u16 sof_mask,
+				 u16 eof_mask);
+void tb_ring_start(struct tb_ring *ring);
+void tb_ring_stop(struct tb_ring *ring);
+void tb_ring_free(struct tb_ring *ring);
+
+int __tb_ring_enqueue(struct tb_ring *ring, struct ring_frame *frame);
+
+/**
+ * tb_ring_rx() - enqueue a frame on an RX ring
+ * @ring: Ring to enqueue the frame
+ * @frame: Frame to enqueue
+ *
+ * @frame->buffer, @frame->buffer_phy and @frame->callback have to be set. The
+ * buffer must contain at least %TB_FRAME_SIZE bytes.
+ *
+ * @frame->callback will be invoked with @frame->size, @frame->flags,
+ * @frame->eof, @frame->sof set once the frame has been received.
+ *
+ * If ring_stop() is called after the packet has been enqueued
+ * @frame->callback will be called with canceled set to true.
+ *
+ * Return: Returns %-ESHUTDOWN if ring_stop has been called. Zero otherwise.
+ */
+static inline int tb_ring_rx(struct tb_ring *ring, struct ring_frame *frame)
+{
+	WARN_ON(ring->is_tx);
+	return __tb_ring_enqueue(ring, frame);
+}
+
+/**
+ * tb_ring_tx() - enqueue a frame on an TX ring
+ * @ring: Ring the enqueue the frame
+ * @frame: Frame to enqueue
+ *
+ * @frame->buffer, @frame->buffer_phy, @frame->callback, @frame->size,
+ * @frame->eof and @frame->sof have to be set.
+ *
+ * @frame->callback will be invoked with once the frame has been transmitted.
+ *
+ * If ring_stop() is called after the packet has been enqueued @frame->callback
+ * will be called with canceled set to true.
+ *
+ * Return: Returns %-ESHUTDOWN if ring_stop has been called. Zero otherwise.
+ */
+static inline int tb_ring_tx(struct tb_ring *ring, struct ring_frame *frame)
+{
+	WARN_ON(!ring->is_tx);
+	return __tb_ring_enqueue(ring, frame);
+}
+
 #endif /* THUNDERBOLT_H_ */
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 10/16] thunderbolt: Move ring descriptor flags to thunderbolt.h
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

A Thunderbolt service driver might need to check if there was an error
with the descriptor when in frame mode. We also add two Rx specific
error flags RING_DESC_CRC_ERROR and RING_DESC_BUFFER_OVERRUN.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/nhi_regs.h |  7 -------
 include/linux/thunderbolt.h    | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 491a4c0c18fc..5ed6934e31e7 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -17,13 +17,6 @@ enum ring_flags {
 	RING_FLAG_ENABLE = 1 << 31,
 };
 
-enum ring_desc_flags {
-	RING_DESC_ISOCH = 0x1, /* TX only? */
-	RING_DESC_COMPLETED = 0x2, /* set by NHI */
-	RING_DESC_POSTED = 0x4, /* always set this */
-	RING_DESC_INTERRUPT = 0x8, /* request an interrupt on completion */
-};
-
 /**
  * struct ring_desc - TX/RX ring entry
  *
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index 9ddb83ad890f..e3b9af7be0ad 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -478,6 +478,24 @@ struct tb_ring {
 struct ring_frame;
 typedef void (*ring_cb)(struct tb_ring *, struct ring_frame *, bool canceled);
 
+/**
+ * enum ring_desc_flags - Flags for DMA ring descriptor
+ * %RING_DESC_ISOCH: Enable isonchronous DMA (Tx only)
+ * %RING_DESC_CRC_ERROR: In frame mode CRC check failed for the frame (Rx only)
+ * %RING_DESC_COMPLETED: Descriptor completed (set by NHI)
+ * %RING_DESC_POSTED: Always set this
+ * %RING_DESC_BUFFER_OVERRUN: RX buffer overrun
+ * %RING_DESC_INTERRUPT: Request an interrupt on completion
+ */
+enum ring_desc_flags {
+	RING_DESC_ISOCH = 0x1,
+	RING_DESC_CRC_ERROR = 0x1,
+	RING_DESC_COMPLETED = 0x2,
+	RING_DESC_POSTED = 0x4,
+	RING_DESC_BUFFER_OVERRUN = 0x04,
+	RING_DESC_INTERRUPT = 0x8,
+};
+
 /**
  * struct ring_frame - For use with ring_rx/ring_tx
  * @buffer_phy: DMA mapped address of the frame
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 11/16] thunderbolt: Use spinlock in ring serialization
From: Mika Westerberg @ 2017-09-25 11:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, Amir Levy,
	Mario.Limonciello, Lukas Wunner, Andy Shevchenko, Andrew Lunn,
	Mika Westerberg, linux-kernel, netdev
In-Reply-To: <20170925110738.68382-1-mika.westerberg@linux.intel.com>

This makes it possible to enqueue frames also from atomic context which
is needed for example, when networking packets are sent over a
Thunderbolt cable.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Michael Jamet <michael.jamet@intel.com>
Reviewed-by: Yehezkel Bernat <yehezkel.bernat@intel.com>
---
 drivers/thunderbolt/nhi.c   | 26 ++++++++++++++------------
 include/linux/thunderbolt.h |  2 +-
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 4704ccd20c84..d1ad37c6eccf 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -212,8 +212,10 @@ static void ring_work(struct work_struct *work)
 	struct tb_ring *ring = container_of(work, typeof(*ring), work);
 	struct ring_frame *frame;
 	bool canceled = false;
+	unsigned long flags;
 	LIST_HEAD(done);
-	mutex_lock(&ring->lock);
+
+	spin_lock_irqsave(&ring->lock, flags);
 
 	if (!ring->running) {
 		/*  Move all frames to done and mark them as canceled. */
@@ -241,7 +243,8 @@ static void ring_work(struct work_struct *work)
 	ring_write_descriptors(ring);
 
 invoke_callback:
-	mutex_unlock(&ring->lock); /* allow callbacks to schedule new work */
+	/* allow callbacks to schedule new work */
+	spin_unlock_irqrestore(&ring->lock, flags);
 	while (!list_empty(&done)) {
 		frame = list_first_entry(&done, typeof(*frame), list);
 		/*
@@ -255,15 +258,17 @@ static void ring_work(struct work_struct *work)
 
 int __tb_ring_enqueue(struct tb_ring *ring, struct ring_frame *frame)
 {
+	unsigned long flags;
 	int ret = 0;
-	mutex_lock(&ring->lock);
+
+	spin_lock_irqsave(&ring->lock, flags);
 	if (ring->running) {
 		list_add_tail(&frame->list, &ring->queue);
 		ring_write_descriptors(ring);
 	} else {
 		ret = -ESHUTDOWN;
 	}
-	mutex_unlock(&ring->lock);
+	spin_unlock_irqrestore(&ring->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__tb_ring_enqueue);
@@ -338,7 +343,7 @@ static struct tb_ring *tb_ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 	if (!ring)
 		goto err;
 
-	mutex_init(&ring->lock);
+	spin_lock_init(&ring->lock);
 	INIT_LIST_HEAD(&ring->queue);
 	INIT_LIST_HEAD(&ring->in_flight);
 	INIT_WORK(&ring->work, ring_work);
@@ -371,8 +376,6 @@ static struct tb_ring *tb_ring_alloc(struct tb_nhi *nhi, u32 hop, int size,
 	return ring;
 
 err:
-	if (ring)
-		mutex_destroy(&ring->lock);
 	kfree(ring);
 	mutex_unlock(&nhi->lock);
 	return NULL;
@@ -419,7 +422,7 @@ void tb_ring_start(struct tb_ring *ring)
 	u32 flags;
 
 	mutex_lock(&ring->nhi->lock);
-	mutex_lock(&ring->lock);
+	spin_lock_irq(&ring->lock);
 	if (ring->nhi->going_away)
 		goto err;
 	if (ring->running) {
@@ -466,7 +469,7 @@ void tb_ring_start(struct tb_ring *ring)
 	ring_interrupt_active(ring, true);
 	ring->running = true;
 err:
-	mutex_unlock(&ring->lock);
+	spin_unlock_irq(&ring->lock);
 	mutex_unlock(&ring->nhi->lock);
 }
 EXPORT_SYMBOL_GPL(tb_ring_start);
@@ -487,7 +490,7 @@ EXPORT_SYMBOL_GPL(tb_ring_start);
 void tb_ring_stop(struct tb_ring *ring)
 {
 	mutex_lock(&ring->nhi->lock);
-	mutex_lock(&ring->lock);
+	spin_lock_irq(&ring->lock);
 	dev_info(&ring->nhi->pdev->dev, "stopping %s %d\n",
 		 RING_TYPE(ring), ring->hop);
 	if (ring->nhi->going_away)
@@ -508,7 +511,7 @@ void tb_ring_stop(struct tb_ring *ring)
 	ring->running = false;
 
 err:
-	mutex_unlock(&ring->lock);
+	spin_unlock_irq(&ring->lock);
 	mutex_unlock(&ring->nhi->lock);
 
 	/*
@@ -568,7 +571,6 @@ void tb_ring_free(struct tb_ring *ring)
 	 * to finish before freeing the ring.
 	 */
 	flush_work(&ring->work);
-	mutex_destroy(&ring->lock);
 	kfree(ring);
 }
 EXPORT_SYMBOL_GPL(tb_ring_free);
diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h
index e3b9af7be0ad..cf9e42db780f 100644
--- a/include/linux/thunderbolt.h
+++ b/include/linux/thunderbolt.h
@@ -448,7 +448,7 @@ struct tb_nhi {
  * @eof_mask: Bit mask used to detect end of frame PDF
  */
 struct tb_ring {
-	struct mutex lock;
+	spinlock_t lock;
 	struct tb_nhi *nhi;
 	int size;
 	int hop;
-- 
2.14.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox