* [PATCH rdma-next v3 1/5] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP source port
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
@ 2020-05-04 5:19 ` Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 2/5] RDMA/core: Consider flow label when building skb Leon Romanovsky
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-05-04 5:19 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb
From: Mark Zhang <markz@mellanox.com>
Add two hash functions to distribute the RoCE v2 UDP source port and flow
label symmetrically. These are user-visible API, and any change in the
implementation needs to be tested for interoperability between the old and
new variants.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/rdma/ib_verbs.h | 44 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8d29f2f79da8..f44b76a43c6d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4709,4 +4709,48 @@ static inline struct ib_device *rdma_device_to_ibdev(struct device *device)
bool rdma_dev_access_netns(const struct ib_device *device,
const struct net *net);
+
+#define IB_ROCE_UDP_ENCAP_VALID_PORT_MIN (0xC000)
+#define IB_GRH_FLOWLABEL_MASK (0x000FFFFF)
+
+/**
+ * rdma_flow_label_to_udp_sport - generate a RoCE v2 UDP src port value based
+ * on the flow_label
+ *
+ * This function will convert the 20 bit flow_label input to a valid RoCE v2
+ * UDP src port 14 bit value. All RoCE V2 drivers should use this same
+ * convention.
+ */
+static inline u16 rdma_flow_label_to_udp_sport(u32 fl)
+{
+ u32 fl_low = fl & 0x03fff, fl_high = fl & 0xFC000;
+
+ fl_low ^= fl_high >> 14;
+ return (u16)(fl_low | IB_ROCE_UDP_ENCAP_VALID_PORT_MIN);
+}
+
+/**
+ * rdma_calc_flow_label - generate a RDMA symmetric flow label value based on
+ * local and remote qpn values
+ *
+ * This function folds the product of two QPNs, 24 bits each, into a
+ * 20-bit result.
+ *
+ * This function will create a symmetric flow_label value based on the local
+ * and remote qpn values. This will allow both the requester and responder
+ * to calculate the same flow_label for a given connection.
+ *
+ * This helper function should be used by drivers in case the upper layer
+ * provides a zero flow_label value. This is to improve entropy of RDMA
+ * traffic in the network.
+ */
+static inline u32 rdma_calc_flow_label(u32 lqpn, u32 rqpn)
+{
+ u64 v = (u64)lqpn * rqpn;
+
+ v ^= v >> 20;
+ v ^= v >> 40;
+
+ return (u32)(v & IB_GRH_FLOWLABEL_MASK);
+}
#endif /* IB_VERBS_H */
--
2.26.2
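
Editorial sketch (not part of the patch): to experiment with the two helpers
from patch 1/5 outside the kernel, something like the following userspace
version may be useful. It assumes C99 <stdint.h> types in place of the
kernel's u16/u32/u64. The names flow_label_to_udp_sport, calc_flow_label,
ROCE_UDP_SPORT_MIN and GRH_FLOWLABEL_MASK are illustrative stand-ins for the
kernel symbols, not the symbols themselves.

```c
#include <assert.h>   /* assert, for ad-hoc checks by the caller */
#include <stdint.h>

/* Hypothetical userspace stand-ins for the constants added by the patch. */
#define ROCE_UDP_SPORT_MIN 0xC000u
#define GRH_FLOWLABEL_MASK 0x000FFFFFu

/*
 * Fold a 20-bit flow label into 14 bits (low 14 bits XOR high 6 bits),
 * then set the two top bits so the result lies in 0xC000..0xFFFF.
 */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
    uint32_t fl_low = fl & 0x03FFF;
    uint32_t fl_high = fl & 0xFC000;

    fl_low ^= fl_high >> 14;
    return (uint16_t)(fl_low | ROCE_UDP_SPORT_MIN);
}

/*
 * Multiply the two QPNs in 64 bits and XOR-fold the product down to
 * 20 bits. Multiplication commutes, so swapping lqpn and rqpn gives
 * the same label.
 */
static uint32_t calc_flow_label(uint32_t lqpn, uint32_t rqpn)
{
    uint64_t v = (uint64_t)lqpn * rqpn;

    v ^= v >> 20;
    v ^= v >> 40;

    return (uint32_t)(v & GRH_FLOWLABEL_MASK);
}
```

With these definitions, flow_label_to_udp_sport(0x12345) evaluates to 0xE341,
and calc_flow_label(a, b) equals calc_flow_label(b, a) for any pair of QPNs,
which is the property that lets requester and responder derive the same label
independently.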
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH rdma-next v3 2/5] RDMA/core: Consider flow label when building skb
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 1/5] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP source port Leon Romanovsky
@ 2020-05-04 5:19 ` Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 3/5] RDMA/mlx5: Define RoCEv2 udp source port when set path Leon Romanovsky
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-05-04 5:19 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Maor Gottlieb, linux-rdma
From: Maor Gottlieb <maorg@mellanox.com>
Use rdma_flow_label_to_udp_sport() to calculate the UDP source port
of the RoCEv2 packet.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/lag.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index a29533626a7c..7063e41eaf26 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -34,7 +34,8 @@ static struct sk_buff *rdma_build_skb(struct ib_device *device,
skb_push(skb, sizeof(struct udphdr));
skb_reset_transport_header(skb);
uh = udp_hdr(skb);
- uh->source = htons(0xC000);
+ uh->source =
+ htons(rdma_flow_label_to_udp_sport(ah_attr->grh.flow_label));
uh->dest = htons(ROCE_V2_UDP_DPORT);
uh->len = htons(sizeof(struct udphdr));
@@ -114,7 +115,8 @@ struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
struct net_device *master;
if (!(ah_attr->type == RDMA_AH_ATTR_TYPE_ROCE &&
- ah_attr->grh.sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP))
+ ah_attr->grh.sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP &&
+ ah_attr->grh.flow_label))
return NULL;
rcu_read_lock();
--
2.26.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH rdma-next v3 3/5] RDMA/mlx5: Define RoCEv2 udp source port when set path
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 1/5] RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP source port Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 2/5] RDMA/core: Consider flow label when building skb Leon Romanovsky
@ 2020-05-04 5:19 ` Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 4/5] RDMA/cma: Initialize the flow label of CM's route path record Leon Romanovsky
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-05-04 5:19 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb
From: Mark Zhang <markz@mellanox.com>
Calculate and set the UDP source port based on the flow label. If the flow
label is not defined in the GRH, calculate it based on lqpn/rqpn.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/qp.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 810bbd52daec..e624886bcf85 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3133,6 +3133,21 @@ static int modify_raw_packet_tx_affinity(struct mlx5_core_dev *dev,
return err;
}
+static void mlx5_set_path_udp_sport(struct mlx5_qp_path *path,
+ const struct rdma_ah_attr *ah,
+ u32 lqpn, u32 rqpn)
+
+{
+ u32 fl = ah->grh.flow_label;
+ u16 sport;
+
+ if (!fl)
+ fl = rdma_calc_flow_label(lqpn, rqpn);
+
+ sport = rdma_flow_label_to_udp_sport(fl);
+ path->udp_sport = cpu_to_be16(sport);
+}
+
static int mlx5_set_path(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
const struct rdma_ah_attr *ah,
struct mlx5_qp_path *path, u8 port, int attr_mask,
@@ -3164,12 +3179,15 @@ static int mlx5_set_path(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
return -EINVAL;
memcpy(path->rmac, ah->roce.dmac, sizeof(ah->roce.dmac));
- if (qp->ibqp.qp_type == IB_QPT_RC ||
- qp->ibqp.qp_type == IB_QPT_UC ||
- qp->ibqp.qp_type == IB_QPT_XRC_INI ||
- qp->ibqp.qp_type == IB_QPT_XRC_TGT)
- path->udp_sport =
- mlx5_get_roce_udp_sport(dev, ah->grh.sgid_attr);
+ if ((qp->ibqp.qp_type == IB_QPT_RC ||
+ qp->ibqp.qp_type == IB_QPT_UC ||
+ qp->ibqp.qp_type == IB_QPT_XRC_INI ||
+ qp->ibqp.qp_type == IB_QPT_XRC_TGT) &&
+ (grh->sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) &&
+ (attr_mask & IB_QP_DEST_QPN))
+ mlx5_set_path_udp_sport(path, ah,
+ qp->ibqp.qp_num,
+ attr->dest_qp_num);
path->dci_cfi_prio_sl = (sl & 0x7) << 4;
gid_type = ah->grh.sgid_attr->gid_type;
if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
--
2.26.2
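
Editorial sketch (not part of the patch): the fallback in
mlx5_set_path_udp_sport() can be tried in userspace roughly as below. This is
a sketch under assumptions: htons() from <arpa/inet.h> stands in for the
kernel's cpu_to_be16(), and path_udp_sport_be plus the two helper names are
hypothetical, chosen only for illustration.

```c
#include <arpa/inet.h>  /* htons, used here in place of cpu_to_be16 */
#include <assert.h>     /* assert, for ad-hoc checks by the caller */
#include <stdint.h>

/* Hypothetical userspace copy of the 20-bit to 14-bit fold from patch 1/5. */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
    uint32_t fl_low = fl & 0x03FFF;
    uint32_t fl_high = fl & 0xFC000;

    fl_low ^= fl_high >> 14;
    return (uint16_t)(fl_low | 0xC000u);
}

/* Hypothetical userspace copy of the symmetric QPN hash from patch 1/5. */
static uint32_t calc_flow_label(uint32_t lqpn, uint32_t rqpn)
{
    uint64_t v = (uint64_t)lqpn * rqpn;

    v ^= v >> 20;
    v ^= v >> 40;
    return (uint32_t)(v & 0x000FFFFFu);
}

/*
 * Mirror of the selection logic: use the caller's flow label when it is
 * non-zero, otherwise derive one from the local and remote QPNs, then
 * return the resulting source port in network byte order.
 */
static uint16_t path_udp_sport_be(uint32_t flow_label,
                                  uint32_t lqpn, uint32_t rqpn)
{
    uint32_t fl = flow_label;

    if (!fl)
        fl = calc_flow_label(lqpn, rqpn);

    return htons(flow_label_to_udp_sport(fl));
}
```

Because the QPN hash is symmetric, path_udp_sport_be(0, a, b) and
path_udp_sport_be(0, b, a) return the same port, so both sides of a
connection pick the same source port when no flow label was supplied.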
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH rdma-next v3 4/5] RDMA/cma: Initialize the flow label of CM's route path record
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
` (2 preceding siblings ...)
2020-05-04 5:19 ` [PATCH rdma-next v3 3/5] RDMA/mlx5: Define RoCEv2 udp source port when set path Leon Romanovsky
@ 2020-05-04 5:19 ` Leon Romanovsky
2020-05-04 5:19 ` [PATCH rdma-next v3 5/5] RDMA/mlx5: Set UDP source port based on the grh.flow_label Leon Romanovsky
2020-05-06 20:23 ` [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Jason Gunthorpe
5 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-05-04 5:19 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb
From: Mark Zhang <markz@mellanox.com>
If the flow label is not set by the user or the address family is not IPv6,
initialize it from the cma src/dst ports based on the "Kernighan and
Ritchie's hash function".
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/cma.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6406a597dfb6..e74c2ba82e7b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2909,6 +2909,24 @@ static int iboe_tos_to_sl(struct net_device *ndev, int tos)
return 0;
}
+static __be32 cma_get_roce_udp_flow_label(struct rdma_id_private *id_priv)
+{
+ struct sockaddr_in6 *addr6;
+ u16 dport, sport;
+ u32 hash, fl;
+
+ addr6 = (struct sockaddr_in6 *)cma_src_addr(id_priv);
+ fl = be32_to_cpu(addr6->sin6_flowinfo) & IB_GRH_FLOWLABEL_MASK;
+ if ((cma_family(id_priv) != AF_INET6) || !fl) {
+ dport = be16_to_cpu(cma_port(cma_dst_addr(id_priv)));
+ sport = be16_to_cpu(cma_port(cma_src_addr(id_priv)));
+ hash = (u32)sport * 31 + dport;
+ fl = hash & IB_GRH_FLOWLABEL_MASK;
+ }
+
+ return cpu_to_be32(fl);
+}
+
static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
{
struct rdma_route *route = &id_priv->id.route;
@@ -2975,6 +2993,11 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
goto err2;
}
+ if (rdma_protocol_roce_udp_encap(id_priv->id.device,
+ id_priv->id.port_num))
+ route->path_rec->flow_label =
+ cma_get_roce_udp_flow_label(id_priv);
+
cma_init_resolve_route_work(work, id_priv);
queue_work(cma_wq, &work->work);
--
2.26.2
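
Editorial sketch (not part of the patch): the port-based fallback in
cma_get_roce_udp_flow_label() boils down to the arithmetic below. This is a
userspace sketch that assumes host-order ports are already available; the
names ports_to_flow_label and pick_flow_label, and the is_ipv6 flag that
replaces the cma_family() check, are hypothetical.

```c
#include <assert.h>   /* assert, for ad-hoc checks by the caller */
#include <stdint.h>

/* Hypothetical stand-in for IB_GRH_FLOWLABEL_MASK. */
#define GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Multiply-by-31 hash over the two ports, truncated to 20 bits. */
static uint32_t ports_to_flow_label(uint16_t sport, uint16_t dport)
{
    uint32_t hash = (uint32_t)sport * 31 + dport;

    return hash & GRH_FLOWLABEL_MASK;
}

/*
 * Keep a non-zero user flow label only for IPv6 sources; in every other
 * case fall back to the port hash. The result is in host byte order
 * (the kernel function additionally converts it with cpu_to_be32()).
 */
static uint32_t pick_flow_label(int is_ipv6, uint32_t user_fl,
                                uint16_t sport, uint16_t dport)
{
    uint32_t fl = user_fl & GRH_FLOWLABEL_MASK;

    if (!is_ipv6 || !fl)
        fl = ports_to_flow_label(sport, dport);

    return fl;
}
```

For instance, ports_to_flow_label(1000, 4791) is 1000 * 31 + 4791 = 35791.
Note that, unlike the QPN hash in patch 1/5, this expression is not symmetric
in its two arguments.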
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH rdma-next v3 5/5] RDMA/mlx5: Set UDP source port based on the grh.flow_label
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
` (3 preceding siblings ...)
2020-05-04 5:19 ` [PATCH rdma-next v3 4/5] RDMA/cma: Initialize the flow label of CM's route path record Leon Romanovsky
@ 2020-05-04 5:19 ` Leon Romanovsky
2020-05-06 20:23 ` [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Jason Gunthorpe
5 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-05-04 5:19 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Mark Zhang, linux-rdma, Maor Gottlieb
From: Mark Zhang <markz@mellanox.com>
Calculate the UDP source port based on grh.flow_label. If grh.flow_label
is not valid, use the minimal supported UDP source port.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/ah.c | 21 +++++++++++++++++++--
drivers/infiniband/hw/mlx5/main.c | 4 ++--
drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 ++--
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index cc858f658567..59e5ec39b447 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -32,6 +32,24 @@
#include "mlx5_ib.h"
+static __be16 mlx5_ah_get_udp_sport(const struct mlx5_ib_dev *dev,
+ const struct rdma_ah_attr *ah_attr)
+{
+ enum ib_gid_type gid_type = ah_attr->grh.sgid_attr->gid_type;
+ __be16 sport;
+
+ if ((gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) &&
+ (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) &&
+ (ah_attr->grh.flow_label & IB_GRH_FLOWLABEL_MASK))
+ sport = cpu_to_be16(
+ rdma_flow_label_to_udp_sport(ah_attr->grh.flow_label));
+ else
+ sport = mlx5_get_roce_udp_sport_min(dev,
+ ah_attr->grh.sgid_attr);
+
+ return sport;
+}
+
static void create_ib_ah(struct mlx5_ib_dev *dev, struct mlx5_ib_ah *ah,
struct rdma_ah_init_attr *init_attr)
{
@@ -60,8 +78,7 @@ static void create_ib_ah(struct mlx5_ib_dev *dev, struct mlx5_ib_ah *ah,
memcpy(ah->av.rmac, ah_attr->roce.dmac,
sizeof(ah_attr->roce.dmac));
- ah->av.udp_sport =
- mlx5_get_roce_udp_sport(dev, ah_attr->grh.sgid_attr);
+ ah->av.udp_sport = mlx5_ah_get_udp_sport(dev, ah_attr);
ah->av.stat_rate_sl |= (rdma_ah_get_sl(ah_attr) & 0x7) << 1;
if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
#define MLX5_ECN_ENABLED BIT(1)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e7fb290c9d8d..0b8cc219e085 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -629,8 +629,8 @@ static int mlx5_ib_del_gid(const struct ib_gid_attr *attr,
attr->index, NULL, NULL);
}
-__be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev,
- const struct ib_gid_attr *attr)
+__be16 mlx5_get_roce_udp_sport_min(const struct mlx5_ib_dev *dev,
+ const struct ib_gid_attr *attr)
{
if (attr->gid_type != IB_GID_TYPE_ROCE_UDP_ENCAP)
return 0;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index f250753319d0..3041808773e6 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1356,8 +1356,8 @@ int mlx5_ib_get_vf_guid(struct ib_device *device, int vf, u8 port,
int mlx5_ib_set_vf_guid(struct ib_device *device, int vf, u8 port,
u64 guid, int type);
-__be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev,
- const struct ib_gid_attr *attr);
+__be16 mlx5_get_roce_udp_sport_min(const struct mlx5_ib_dev *dev,
+ const struct ib_gid_attr *attr);
void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
--
2.26.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP
2020-05-04 5:19 [PATCH rdma-next v3 0/5] Set flow_label and RoCEv2 UDP source port for datagram QP Leon Romanovsky
` (4 preceding siblings ...)
2020-05-04 5:19 ` [PATCH rdma-next v3 5/5] RDMA/mlx5: Set UDP source port based on the grh.flow_label Leon Romanovsky
@ 2020-05-06 20:23 ` Jason Gunthorpe
5 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2020-05-06 20:23 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, linux-kernel, linux-rdma,
Maor Gottlieb, Mark Zhang
On Mon, May 04, 2020 at 08:19:30AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Changelog:
> v3: Rebased on latest rdma-next, which includes HCA set capability patch
> and LAG code and this is why new patch from Maor was added.
> v2: https://lore.kernel.org/linux-rdma/20200413133703.932731-1-leon@kernel.org
> Dropped patch "RDMA/cm: Set flow label of recv_wc based on primary
> flow label", because it violates IBTA 13.5.4.3/13.5.4.4 sections.
> v1: https://lore.kernel.org/lkml/20200322093031.918447-1-leon@kernel.org
> Added extra patch to reduce amount of kzalloc/kfree calls in
> the HCA set capability flow.
> v0: https://lore.kernel.org/linux-rdma/20200318095300.45574-1-leon@kernel.org
> --------------------------------
>
> From Mark:
>
> This series provides flow label and UDP source port definitions in RoCE v2.
> Those fields are used to create entropy for network routes (ECMP), load
> balancers and 802.3ad link aggregation switching that are not aware of
> RoCE headers.
>
> Thanks.
>
> Maor Gottlieb (1):
> RDMA/core: Consider flow label when building skb
>
> Mark Zhang (4):
> RDMA/core: Add hash functions to calculate RoCEv2 flowlabel and UDP
> source port
> RDMA/mlx5: Define RoCEv2 udp source port when set path
> RDMA/cma: Initialize the flow label of CM's route path record
> RDMA/mlx5: Set UDP source port based on the grh.flow_label
Applied to for-next
Thanks,
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread