* [PATCH] IB/cma: use cached gids
@ 2013-09-08 21:44 Doug Ledford
[not found] ` <681df2ae830b20b81fef0c57c41f3bbce5c8c1cc.1378676509.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Doug Ledford @ 2013-09-08 21:44 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A
Cc: Doug Ledford, Sean Hefty, Eli Cohen
The cma_acquire_dev function was changed by commit 3c86aa70bf67
to use find_gid_port because multiport devices might have
either IB or IBoE formatted gids. The old function assumed that
all ports on the same device used the same GID format. However,
when it was changed to use find_gid_port, we inadvertently lost
usage of the GID cache. This turned out to be a very costly
change. In our testing, each iteration through each index of
the GID table takes roughly 35us. When you have multiple
devices in a system, and the GID you are looking for is on one
of the later devices, the code loops through all of the GID
indexes on all of the early devices before it finally succeeds
on the target device. This pathological search behavior combined
with 35us per GID table index retrieval results in results such
as the following from the cmtime application that's part of the
latest librdmacm git repo:
cmtime -b <card1, port1, mthca> -c 10000
step total ms max ms min us us / conn
create id : 33.88 0.06 1.00 3.39
bind addr : 1029.22 0.42 85.00 102.92
resolve addr : 50.40 25.93 23244.00 5.04
resolve route: 578.06 551.67 26457.00 57.81
create qp : 603.69 0.33 51.00 60.37
connect : 6461.23 6417.50 43963.00 646.12
disconnect : 877.99 659.96 162985.00 87.80
destroy : 38.67 0.03 2.00 3.87
cmtime -b <card1, port2, mthca> -c 10000
step total ms max ms min us us / conn
create id : 34.74 0.07 1.00 3.47
bind addr : 21759.39 2.75 1874.00 2175.94
resolve addr : 50.67 26.30 23962.00 5.07
resolve route: 622.68 594.80 27952.00 62.27
create qp : 599.82 0.23 49.00 59.98
connect : 24761.36 24709.28 49183.00 2476.14
disconnect : 904.57 652.34 187201.00 90.46
destroy : 38.94 0.04 2.00 3.89
cmtime -b <card2, port1, mlx4, IB> -c 10000
step total ms max ms min us us / conn
create id : 35.13 0.05 1.00 3.51
bind addr : 47421.04 6.38 3896.00 4742.10
resolve addr : 50.60 25.54 24248.00 5.06
resolve route: 524.76 498.97 25861.00 52.48
create qp : 3137.70 5.68 251.00 313.77
connect : 48959.76 48894.49 31841.00 4895.98
disconnect : 101926.72 98431.12 538689.00 10192.67
destroy : 37.63 0.04 2.00 3.76
cmtime -b <card2, port2, mlx4, IBoE> -c 5000
step total ms max ms min us us / conn
create id : 28.04 0.05 1.00 5.61
bind addr : 235.03 0.17 41.00 47.01
resolve addr : 27.45 14.97 12308.00 5.49
resolve route: 556.26 540.88 15514.00 111.25
create qp : 1323.23 5.73 210.00 264.65
connect : 84025.30 83960.46 61319.00 16805.06
disconnect : 2273.15 1734.22 417534.00 454.63
destroy : 21.28 0.06 2.00 4.26
Clearly, both the bind address and connect operations suffer
a huge penalty for being anything other than the default
GID on the first port in the system. Note: I had to reduce
the number of connections to 5000 to get the IBoE test to
complete, so it's numbers aren't fully comparable to the
rest of the tests.
After applying this patch, the numbers now look like this:
cmtime -b <card1, port1, mthca> -c 10000
step total ms max ms min us us / conn
create id : 30.30 0.04 1.00 3.03
bind addr : 26.15 0.03 1.00 2.62
resolve addr : 47.18 24.62 22336.00 4.72
resolve route: 642.78 617.61 25242.00 64.28
create qp : 610.06 0.61 52.00 61.01
connect : 43362.32 43303.70 59353.00 4336.23
disconnect : 877.59 658.70 165291.00 87.76
destroy : 40.03 0.05 2.00 4.00
cmtime -b <card1, port2, mthca> -c 10000
step total ms max ms min us us / conn
create id : 31.34 0.07 1.00 3.13
bind addr : 42.37 0.03 3.00 4.24
resolve addr : 47.19 24.92 22003.00 4.72
resolve route: 580.25 553.65 26680.00 58.03
create qp : 687.45 0.30 52.00 68.74
connect : 37457.12 37384.62 73015.00 3745.71
disconnect : 900.72 648.67 183825.00 90.07
destroy : 39.05 0.05 2.00 3.90
cmtime -b <card2, port1, mlx4, IB> -c 10000
step total ms max ms min us us / conn
create id : 36.26 0.05 1.00 3.63
bind addr : 75.29 0.10 4.00 7.53
resolve addr : 52.59 28.60 24753.00 5.26
resolve route: 628.05 602.16 25969.00 62.80
create qp : 3125.12 5.48 272.00 312.51
connect : 55779.52 55704.92 75617.00 5577.95
disconnect : 100925.06 98429.45 476384.00 10092.51
destroy : 51.92 0.07 2.00 5.19
cmtime -b <card2, port2, mlx4, IBoE> -c 5000
step total ms max ms min us us / conn
create id : 26.89 0.05 1.00 5.38
bind addr : 27.51 0.03 4.00 5.50
resolve addr : 22.77 11.78 11004.00 4.55
resolve route: 505.13 489.99 15252.00 101.03
create qp : 1331.86 5.71 209.00 266.37
connect : 82606.75 82537.34 70746.00 16521.35
disconnect : 2261.20 1734.62 406724.00 452.24
destroy : 22.19 0.04 2.00 4.44
Many thanks to Neil Horman for helping to track the source of
slow function that allowed us to track down the fact that
the original patch I mentioned above backed out cache usage
and identify just how much that impacted the system.
Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/core/cma.c | 35 ++++++-----------------------------
1 file changed, 6 insertions(+), 29 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index dab4b41..c62ff9e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -328,28 +328,6 @@ static int cma_set_qkey(struct rdma_id_private *id_priv, u32 qkey)
return ret;
}
-static int find_gid_port(struct ib_device *device, union ib_gid *gid, u8 port_num)
-{
- int i;
- int err;
- struct ib_port_attr props;
- union ib_gid tmp;
-
- err = ib_query_port(device, port_num, &props);
- if (err)
- return err;
-
- for (i = 0; i < props.gid_tbl_len; ++i) {
- err = ib_query_gid(device, port_num, i, &tmp);
- if (err)
- return err;
- if (!memcmp(&tmp, gid, sizeof tmp))
- return 0;
- }
-
- return -EADDRNOTAVAIL;
-}
-
static void cma_translate_ib(struct sockaddr_ib *sib, struct rdma_dev_addr *dev_addr)
{
dev_addr->dev_type = ARPHRD_INFINIBAND;
@@ -377,7 +355,7 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
struct cma_device *cma_dev;
union ib_gid gid, iboe_gid;
int ret = -ENODEV;
- u8 port;
+ u8 port, found_port;
enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
@@ -390,20 +368,19 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
memcpy(&gid, dev_addr->src_dev_addr +
rdma_addr_gid_offset(dev_addr), sizeof gid);
list_for_each_entry(cma_dev, &dev_list, list) {
- for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
+ for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port)
if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
- ret = find_gid_port(cma_dev->device, &iboe_gid, port);
+ ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
else
- ret = find_gid_port(cma_dev->device, &gid, port);
+ ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
- if (!ret) {
- id_priv->id.port_num = port;
+ if (!ret && (port == found_port)) {
+ id_priv->id.port_num = found_port;
goto out;
}
}
- }
}
out:
--
1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 4+ messages in thread[parent not found: <681df2ae830b20b81fef0c57c41f3bbce5c8c1cc.1378676509.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* RE: [PATCH] IB/cma: use cached gids [not found] ` <681df2ae830b20b81fef0c57c41f3bbce5c8c1cc.1378676509.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2013-09-10 23:39 ` Hefty, Sean 2013-09-11 1:17 ` Roland Dreier 1 sibling, 0 replies; 4+ messages in thread From: Hefty, Sean @ 2013-09-10 23:39 UTC (permalink / raw) To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org Cc: Eli Cohen > The cma_acquire_dev function was changed by commit 3c86aa70bf67 > to use find_gid_port because multiport devices might have > either IB or IBoE formatted gids. The old function assumed that > all ports on the same device used the same GID format. However, > when it was changed to use find_gid_port, we inadvertently lost > usage of the GID cache. This turned out to be a very costly > change. In our testing, each iteration through each index of > the GID table takes roughly 35us. When you have multiple > devices in a system, and the GID you are looking for is on one > of the later devices, the code loops through all of the GID > indexes on all of the early devices before it finally succeeds > on the target device. This pathological search behavior combined > with 35us per GID table index retrieval results in results such > as the following from the cmtime application that's part of the > latest librdmacm git repo: > > cmtime -b <card1, port1, mthca> -c 10000 > > step total ms max ms min us us / conn > create id : 33.88 0.06 1.00 3.39 > bind addr : 1029.22 0.42 85.00 102.92 > resolve addr : 50.40 25.93 23244.00 5.04 > resolve route: 578.06 551.67 26457.00 57.81 > create qp : 603.69 0.33 51.00 60.37 > connect : 6461.23 6417.50 43963.00 646.12 > disconnect : 877.99 659.96 162985.00 87.80 > destroy : 38.67 0.03 2.00 3.87 > > cmtime -b <card1, port2, mthca> -c 10000 > > step total ms max ms min us us / conn > create id : 34.74 0.07 1.00 3.47 > bind addr : 21759.39 2.75 1874.00 2175.94 > resolve addr : 50.67 26.30 23962.00 5.07 > resolve route: 622.68 594.80 27952.00 62.27 > create qp : 599.82 0.23 49.00 59.98 > connect : 24761.36 24709.28 49183.00 2476.14 > disconnect : 904.57 652.34 187201.00 90.46 > destroy : 38.94 0.04 2.00 3.89 > > cmtime -b <card2, port1, mlx4, IB> -c 10000 > > step total ms max ms min us us / conn > create id : 35.13 0.05 1.00 3.51 > bind addr : 47421.04 6.38 3896.00 4742.10 > resolve addr : 50.60 25.54 24248.00 5.06 > resolve route: 524.76 498.97 25861.00 52.48 > create qp : 3137.70 5.68 251.00 313.77 > connect : 48959.76 48894.49 31841.00 4895.98 > disconnect : 101926.72 98431.12 538689.00 10192.67 > destroy : 37.63 0.04 2.00 3.76 > > cmtime -b <card2, port2, mlx4, IBoE> -c 5000 > > step total ms max ms min us us / conn > create id : 28.04 0.05 1.00 5.61 > bind addr : 235.03 0.17 41.00 47.01 > resolve addr : 27.45 14.97 12308.00 5.49 > resolve route: 556.26 540.88 15514.00 111.25 > create qp : 1323.23 5.73 210.00 264.65 > connect : 84025.30 83960.46 61319.00 16805.06 > disconnect : 2273.15 1734.22 417534.00 454.63 > destroy : 21.28 0.06 2.00 4.26 > > Clearly, both the bind address and connect operations suffer > a huge penalty for being anything other than the default > GID on the first port in the system. Note: I had to reduce > the number of connections to 5000 to get the IBoE test to > complete, so it's numbers aren't fully comparable to the > rest of the tests. > > After applying this patch, the numbers now look like this: > > cmtime -b <card1, port1, mthca> -c 10000 > > step total ms max ms min us us / conn > create id : 30.30 0.04 1.00 3.03 > bind addr : 26.15 0.03 1.00 2.62 > resolve addr : 47.18 24.62 22336.00 4.72 > resolve route: 642.78 617.61 25242.00 64.28 > create qp : 610.06 0.61 52.00 61.01 > connect : 43362.32 43303.70 59353.00 4336.23 > disconnect : 877.59 658.70 165291.00 87.76 > destroy : 40.03 0.05 2.00 4.00 > > cmtime -b <card1, port2, mthca> -c 10000 > > step total ms max ms min us us / conn > create id : 31.34 0.07 1.00 3.13 > bind addr : 42.37 0.03 3.00 4.24 > resolve addr : 47.19 24.92 22003.00 4.72 > resolve route: 580.25 553.65 26680.00 58.03 > create qp : 687.45 0.30 52.00 68.74 > connect : 37457.12 37384.62 73015.00 3745.71 > disconnect : 900.72 648.67 183825.00 90.07 > destroy : 39.05 0.05 2.00 3.90 > > cmtime -b <card2, port1, mlx4, IB> -c 10000 > > step total ms max ms min us us / conn > create id : 36.26 0.05 1.00 3.63 > bind addr : 75.29 0.10 4.00 7.53 > resolve addr : 52.59 28.60 24753.00 5.26 > resolve route: 628.05 602.16 25969.00 62.80 > create qp : 3125.12 5.48 272.00 312.51 > connect : 55779.52 55704.92 75617.00 5577.95 > disconnect : 100925.06 98429.45 476384.00 10092.51 > destroy : 51.92 0.07 2.00 5.19 > > cmtime -b <card2, port2, mlx4, IBoE> -c 5000 > > step total ms max ms min us us / conn > create id : 26.89 0.05 1.00 5.38 > bind addr : 27.51 0.03 4.00 5.50 > resolve addr : 22.77 11.78 11004.00 4.55 > resolve route: 505.13 489.99 15252.00 101.03 > create qp : 1331.86 5.71 209.00 266.37 > connect : 82606.75 82537.34 70746.00 16521.35 > disconnect : 2261.20 1734.62 406724.00 452.24 > destroy : 22.19 0.04 2.00 4.44 > > Many thanks to Neil Horman for helping to track the source of > slow function that allowed us to track down the fact that > the original patch I mentioned above backed out cache usage > and identify just how much that impacted the system. > > Signed-off-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Acked-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] IB/cma: use cached gids [not found] ` <681df2ae830b20b81fef0c57c41f3bbce5c8c1cc.1378676509.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2013-09-10 23:39 ` Hefty, Sean @ 2013-09-11 1:17 ` Roland Dreier [not found] ` <CAG4TOxOTPPNdzpNmRkiwsjEcGEoVG1=H2e6YHddXFnAGXHFL6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 4+ messages in thread From: Roland Dreier @ 2013-09-11 1:17 UTC (permalink / raw) To: Doug Ledford Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty, Eli Cohen On Sun, Sep 8, 2013 at 2:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > - ret = find_gid_port(cma_dev->device, &iboe_gid, port); > + ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); This type of change is kind of unfortunate -- I've been on a multi-year quest to get rid of the whole "ib...cached..." set of APIs, since the part of the code that handles maintaining that cache is racy and hard to get the locking right with. It would be better if we could declare that the device driver's query GID methods had to be fast (non-sleeping) and use them directly (and device drivers could maintain a cache if they need to, maybe with a library to help them). But I guess that's a bigger project than fixing this immediate issue. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 4+ messages in thread
[parent not found: <CAG4TOxOTPPNdzpNmRkiwsjEcGEoVG1=H2e6YHddXFnAGXHFL6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] IB/cma: use cached gids [not found] ` <CAG4TOxOTPPNdzpNmRkiwsjEcGEoVG1=H2e6YHddXFnAGXHFL6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-09-11 1:37 ` Doug Ledford 0 siblings, 0 replies; 4+ messages in thread From: Doug Ledford @ 2013-09-11 1:37 UTC (permalink / raw) To: Roland Dreier Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty, Eli Cohen On 09/10/2013 09:17 PM, Roland Dreier wrote: > On Sun, Sep 8, 2013 at 2:44 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >> >> - ret = find_gid_port(cma_dev->device, &iboe_gid, port); >> + ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL); > > This type of change is kind of unfortunate -- I've been on a > multi-year quest to get rid of the whole "ib...cached..." set of APIs, So I was forewarned. > since the part of the code that handles maintaining that cache is racy > and hard to get the locking right with. It would be better if we > could declare that the device driver's query GID methods had to be > fast (non-sleeping) and use them directly (and device drivers could > maintain a cache if they need to, maybe with a library to help them). > > But I guess that's a bigger project than fixing this immediate issue. There are only two solutions to this that I can see: 1) Don't go to the card 2) Go to the card, but get the entire GID table in one operation (or at most two if you need one to get the GID table length and then a second to get the entire GID table in one go) The current implementation goes to the card once to get the table length, and then once for each table entry. That's what's killing us. Personally, I don't want it ever to go to the card for this sort of operation. We should be able to keep the card's idea of GIDs and our idea of the same in sync. Getting this right might be hard, but it's still the right thing to do. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-09-11 1:37 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-08 21:44 [PATCH] IB/cma: use cached gids Doug Ledford
[not found] ` <681df2ae830b20b81fef0c57c41f3bbce5c8c1cc.1378676509.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-09-10 23:39 ` Hefty, Sean
2013-09-11 1:17 ` Roland Dreier
[not found] ` <CAG4TOxOTPPNdzpNmRkiwsjEcGEoVG1=H2e6YHddXFnAGXHFL6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-11 1:37 ` Doug Ledford
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).