* [PATCH] mlx4: allow for 4K mtu configuration of IB ports
@ 2011-05-25 12:10 Or Gerlitz
[not found] ` <alpine.LRH.2.00.1105251507140.16147-VYr5/9ddeaGSIdy2EShu12Xnswh1EIUO@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2011-05-25 12:10 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma, Vladimir Sokolovsky
Since the maximal number of VLs a port can support depends on the
port MTU, act in a loop, going down from the highest possible number
of VLs to the lowest. Use the firmware return status as an indication
that the requested number of VLs is impossible with that MTU.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
Roland, this is an updated approach and posting for the patch posted
earlier by Vlad on which you've responded @
http://www.spinics.net/lists/linux-rdma/msg02136.html
I've attempted to address your comment and questions, let me know...
drivers/net/mlx4/port.c | 36 ++++++++++++++++++++++++++++++++----
1 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c
index 8856659..872e8d3 100644
--- a/drivers/net/mlx4/port.c
+++ b/drivers/net/mlx4/port.c
@@ -43,6 +43,10 @@
 #define MLX4_VLAN_VALID		(1u << 31)
 #define MLX4_VLAN_MASK		0xfff

+static int mlx4_ib_set_4k_mtu;
+module_param_named(set_4k_mtu, mlx4_ib_set_4k_mtu, int, 0444);
+MODULE_PARM_DESC(set_4k_mtu, "attempt to set 4K MTU to IB ports");
+
 void mlx4_init_mac_table(struct mlx4_dev *dev, struct mlx4_mac_table *table)
 {
 	int i;
@@ -461,10 +465,20 @@ int mlx4_get_port_ib_caps(struct mlx4_dev *dev, u8 port, __be32 *caps)
 	return err;
 }

+/* bit locations for set port command with zero op modifier */
+enum {
+	MLX4_SET_PORT_VL_CAP = 4, /* bits 7:4 */
+	MLX4_SET_PORT_MTU_CAP = 12, /* bits 15:12 */
+	MLX4_CHANGE_PORT_VL_CAP = 21,
+	MLX4_CHANGE_PORT_MTU_CAP = 22,
+};
+
+#define IBTA_MTU_4096 5
+
 int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
 {
 	struct mlx4_cmd_mailbox *mailbox;
-	int err;
+	int err, vl_cap;

 	if (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH)
 		return 0;
@@ -474,10 +488,24 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port)
 		return PTR_ERR(mailbox);

 	memset(mailbox->buf, 0, 256);
-
 	((__be32 *) mailbox->buf)[1] = dev->caps.ib_port_def_cap[port];
-	err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_SET_PORT,
-		       MLX4_CMD_TIME_CLASS_B);
+
+	if (mlx4_ib_set_4k_mtu)
+		for (vl_cap = 8; vl_cap >= 1; vl_cap >>= 1) {
+			((__be32 *) mailbox->buf)[0] = cpu_to_be32(
+				(1 << MLX4_CHANGE_PORT_MTU_CAP) |
+				(1 << MLX4_CHANGE_PORT_VL_CAP) |
+				(IBTA_MTU_4096 << MLX4_SET_PORT_MTU_CAP) |
+				(vl_cap << MLX4_SET_PORT_VL_CAP));
+			err = mlx4_cmd(dev, mailbox->dma, port, 0,
+				       MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B);
+			if (err != -ENOMEM)
+				break;
+		}
+	else {
+		err = mlx4_cmd(dev, mailbox->dma, port, 0, MLX4_CMD_SET_PORT,
+			       MLX4_CMD_TIME_CLASS_B);
+	}

 	mlx4_free_cmd_mailbox(dev, mailbox);
 	return err;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <alpine.LRH.2.00.1105251507140.16147-VYr5/9ddeaGSIdy2EShu12Xnswh1EIUO@public.gmane.org>]
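A minimal userspace sketch of what the loop in the patch above does, assuming the bit positions shown in the diff. `set_port_word()`, `fake_set_port()` and `negotiate_vl_cap()` are hypothetical names introduced here for illustration; the real code issues `mlx4_cmd()` and stores the word with `cpu_to_be32()`, while this sketch keeps the word in host byte order. The stub firmware simply returns `-ENOMEM` when asked for more VLs than a pretend limit allows, which mirrors how the patch interprets the firmware status.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions copied from the patch (SET_PORT with zero op modifier). */
enum {
	SET_PORT_VL_CAP     = 4,   /* bits 7:4 */
	SET_PORT_MTU_CAP    = 12,  /* bits 15:12 */
	CHANGE_PORT_VL_CAP  = 21,
	CHANGE_PORT_MTU_CAP = 22,
};

#define IBTA_MTU_4096 5

/* Build the capability word (host byte order) for a requested VL cap. */
static uint32_t set_port_word(unsigned int vl_cap)
{
	assert(vl_cap >= 1 && vl_cap <= 8);
	return (1u << CHANGE_PORT_MTU_CAP) |
	       (1u << CHANGE_PORT_VL_CAP) |
	       ((uint32_t)IBTA_MTU_4096 << SET_PORT_MTU_CAP) |
	       ((uint32_t)vl_cap << SET_PORT_VL_CAP);
}

/* Hypothetical firmware stand-in: refuses more than max_vls VLs at 4K MTU. */
static int fake_set_port(uint32_t word, unsigned int max_vls)
{
	unsigned int vl_cap = (word >> SET_PORT_VL_CAP) & 0xf;

	return vl_cap > max_vls ? -ENOMEM : 0;
}

/* Try 8, 4, 2, 1 VLs; return the accepted VL cap or a negative errno. */
static int negotiate_vl_cap(unsigned int max_vls)
{
	int err = -ENOMEM;
	unsigned int vl_cap;

	for (vl_cap = 8; vl_cap >= 1; vl_cap >>= 1) {
		err = fake_set_port(set_port_word(vl_cap), max_vls);
		if (err != -ENOMEM)
			break;
	}
	return err ? err : (int)vl_cap;
}
```

With a pretend limit of four VLs at 4K MTU, `negotiate_vl_cap(4)` is refused at 8 VLs and settles on 4; any status other than `-ENOMEM` ends the loop immediately, as in the patch.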
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <alpine.LRH.2.00.1105251507140.16147-VYr5/9ddeaGSIdy2EShu12Xnswh1EIUO@public.gmane.org>
@ 2011-05-25 16:13 ` Roland Dreier
[not found] ` <BANLkTi=bKdVXm+f+HVeg2tBzT4RBJDCN_w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-25 16:13 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma, Vladimir Sokolovsky

On Wed, May 25, 2011 at 5:10 AM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Roland, this is an updated approach and posting for the patch posted
> earlier by Vlad on which you've responded @
> http://www.spinics.net/lists/linux-rdma/msg02136.html
> I've attempted to address your comment and questions, let me know...

Thanks for looking at this again. Definitely better in terms of
fewer magic numbers... however I still think needing to set this
with a module parameter kind of sucks for the end user. Can we
think of a better way to handle this?

Is the issue that we trade off VL cap for MTU? Does anyone really
care about max VL cap with 2K MTU?

- R.
[parent not found: <BANLkTi=bKdVXm+f+HVeg2tBzT4RBJDCN_w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTi=bKdVXm+f+HVeg2tBzT4RBJDCN_w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-25 21:05 ` Or Gerlitz
[not found] ` <BANLkTinDCbPXi8zS46wcVW_Yn0fxzbSikw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2011-05-25 21:05 UTC (permalink / raw)
To: Roland Dreier; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> Is the issue that we trade off VL cap for MTU?

yes, this is it

> [...] however I still think needing to set this with a module parameter kind of
> sucks for the end user. Can we think of a better way to handle this?

with the above @ hand, setting mtu cap of 4k w.o an ability of
reducing that to 2k, makes the patch disruptive for users that do
need eight VLs. Maybe it would be easier for the common user if we
turn on the module param by default.

> Does anyone really care about max VL cap with 2K MTU?

I'm not with you... can you elaborate a little further here? the
current HW generation support four VLs with 4k mtu, newer HW might
support more.

Or.
[parent not found: <BANLkTinDCbPXi8zS46wcVW_Yn0fxzbSikw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTinDCbPXi8zS46wcVW_Yn0fxzbSikw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-25 21:27 ` Roland Dreier
[not found] ` <BANLkTikrJ_3f+LwN_R=-AcwCde-KdHJhOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-25 21:27 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

On Wed, May 25, 2011 at 2:05 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Does anyone really care about max VL cap with 2K MTU?
>
> I'm not with you... can you elaborate a little further here? the
> current HW generation support four VLs with 4k mtu, newer HW might
> support more.

I mean is there anyone who really uses >4 VLs? Presumably the
HW designers didn't think so, because they limited HW to 4 VLs
with 4K MTU.

At least can we make this a runtime thing? If we're able to set a
port as IB vs ethernet then # of VLs seems like it should be
doable too.

And 4K MTU should probably be the default, since almost all users
want 4K MTU vs. caring about VLs. (Probably 99% of IB users
never set SL of anything)

- R.
[parent not found: <BANLkTikrJ_3f+LwN_R=-AcwCde-KdHJhOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTikrJ_3f+LwN_R=-AcwCde-KdHJhOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-25 21:46 ` Or Gerlitz
[not found] ` <BANLkTinT5OJdP38Xq=PCtx1wgxDhjVfndA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 22:19 ` Bob Pearson
1 sibling, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2011-05-25 21:46 UTC (permalink / raw)
To: Roland Dreier; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> And 4K MTU should probably be the default, since almost all users
> want 4K MTU vs. caring about VLs. (Probably 99% of IB users
> never set SL of anything)

I agree that we want that to be the default, I'm not sure the 99%
thing is accurate, with more and more (specifically the huge ones)
IB clusters that are built in some sort of 2D/3D torus, mesh or
alike topologies for which routing engines such as DOR and LASH use
multiple VLs to avoid credit loops. also I assume that some users
(maybe < 5%) would like to enjoy 8 HW traffic classes, so if pressed
to the wall, they would prefer 2k mtu with the current HW.

> I mean is there anyone who really uses >4 VLs? Presumably the
> HW designers didn't think so, because they limited HW to 4 VLs with 4K MTU.

I'm not sure if 4 VLs are enough for all the topologies / algorithms I
mentioned above, so I do prefer to leave an option to run with eight
VLs. As for the HW designers comment, it's always good to look
forward for improvements in newer HCA drops (the patch for the CX3
series device IDs is already committed by
31dd272e8cbb32ef31a411492cc642c363bb54b9, so one can expect for the
actual cards to be coming soon as well).

> At least can we make this a runtime thing? If we're able to set a
> port as IB vs ethernet then # of VLs seems like it should be doable too.

Here I lost you again, the policy is dictated by the module param,
whose default value should be turned on, the code that sets the
mtu and VL cap is executed each time the function changed by the patch
is called, which in turn happens each time an IB link type is sensed
or dictated for the port.

Or.
[parent not found: <BANLkTinT5OJdP38Xq=PCtx1wgxDhjVfndA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTinT5OJdP38Xq=PCtx1wgxDhjVfndA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-25 22:07 ` Jason Gunthorpe
2011-05-25 22:10 ` Roland Dreier
1 sibling, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2011-05-25 22:07 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Roland Dreier, Or Gerlitz, linux-rdma, Vladimir Sokolovsky

On Thu, May 26, 2011 at 12:46:20AM +0300, Or Gerlitz wrote:
> > I mean is there anyone who really uses >4 VLs? Presumably the
> > HW designers didn't think so, because they limited HW to 4 VLs with 4K MTU.
>
> I'm not sure if 4 VLs are enough for all the topologies / algorithms I
> mentioned above, so I do prefer to leave an option to run with eight
> VLs. As for the HW designers comment, it's always good to look
> forward

Routing algorithms only need VLs on interswitch links, not on HCA to
switch links. The only use of the HCA to switch VLs is for QoS. Mesh
topologies can usually be routed with only two VLs, but you need a lot
of SLs to make that work.

IMHO it would be much nicer if the SM could control and set this
choice, but since the spec doesn't have a provision for this I'm not
sure what to suggest there..

Jason
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTinT5OJdP38Xq=PCtx1wgxDhjVfndA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 22:07 ` Jason Gunthorpe
@ 2011-05-25 22:10 ` Roland Dreier
[not found] ` <BANLkTi=NkZQX2JFa99qkVpLAchRjNoYOhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-25 22:10 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

On Wed, May 25, 2011 at 2:46 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> At least can we make this a runtime thing? If we're able to set a
>> port as IB vs ethernet then # of VLs seems like it should be doable too.
>
> Here I lost you again, the policy is dictated by the module param,
> whose default value should be turned on, the code that sets the
> mtu and VL cap is executed each time the function changed by the patch
> is called, which in turn happens each time an IB link type is sensed
> or dictated for the port.

I mean set the MTU port-by-port with the module loaded, the same way
we are supposed to be able to do for the port type. Rather than having
one global module parameter.
[parent not found: <BANLkTi=NkZQX2JFa99qkVpLAchRjNoYOhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTi=NkZQX2JFa99qkVpLAchRjNoYOhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-26 6:24 ` Or Gerlitz
[not found] ` <4DDDF212.4080700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2011-05-26 6:24 UTC (permalink / raw)
To: Roland Dreier; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

Roland Dreier wrote:
> I mean set the MTU port-by-port with the module loaded, the same way
> we are supposed to be able to do for the port type. Rather than having
> one global module parameter.

The HCA has set of per port buffers, from which comes the dependency
between VLs to MTU. So with this code running for each IB hca/port,
we're actually doing that logic port-by-port. I assume you didn't mean
let the user specify a desired MTU for each hca/port... or I'm still
not fully with you?

Anyway, I'd be happy to provide at least the folks that use torus/mesh
and/or sophisticated QoS schemes an ability to use eight VLs with the
current HW, so how about keeping the module param but with default
value turned on?

Or.
[parent not found: <4DDDF212.4080700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <4DDDF212.4080700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-05-26 15:53 ` Roland Dreier
[not found] ` <BANLkTimrm7bqTT8qAzxdf9tsJuv7+NPO+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-26 15:53 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

On Wed, May 25, 2011 at 11:24 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Roland Dreier wrote:
>>
>> I mean set the MTU port-by-port with the module loaded, the same way
>> we are supposed to be able to do for the port type. Rather than having
>> one global module parameter.
>
> The HCA has set of per port buffers, from which comes the dependency between
> VLs to MTU. So with this code running for each IB hca/port, we're actually
> doing that logic port-by-port. I assume you didn't mean let the user specify
> a desired MTU for each hca/port... or I'm still not fully with you?
>
> Anyway, I'd be happy to provide at least the folks that use torus/mesh
> and/or sophisticated QoS schemes an ability to use eight VLs with the
> current HW, so how about keeping the module param but with default value
> turned on?

What I'm trying to say is that a global module parameter is a pain for
users. Would it make sense to have a "port_type" module parameter
where you could set all the ports to IB or to ethernet, and have no
way to have a mix of port types or change without reloading the
module?

The obvious answer is no, and therefore we have mlx4_portX attributes
in sysfs that are per port. MTU is the same way. For example, you
suggest that CX3 won't have the same limitation of only 4 VLs with
4K MTU. In that case, think about a system with one CX2 and one
CX3 -- should the CX3 be limited to 2K MTU because of CX2 limitations?

Rather than having a completely different way of handling MTU, why
can't we just handle it the same way as the port type, and have a
sysfs attribute like "mlx4_mtuN" for each port?

- R.
[parent not found: <BANLkTimrm7bqTT8qAzxdf9tsJuv7+NPO+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTimrm7bqTT8qAzxdf9tsJuv7+NPO+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-30 22:12 ` Or Gerlitz
0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2011-05-30 22:12 UTC (permalink / raw)
To: Roland Dreier; +Cc: Or Gerlitz, linux-rdma, Vladimir Sokolovsky

Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> The obvious answer is no, and therefore we have mlx4_portX attributes in
> sysfs that are per port. MTU is the same way. For example, you suggest
> that CX3 won't have the same limitation of only 4 VLs with 4K MTU. In
> that case, think about a system with one CX2 and one CX3 -- should the
> CX3 be limited to 2K MTU because of CX2 limitations?
>
> Rather than having a completely different way of handling MTU, why can't
> we just handle it the same way as the port type, and have a sysfs attribute
> like "mlx4_mtuN" for each port?

okay, got that. I'd like to make another round of thinking / checking
if we can make 4k mtu being the default and not configurable also for
pre-CX3 devices, if yes, I guess we can avoid the per port sysfs
entry, if not, I'll add that as part of the patch.

Or.
* RE: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTikrJ_3f+LwN_R=-AcwCde-KdHJhOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 21:46 ` Or Gerlitz
@ 2011-05-25 22:19 ` Bob Pearson
2011-05-25 23:14 ` Roland Dreier
1 sibling, 1 reply; 18+ messages in thread
From: Bob Pearson @ 2011-05-25 22:19 UTC (permalink / raw)
To: 'Roland Dreier', 'Or Gerlitz', Jim Schutt
Cc: 'Or Gerlitz', 'linux-rdma', 'Vladimir Sokolovsky'

With lash+mesh redsky required 6-7 VLs to wire up without deadlocks. I
think that Jim's version uses 8 SLs but only 2VLs to work.
If someone was using a torus and also wanted to support QOS and also
wanted to separate multicast and management on a separate VL to be
absolutely sure that there is no possibility of a deadlock you might
end up with #QOS * 2 + 1 which would be 5 using the current algorithm.

-----Original Message-----
From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Roland Dreier
Sent: Wednesday, May 25, 2011 4:28 PM
To: Or Gerlitz
Cc: Or Gerlitz; linux-rdma; Vladimir Sokolovsky
Subject: Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports

On Wed, May 25, 2011 at 2:05 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Does anyone really care about max VL cap with 2K MTU?
>
> I'm not with you... can you elaborate a little further here? the
> current HW generation support four VLs with 4k mtu, newer HW might
> support more.

I mean is there anyone who really uses >4 VLs? Presumably the
HW designers didn't think so, because they limited HW to 4 VLs
with 4K MTU.

At least can we make this a runtime thing? If we're able to set a
port as IB vs ethernet then # of VLs seems like it should be
doable too.

And 4K MTU should probably be the default, since almost all users
want 4K MTU vs. caring about VLs. (Probably 99% of IB users
never set SL of anything)

- R.
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
2011-05-25 22:19 ` Bob Pearson
@ 2011-05-25 23:14 ` Roland Dreier
[not found] ` <BANLkTikLeWHYLKrX8O=syt-Q+b6LP2PCtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-25 23:14 UTC (permalink / raw)
To: Bob Pearson
Cc: Or Gerlitz, Jim Schutt, Or Gerlitz, linux-rdma, Vladimir Sokolovsky

On Wed, May 25, 2011 at 3:19 PM, Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
> With lash+mesh redsky required 6-7 VLs to wire up without deadlocks. I think
> that Jim's version uses 8 SLs but only 2VLs to work.
> If someone was using a torus and also wanted to support QOS and also wanted
> to separate multicast and management on a separate VL to be absolutely sure
> that there is no possibility of a deadlock you might end up with #QOS * 2 +
> 1 which would be 5 using the current algorithm.

But again you don't need all those VLs on the HCAs' links, do you?

- R.
[parent not found: <BANLkTikLeWHYLKrX8O=syt-Q+b6LP2PCtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTikLeWHYLKrX8O=syt-Q+b6LP2PCtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-26 6:18 ` Or Gerlitz
[not found] ` <4DDDF0B1.6090305-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
[not found] ` <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809@sandia.gov>
0 siblings, 2 replies; 18+ messages in thread
From: Or Gerlitz @ 2011-05-26 6:18 UTC (permalink / raw)
To: Roland Dreier, Jim Schutt
Cc: Bob Pearson, Or Gerlitz, linux-rdma, Vladimir Sokolovsky, Alex Netes

Roland Dreier wrote:
> Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>> With lash+mesh redsky required 6-7 VLs to wire up without deadlocks. I think
>> that Jim's version uses 8 SLs but only 2VLs to work.
>> If someone was using a torus and also wanted to support QOS and also wanted
>> to separate multicast and management on a separate VL to be absolutely sure
>> that there is no possibility of a deadlock you might end up with #QOS * 2 +
>> 1 which would be 5 using the current algorithm.

> But again you don't need all those VLs on the HCAs' links, do you?

Jason Gunthorpe wrote:
> Routing algorithms only need VLs on interswitch links, not on HCA to
> switch links. The only use of the HCA to switch VLs is for QoS. Mesh
> topologies can usually be routed with only two VLs, but you need a lot
> of SLs to make that work.

Bob, Jim, Alex

I wasn't sure if the SL-to-VL mapping done by open SM is dictated by
the directives @ the user config file or if the routing algorithm is
"VL aware" but the routing engine? if the latter, do interswitch links
use different mapping vs. HCA - switch links?

Or.
[parent not found: <4DDDF0B1.6090305-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <4DDDF0B1.6090305-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-05-26 14:24 ` Jim Schutt
0 siblings, 0 replies; 18+ messages in thread
From: Jim Schutt @ 2011-05-26 14:24 UTC (permalink / raw)
To: Or Gerlitz
Cc: Roland Dreier, Bob Pearson, Or Gerlitz, linux-rdma, Vladimir Sokolovsky, Alex Netes

Or Gerlitz wrote:
> Roland Dreier wrote:
>> Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>>> With lash+mesh redsky required 6-7 VLs to wire up without deadlocks.
>>> I think that Jim's version uses 8 SLs but only 2VLs to work.
>>> If someone was using a torus and also wanted to support QOS and also
>>> wanted to separate multicast and management on a separate VL to be
>>> absolutely sure that there is no possibility of a deadlock you might
>>> end up with #QOS * 2 + 1 which would be 5 using the current algorithm.
>
>> But again you don't need all those VLs on the HCAs' links, do you?
>
> Jason Gunthorpe wrote:
>> Routing algorithms only need VLs on interswitch links, not on HCA to
>> switch links. The only use of the HCA to switch VLs is for QoS. Mesh
>> topologies can usually be routed with only two VLs, but you need a lot
>> of SLs to make that work.
>
> Bob, Jim, Alex
>
> I wasn't sure if the SL-to-VL mapping done by open SM is dictated by the
> directives @ the user config file or if the routing algorithm is "VL
> aware" but the routing engine? if the latter, do interswitch links use
> different mapping vs. HCA - switch links?

FWIW, the torus-2QoS routing engine uses VL bit 0 for torus deadlock
avoidance, VL bit 1 to route around a missing switch without deadlocks,
and VL bit 2 to provide two QoS levels. It needs the port dependence
of the SL2VL maps to do this in switches.

The interswitch and HCAs use the same mapping, but only VL bit 2
is needed on HCAs, to provide the QoS levels.

I chose that bit usage because it seemed the proper ordering of
capabilities if there are fewer than 8 data VLs available - basic
deadlock avoidance is most important; some QoS is nice to have but
not that useful if the fabric can deadlock.

Is that what you were asking, at least WRT. torus-2QoS?

-- Jim

>
> Or.
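A minimal sketch of the VL bit assignment Jim describes for torus-2QoS, assuming the three bits simply combine into a VL number between 0 and 7. The names `interswitch_vl()`, `ca_vl()` and the boolean flags are hypothetical and chosen for illustration; the conditions under which the routing engine actually sets each bit are not spelled out in the thread, and this is not OpenSM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VL bit roles as described in the message above. */
enum {
	VL_BIT_TORUS   = 1 << 0, /* torus deadlock avoidance */
	VL_BIT_REROUTE = 1 << 1, /* routing around a missing switch */
	VL_BIT_QOS     = 1 << 2, /* selects one of two QoS levels */
};

/* Combine the three roles into a VL for an interswitch link. */
static uint8_t interswitch_vl(bool torus, bool reroute, unsigned int qos)
{
	assert(qos <= 1);
	return (uint8_t)((torus ? VL_BIT_TORUS : 0) |
			 (reroute ? VL_BIT_REROUTE : 0) |
			 (qos ? VL_BIT_QOS : 0));
}

/* On a CA link only the QoS bit is needed, so the VL is either 0 or 4. */
static uint8_t ca_vl(unsigned int qos)
{
	return interswitch_vl(false, false, qos);
}
```

Under this layout an interswitch link can need any VL from 0 to 7, while a CA link only ever sees VL 0 or VL 4, which is why a port limited to four data VLs cannot carry the second QoS level without a different mapping.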
[parent not found: <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809@sandia.gov>]
[parent not found: <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809-4OHPYypu0djtX7QSmKvirg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2011-05-26 14:30 ` Jim Schutt
[not found] ` <4DDE63F9.8060502-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Jim Schutt @ 2011-05-26 14:30 UTC (permalink / raw)
To: Jim Schutt
Cc: Or Gerlitz, Roland Dreier, Bob Pearson, Or Gerlitz, linux-rdma, Vladimir Sokolovsky, Alex Netes

Jim Schutt wrote:
> Or Gerlitz wrote:
>> Roland Dreier wrote:
>>> Bob Pearson <rpearson-klaOcWyJdxkshyMvu7JE4pqQE7yCjDx5@public.gmane.org> wrote:
>>>> With lash+mesh redsky required 6-7 VLs to wire up without deadlocks.
>>>> I think that Jim's version uses 8 SLs but only 2VLs to work.
>>>> If someone was using a torus and also wanted to support QOS and also
>>>> wanted to separate multicast and management on a separate VL to be
>>>> absolutely sure that there is no possibility of a deadlock you might
>>>> end up with #QOS * 2 + 1 which would be 5 using the current algorithm.
>>
>>> But again you don't need all those VLs on the HCAs' links, do you?
>>
>> Jason Gunthorpe wrote:
>>> Routing algorithms only need VLs on interswitch links, not on HCA to
>>> switch links. The only use of the HCA to switch VLs is for QoS. Mesh
>>> topologies can usually be routed with only two VLs, but you need a lot
>>> of SLs to make that work.
>>
>> Bob, Jim, Alex
>>
>> I wasn't sure if the SL-to-VL mapping done by open SM is dictated by
>> the directives @ the user config file or if the routing algorithm is
>> "VL aware" but the routing engine? if the latter, do interswitch links
>> use different mapping vs. HCA - switch links?
>
> FWIW, the torus-2QoS routing engine uses VL bit 0 for torus deadlock
> avoidance, VL bit 1 to route around a missing switch without deadlocks,
> and VL bit 2 to provide two QoS levels. It needs the port dependence
> of the SL2VL maps to do this in switches.
>
> The interswitch and HCAs use the same mapping, but only VL bit 2
> is needed on HCAs, to provide the QoS levels.

It occurred to me as soon as I sent the above that there's no
good reason to insist that the VL usage is the same for both
interswitch links, and switch-CA links.

Do I need to change this?

-- Jim

> I chose that bit usage because it seemed the proper ordering of
> capabilities if there are fewer than 8 data VLs available - basic
> deadlock avoidance is most important; some QoS is nice to have but
> not that useful if the fabric can deadlock.
>
> Is that what you were asking, at least WRT. torus-2QoS?
>
> -- Jim
>
>>
>> Or.
[parent not found: <4DDE63F9.8060502-4OHPYypu0djtX7QSmKvirg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <4DDE63F9.8060502-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2011-05-26 15:56 ` Roland Dreier
[not found] ` <BANLkTimGioibPTUYV046ObOPhHntHxx72w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Roland Dreier @ 2011-05-26 15:56 UTC (permalink / raw)
To: Jim Schutt
Cc: Or Gerlitz, Bob Pearson, Or Gerlitz, linux-rdma, Vladimir Sokolovsky, Alex Netes

On Thu, May 26, 2011 at 7:30 AM, Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org> wrote:
> It occurred to me as soon as I sent the above that there's no
> good reason to insist that the VL usage is the same for both
> interswitch links, and switch-CA links.
>
> Do I need to change this?

I don't think changing this is a high priority, since it's a pretty
small slice of the world, and QoS on the edge links probably is
important to an even smaller slice, but IMHO it would be better to
give QoS to HCAs that only support 4 VLs by using a different SL2VL
table for links to CAs.

- R.
[parent not found: <BANLkTimGioibPTUYV046ObOPhHntHxx72w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTimGioibPTUYV046ObOPhHntHxx72w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-30 22:07 ` Or Gerlitz
[not found] ` <BANLkTinK+YHUyF4ThHQAwFeZ4FNswpY-tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2011-05-30 22:07 UTC (permalink / raw)
To: Roland Dreier, Jim Schutt
Cc: Or Gerlitz, Bob Pearson, linux-rdma, Vladimir Sokolovsky, Alex Netes

Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org> wrote:
>> no good reason to insist that the VL usage is the same for both
>> interswitch links, and switch-CA links. Do I need to change this?

> I don't think changing this is a high priority, since it's a pretty small
> slice of the world, and QoS on the edge links probably is important
> to an even smaller slice, but IMHO it would be better to give QoS to
> HCAs that only support 4 VLs by using a different SL2VL table for links to CAs.

Jim,

AFAIK, the way opensm applies an SL-to-VL mapping specification (e.g.
dictated by the admin or maybe your routing engine) on a specific link
is by modulation on the number of active VLs for that link - e.g. say
the ID mapping was required and there are two VLs for that link, so
we'll have SL-to-VL of 0->0 1->1 2->0 3->1 and so on. So in that
respect, I wasn't sure what's the change here for you.

Or.
^ permalink raw reply [flat|nested] 18+ messages in thread
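The modulo-style folding described in the message above can be sketched in a few lines of C. This is an illustration only, not opensm code: `fold_vl`, `build_identity_sl2vl` and `NUM_SLS` are hypothetical names, the requested mapping is assumed to be the identity (SL n asks for VL n), and VL15 is not special-cased.

```c
#include <assert.h>

#define NUM_SLS 16 /* InfiniBand defines 16 service levels, SL 0..15 */

/*
 * Hypothetical helper: fold a requested VL into the range of VLs that
 * are active on a link by taking it modulo the active VL count.
 */
static unsigned int fold_vl(unsigned int requested_vl, unsigned int active_vls)
{
	assert(active_vls > 0);
	return requested_vl % active_vls;
}

/*
 * Build an SL-to-VL table for a link, assuming the requested mapping
 * is the identity mapping (SL n -> VL n) before folding.
 */
static void build_identity_sl2vl(unsigned int active_vls,
				 unsigned int sl2vl[NUM_SLS])
{
	unsigned int sl;

	for (sl = 0; sl < NUM_SLS; sl++)
		sl2vl[sl] = fold_vl(sl, active_vls);
}
```

With two active VLs the table alternates 0, 1, 0, 1 and so on, which matches the 0->0 1->1 2->0 3->1 example in the message.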
[parent not found: <BANLkTinK+YHUyF4ThHQAwFeZ4FNswpY-tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] mlx4: allow for 4K mtu configuration of IB ports
[not found] ` <BANLkTinK+YHUyF4ThHQAwFeZ4FNswpY-tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-05-31 16:11 ` Jim Schutt
0 siblings, 0 replies; 18+ messages in thread
From: Jim Schutt @ 2011-05-31 16:11 UTC (permalink / raw)
To: Or Gerlitz
Cc: Roland Dreier, Or Gerlitz, Bob Pearson, linux-rdma, Vladimir Sokolovsky, Alex Netes

Hi Or,

Or Gerlitz wrote:
> Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org> wrote:
>>> no good reason to insist that the VL usage is the same for both
>>> interswitch links, and switch-CA links. Do I need to change this?
>
>> I don't think changing this is a high priority, since it's a pretty small
>> slice of the world, and QoS on the edge links probably is important
>> to an even smaller slice, but IMHO it would be better to give QoS to
>> HCAs that only support 4 VLs by using a different SL2VL table for links to CAs.
>
> Jim,
>
> AFAIK, the way opensm applies an SL-to-VL mapping specification (e.g
> dictated by the admin or maybe your routing engine) on a specific link
> is by modulation on the number of active VLs for that link - e.g say
> the ID mapping was required and there are two VLs for that link, so
> we'll have SL-to-VL of 0->0 1->1 2->0 3->1 and so on. So in that
> respect, I wasn't sure what's the change here for you.

Hmmm, can you tell me where such remapping happens?

What I know about so far is the code in sl2vl_update_table(),
which AFAICS truncates VL values in the SL2VL maps provided
by the routing engine to fit into the number of VLs supported
by a port. Am I missing something else?

torus-2QoS will currently only use VL values 0 and 4 in the
SL2VL maps it generates for CA ports; i.e. any SL maps either
to VL 0 or VL 4, depending on QoS level. So sl2vl_update_table()
would truncate all those VL 4 values to 0 in the case of ports
that support fewer than 8 data VLs.

What I'm suggesting is that I need to make torus-2QoS generate
SL2VL maps that only reference VLs 0 or 1 for CA ports, in order
to allow it to support 2 QoS levels for CA ports that don't
support 8 data VLs.

-- Jim

>
> Or.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
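The effect described in the message above can be made concrete with a small C sketch. It is not taken from opensm or torus-2QoS: all three function names are hypothetical, and "truncation" is modelled here as keeping only the low-order VL bits that fit a power-of-two data VL count, which is an assumption made for illustration rather than a statement of what sl2vl_update_table() does.

```c
#include <assert.h>

/*
 * Hypothetical: VL chosen for a CA port today, as described in the
 * message (VL 0 for QoS level 0, VL 4 for QoS level 1).
 */
static unsigned int ca_vl_current(unsigned int qos_level)
{
	return qos_level ? 4 : 0;
}

/*
 * Hypothetical: VL chosen for a CA port under the suggested change
 * (VL 0 for QoS level 0, VL 1 for QoS level 1).
 */
static unsigned int ca_vl_proposed(unsigned int qos_level)
{
	return qos_level ? 1 : 0;
}

/*
 * Assumed model of truncation: drop the VL bits that do not fit.
 * Only meaningful when data_vls is a power of two (1, 2, 4 or 8).
 */
static unsigned int truncate_vl(unsigned int vl, unsigned int data_vls)
{
	assert(data_vls > 0 && (data_vls & (data_vls - 1)) == 0);
	return vl & (data_vls - 1);
}
```

Under this model, a port with 4 data VLs sees `truncate_vl(ca_vl_current(1), 4)` collapse to VL 0, so both QoS levels share one VL, while `truncate_vl(ca_vl_proposed(1), 4)` stays at VL 1 and the two levels remain distinct.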
end of thread, other threads:[~2011-05-31 16:11 UTC | newest]
Thread overview: 18+ messages
2011-05-25 12:10 [PATCH] mlx4: allow for 4K mtu configuration of IB ports Or Gerlitz
[not found] ` <alpine.LRH.2.00.1105251507140.16147-VYr5/9ddeaGSIdy2EShu12Xnswh1EIUO@public.gmane.org>
2011-05-25 16:13 ` Roland Dreier
[not found] ` <BANLkTi=bKdVXm+f+HVeg2tBzT4RBJDCN_w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 21:05 ` Or Gerlitz
[not found] ` <BANLkTinDCbPXi8zS46wcVW_Yn0fxzbSikw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 21:27 ` Roland Dreier
[not found] ` <BANLkTikrJ_3f+LwN_R=-AcwCde-KdHJhOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 21:46 ` Or Gerlitz
[not found] ` <BANLkTinT5OJdP38Xq=PCtx1wgxDhjVfndA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-25 22:07 ` Jason Gunthorpe
2011-05-25 22:10 ` Roland Dreier
[not found] ` <BANLkTi=NkZQX2JFa99qkVpLAchRjNoYOhg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-26 6:24 ` Or Gerlitz
[not found] ` <4DDDF212.4080700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2011-05-26 15:53 ` Roland Dreier
[not found] ` <BANLkTimrm7bqTT8qAzxdf9tsJuv7+NPO+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-30 22:12 ` Or Gerlitz
2011-05-25 22:19 ` Bob Pearson
2011-05-25 23:14 ` Roland Dreier
[not found] ` <BANLkTikLeWHYLKrX8O=syt-Q+b6LP2PCtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-26 6:18 ` Or Gerlitz
[not found] ` <4DDDF0B1.6090305-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2011-05-26 14:24 ` Jim Schutt
[not found] ` <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809@sandia.gov>
[not found] ` <17495_1306419903_p4QEMqSb018281_4DDE6291.1090809-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2011-05-26 14:30 ` Jim Schutt
[not found] ` <4DDE63F9.8060502-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2011-05-26 15:56 ` Roland Dreier
[not found] ` <BANLkTimGioibPTUYV046ObOPhHntHxx72w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-30 22:07 ` Or Gerlitz
[not found] ` <BANLkTinK+YHUyF4ThHQAwFeZ4FNswpY-tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-31 16:11 ` Jim Schutt