From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jason Gunthorpe Subject: Re: [PATCH RFC 0/3] Support standard SRIOV configuration for IB VFs Date: Mon, 25 May 2015 23:58:18 -0600 Message-ID: <20150526055818.GB22461@obsidianresearch.com> References: <1432225447-6536-1-git-send-email-ogerlitz@mellanox.com> <1432226406.28905.22.camel@redhat.com> <1432242708.28905.73.camel@redhat.com> <20150525211433.GA9186@obsidianresearch.com> <20150525223235.GA9858@obsidianresearch.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Or Gerlitz Cc: Doug Ledford , "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" List-Id: linux-rdma@vger.kernel.org On Tue, May 26, 2015 at 06:33:15AM +0300, Or Gerlitz wrote: > > This allows generic code to work with the LLADDR - for instance 'ip > > link set vf mac' should have checked the size of the IFLA_ADDRESS and > > demanded that the address argument is the same number of bytes. It is > > very broken the command happily accepts an 8 byte and 6 byte argument > > for the same device. > > OK, so per your view, the existing kernel code for this flow is broken > and you resist my attempt to use it as is, and What does that mean? The current kernel code that uses this interface is all ethernet drivers. They use 6 bytes, in exactly the same format as the LLADDR. Do you see something different? And yes, the kernel code for this interface is very much inconsistent with the rest of netlink. Carrying a LLADDR in a fixed and unsized 32 byte array and calling it a 'mac' is awful. Looking through the comments from 2010 on the patch that introduced this function, it is pretty clear that the original author did not understand netlink. The userspace side in iproute2 is even worse: static void print_vfinfo(FILE *fp, struct rtattr *vfinfo) { ... fprintf(fp, "\n vf %d MAC %s", vf_mac->vf, ll_addr_n2a((unsigned char *)&vf_mac->mac, ETH_ALEN, 0, b1, sizeof(b1))); Hard coded ETH_ALEN! For a 32 byte array! BLECK! And the set side: static int iplink_parse_vf(int vf, int *argcp, char ***argvp, struct iplink_req *req, int dev_index) { ... len = ll_addr_a2n((char *)ivm.mac, 32, *argv); if (len < 0) return -1; addattr_l(&req->n, sizeof(*req), IFLA_VF_MAC, &ivm, sizeof(ivm)); It just passes random stack garbage into the kernel if the user doesn't provide 32 bytes on the command line. BLECK! The kernel didn't even validate the *length* of address sub attribute until recently: https://patchwork.ozlabs.org/patch/436901/ So, to your question yes, I think it is broken / horribly written. But we can live with it, we don't need to radically change it. So. looking at this interface there are two options for the 32 byte fixed array: 1) 'mac' is some arbitary interface specific thing with no relation to anything else in netlink 2) 'mac' is a LLADDR, and must match the construction of IFLA_ADDRESS Speaking generally - netlink is a general purpose interface - #2 is by far the saner of the two choices. Why? Because 'ip' needs to know the length of the address for the two places noted above. Using the same length as IFLA_ADDRESS is simple and 100% general. Using a lookup on the device type to guess the address length is 8 because it is infiniband, but 6 for ethernet, and who knows what if the tool is old and doesn't recognize the device type.. Crappy. Not The Netlink Way. Thus, the very best choice (at this point) is to use the length of the IFLA_ADDRESS (aka dev->addr_len), which is completely general and works for every single device choice. Is #1 possible? Yes. But linux-rdma is the wrong place to make that choice. Ask netdev. [ .. if we could go back in time fixing the original patch to use a *sized* array for IFLA_VF_MAC would have been ideal, and made #1 much more desirable, but that is not straightforward now, and we have to live with the bad choices someone else made. Try not to add to this crap .. ] > > Yes, it is ugly that the PF side's ndo_get_vf_config cannot return the > > same 20 byte address of the VF's ipoib interface, but I think that is > > less ugly than forcing a different address format just for the vf calls. > > you claim that what I propose is uglier from the fact that the PF can't > by no means return the 20 VF's IPoIB address and it's OK if I only let > the PF configure 20 bytes with part (say four) of them being arbitrary > and only have consistent 20B get/set by the PF. > > Would you be happier if the ipoib ndo_set_vf_mac ndo be > > 1. getting 20B from user-space and treating 16 of them as the VF > subnet-prefix (8B) + vGUID (8B) > > 2. checking that the subnet-prefix is correct > > 3. provision the vGUID through PF driver / the verb I proposed for the VF > > ??? > > on the way back, for the get_vf_config > > 1. read the VF vGUID from the PF IB driver through the verbs > > 2. add the port subnet prefix > > 3. return 20B to user-space > > ??? Yes, absolutely. That maintains the general invariant for LLADDR and the broad integrity of the existing netlink design. Using anything else is also short sited. Current hardware is restricted to a randomized QPN, and current IBA is restricted to a single GID prefix: But are you *absolutely* sure that will be the case forever? This is a UAPI, after all. I'm not: For instance controlling/choosing the QPN solves real issues with live migration of VF IPoIB - the need to invalidate remote ND entires when the QPN changes on migration. > I am not @ the point to change start changing this specific netlink flow. We are not changing anything, we are trying to guess what netdev wants to use the fixed 32 byte array called 'mac' for, when we are not going to store a MAC in it. This is new territory, never been done before. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html