From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 28 Feb 2019 08:24:04 -0800
From: Jakub Kicinski
To: Jiri Pirko
Cc: davem@davemloft.net, oss-drivers@netronome.com, netdev@vger.kernel.org,
 parav@mellanox.com, jgg@mellanox.com
Subject: Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
Message-ID: <20190228082404.5b6d1061@cakuba.netronome.com>
In-Reply-To: <20190228085624.GD2324@nanopsycho.orion>
References: <20190226182436.23811-1-jakub.kicinski@netronome.com>
 <20190226182436.23811-5-jakub.kicinski@netronome.com>
 <20190227123753.GB2240@nanopsycho>
 <20190227103000.6ea6f7c0@cakuba.netronome.com>
 <20190228085624.GD2324@nanopsycho.orion>
Organization: Netronome Systems, Ltd.
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

On Thu, 28 Feb 2019 09:56:24 +0100, Jiri Pirko wrote:
> Wed, Feb 27, 2019 at 07:30:00PM CET, jakub.kicinski@netronome.com wrote:
> >On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:
> >> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
> >> >PCI endpoint corresponds to a PCI device, but such device
> >> >can have one or more logical device ports associated with it.
> >> >We need a way to distinguish those. Add a PCI subport in the
> >> >dumps and print the info in phys_port_name appropriately.
> >> >
> >> >This is not equivalent to port splitting, there is no split
> >> >group. It's just a way of representing multiple netdevs on
> >> >a single PCI function.
> >> >
> >> >Note that the quality of being multiport pertains only to
> >> >the PCI function itself. A PF having multiple netdevs does
> >> >not mean that its VFs will also have multiple, or that VFs
> >> >are associated with any particular port of a multiport VF.
> >>
> >> We've been discussing the problem of subport (we call it "subfunction"
> >> or "SF") for some time internally. Turned out, this is probably a harder
> >> task to model. Please prove me wrong.
> >>
> >> The nature of VF makes it a logically separate entity. It has a separate
> >> PCI address, it should therefore have a separate devlink instance.
> >> You can pass it through to VM, then the same devlink instance should be
> >> created inside the VM and disappear from the host.
> >
> >Depends what a devlink instance represents :/ On one hand you may want
> >to create an instance for a VF to allow it to spawn soft ports, on the
> >other you may want to group multiple functions together.
> >
> >IOW if devlink instance is for an ASIC, there should be one per device
> >per host. So if we start connecting multiple functions (PFs and/or VFs)
> >to one host we should probably introduce the notion of devlink aliases
> >or some such (so that multiple bus addresses can target the same
>
> Hmm. Like VF address -> PF address alias? That would be confusing to see
> eswitch ports under VF devlink instance... I probably did not get you
> right.

No eswitch ports under VF, more in case of multi-PF. Bus addresses of
all PFs aliasing to the same devlink instance.

> >devlink instance). Those less pipelined NICs can forward between
> >ports, but still want a function per port (otherwise user space
> >sometimes gets confused). If we have multiple functions which are on
> >the same "switchid" they should have a single devlink instance if you
> >ask me. That instance will have all the ports of the device.
>
> Okay, that makes sense. But the question is, can the same devlink
> instance contain ports that do not have "Switchid"?

No strong preference if switchid is different. To me devlink is an
ASIC instance; if the multiport card is constructed by copy-pasting the
same IP twice onto a die, and the ports really are completely separate,
there is no reason to require a single devlink instance.

> I think it would be beneficial to have the switchid shown for devlink
> ports too. Then it is clear that the devlink ports with the same
> switchid belong to the same switch, and other ports under the same
> devlink instance (like PF itself) are separate, but still under the same
> ASIC.

Sure, you mean in terms of UI - user space can do a link dump or get
that from sysfs, right?

> >You say disappear from the host - what do you mean? Are you referring
> >to the VF port disappearing? But on the switch the port is still
>
> No, VF itself. eswitch port will be still there on the host.
>
>
> >there, and you should show the subports on the PF side IMHO. Devlink
> >ports should allow users to understand the topology of the switch.
>
> What do you mean by "topology"?

Mostly which ports are part of the switch and what's their "flavour".
Also (less importantly) which host netdevs are "peers" of eswitch ports.

> >Is spawning VMDq sub-instances the only thing we can think of that VMs
> >may want to do? Are there any other uses?
> >
> >> SF (or subport) feels similar to that. Basically it is exactly the same
> >> thing as VF, only does reside under PF PCI function.
> >>
> >> That is why I think, for sake of consistency, it should have a separate
> >> devlink entity as well. The problem is correct sysfs modelling and
> >> devlink handle derived from that. Parav is working on a simple soft
> >> bus for this purpose called "subbus". There is a RFC floating around on
> >> Mellanox internal mailing list, looks like it is time to send it
> >> upstream.
> >>
> >> Then each PF driver which has SFs would register subbus devices
> >> according to SFs/subports and they would be properly handled by bus
> >> probe, devlink and devlink port and netdev instances created.
> >>
> >> Ccing Parav and Jason.
> >
> >You guys come from the RDMA side of the world, with which I'm less
> >familiar, and the soft bus + spawning devices seems to be a popular
> >design there. Could you describe the advantages of that model for
> >the sake of the netdev-only folks? :)
>
> I'll try to draw some ascii art :)

Yess :)

> >Another term that gets thrown into the mix here is mediated devices,
> >right? If you wanna pass the sub-spawn-soft-port to a VM. Or run
> >DPDK on some queues.
> >
> >To state the obvious AF_XDP and macvlan offload were our previous
> >answers to some of those use cases. What is the forwarding model
> >for those subports? Are we going to allow flower rules from VMs?
> >Is it going to be dst MAC only? Or is the hypervisor going to forward
> >as it sees appropriate (OvS + "repr"/port netdev)?
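
For reference, here is a minimal sketch of the aliasing idea discussed
earlier in this mail: bus addresses of functions that share a switchid
resolve to one devlink instance. This is not kernel or devlink code.
The Function type, build_aliases(), the bus addresses and the switchid
strings are all made up for illustration. Using the lowest bus address
as the instance handle is an arbitrary assumption, and so is giving
every function without a switchid its own instance (the thread leaves
that case open).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Function(NamedTuple):
    """A PCI function as seen by one host (hypothetical model)."""
    bus_addr: str            # made-up bus address, e.g. "pci/0000:05:00.0"
    switchid: Optional[str]  # None means "not part of any switch"


def build_aliases(functions: List[Function]) -> Dict[str, str]:
    """Map every bus address to the handle of the instance it aliases to.

    Functions sharing a switchid share one instance. Functions without a
    switchid each keep their own instance (an assumption of this sketch).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    aliases: Dict[str, str] = {}
    for fn in functions:
        if fn.switchid is None:
            aliases[fn.bus_addr] = fn.bus_addr
        else:
            groups[fn.switchid].append(fn.bus_addr)
    for addrs in groups.values():
        handle = min(addrs)  # arbitrary but deterministic choice of handle
        for addr in addrs:
            aliases[addr] = handle
    return aliases


# Two PFs on the same switch, one PF on a second switch, one standalone PF.
FUNCTIONS = [
    Function("pci/0000:05:00.0", "switch-a"),
    Function("pci/0000:05:00.1", "switch-a"),
    Function("pci/0000:82:00.0", "switch-b"),
    Function("pci/0000:83:00.0", None),
]
ALIASES = build_aliases(FUNCTIONS)
```

With this input both "switch-a" PFs resolve to the same handle, while
the other two functions each end up with an instance of their own.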