Date: Mon, 18 Mar 2019 13:21:05 +0100
From: Jiri Pirko
To: Parav Pandit
Cc: "Samudrala, Sridhar", Jakub Kicinski, "davem@davemloft.net", "netdev@vger.kernel.org", "oss-drivers@netronome.com"
Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
Message-ID: <20190318122105.GH2270@nanopsycho>
References: <20190314150945.031d1b08@cakuba.netronome.com> <20190314163915.24fd2481@cakuba.netronome.com> <4436da3d-4b99-f792-8e77-695d5958794d@intel.com> <20190315200814.GD2305@nanopsycho>
X-Mailing-List: netdev@vger.kernel.org

Fri, Mar 15, 2019 at 10:59:33PM CET, parav@mellanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko
>> Sent: Friday, March 15, 2019 3:08 PM
>> To: Parav Pandit
>> Cc: Samudrala, Sridhar ; Jakub Kicinski
>> ; davem@davemloft.net;
>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI ports
>>
>> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@mellanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Samudrala, Sridhar
>> >> Sent: Friday, March 15, 2019 12:58 AM
>> >> To: Parav Pandit ; Jakub Kicinski
>> >>
>> >> Cc: Jiri Pirko ; davem@davemloft.net;
>> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >>
>> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Samudrala, Sridhar
>> >> >> Sent: Thursday, March 14, 2019 9:16 PM
>> >> >> To: Parav Pandit ; Jakub Kicinski
>> >> >>
>> >> >> Cc: Jiri Pirko ; davem@davemloft.net;
>> >> >> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Jakub Kicinski
>> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
>> >> >>>> To: Parav Pandit
>> >> >>>> Cc: Jiri Pirko ; davem@davemloft.net;
>> >> >>>> netdev@vger.kernel.org; oss-drivers@netronome.com
>> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >>>> devlink PCI ports
>> >> >>>>
>> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>> >> >>>>>>> Then instances of flavour pci_vf are going to appear in the
>> >> >>>>>>> same devlink instance.
>> >> >>>>>>> Those are the switch ports:
>> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>> flavour pci_vf pf 0 vf 0
>> >> >>>>>>> switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/0
>> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>> flavour pci_vf pf 0 vf 0 subport 1
>> >> >>>>>>> switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/1
>> >> >>>>>>>
>> >> >>>>>>> With that, peers are going to appear too, and those are the
>> >> >>>>>>> actual VF/VF subport:
>> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> peer pci/0000:05:00.0/10002
>> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> peer pci/0000:05:00.0/10003
>> >> >>>>>>>
>> >> >>>>>>> Later you can push this VF along with all subports to VM. So
>> >> >>>>>>> in VM, you are going to see the VF like this:
>> >> >>>>>>> $ devlink dev
>> >> >>>>>>> pci/0000:00:08.0
>> >> >>>>>>> $ devlink port
>> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>
>> >> >>>>>>> And back to your question of how are they connected in eswitch.
>> >> >>>>>>> That is totally up to the original user John who did the creation.
>> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
>> >> >>>>>>> configure the forwarding however he likes.
>> >> >>>>>>
>> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
>> >> >>>>>> cloud environment to have this provisioned using some service
>> >> >>>>>> API, not a kernel API. That's what I wanted to confirm.
>> >> >>>>>>
>> >> >>>>>> I don't see any benefit to having the "host ports" under
>> >> >>>>>> devlink, as such I think it's a matter of preference.
>> >> >>>>>
>> >> >>>>> We need 'host ports' to configure parameters of this host port
>> >> >>>>> which is not exposed by the rep-netdev.
>> >> >>>>> Such as mac address.
>> >> >>>>
>> >> >>>> Please look at the quote of what Jiri wrote above - the host
>> >> >>>> port gets passed to the VM, you can't use it as a handle to set the MAC.
>> >> >>>>
>> >> >>>> The way to set the MAC remains:
>> >> >>>>
>> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr 00:11:22:33:44:55
>> >> >>>>
>> >> >>> Even though it can be done, I think this is the wrong model to program
>> >> >>> the hostport mac address using the eswitch port.
>> >> >>> All devlink objects are control objects, so what is passed to VM
>> >> >>> is what is represented by devlink.
>> >> >>> VF in the VM will anyway create its devlink object.
>> >> >>> What is wrong in programming hostport?
>> >> >>> It gives a very clear view to users of topology and objects.
>> >> >>
>> >> >> The VF or any subport MAC address should be configured by the
>> >> >> orchestration layer that is running on the hypervisor, and when a
>> >> >> VF is assigned to a VM, the host port is not visible to the hypervisor.
>> >> > What prevents creation of hostport due to which is not visible?
>> >> > Hostport is a control port to program host-side parameters.
>> >> > It should be created when user wants to program the parameters.
>> >> >
>> >> > Model is really straightforward.
>> >> > Program host port params using hostport object.
>> >> > Program switchport params using rep-netdev.
>> >>
>> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each
>> >> port - host facing ports and switch facing ports. This is in addition
>> >> to the netdevs that are created today.
>> >>
>> >I am not proposing anything different.
>> >I am proposing only two changes.
>> >1. control hostport params by referring to the hostport (not via the indirect peer)
>>
>> Not really possible. If you passthrough VF into VM, the hostport goes along with it.
>No.
>I am sorry for showing the enumeration which is the source of confusion.
>
>Below is the right enumeration.
>
>When the VF is initially enumerated in the host, where the eswitch devlink instance is located,
>the enumeration below is seen.
>
>The first two entries show the link between hostport and switchport.
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

Hostport should not have switch_id.

>
>pci/0000:05:10.1/0 eth netdev flavour hostport
>This entry won't be seen if VF auto probing is disabled, because then the VF is not enumerated.
>
>As a user, I will be programming the mac address of the hostport for a VF.
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002

Hmm, so you are going to have 2 hostports for the VF:
1) pci/0000:05:10.1/0, the real one, that is going to go to the VM - with a
   separate pci address and devlink instance.
2) pci/0000:05:00.0/1, a dummy one, which is not really a hostport, as there
   is no netdev created for it. It only models the other side of the cable,
   which is away in the VM.

>
>
>>
>> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >Because switch is flat and agnostic of pf/vf/mdev.
>>
>> Not sure. It's good to have this kind of visibility.
>>
>A port can have a label/attribute indicating that it belongs to VF-1 or mdev, as long as you agree to have an mdev attribute on the host port
>(and don't ask for abstracting it, because mdev is a well-defined kernel object).

Why can't mdev be another flavour?

>
>>
>> >
>> >> Are you suggesting that all the devlink objects should be visible
>> >> only at the hypervisor layer?
>> >>
>> >Of course not.
>> >
>> >Ports and params controlled by the hypervisor should be exposed at the hypervisor/eswitch, wherever their parent devlink instance exists.
>> >Ports which should be visible inside a VM should be exposed inside a VM.
>> >So for a given VF,
>> >
>> >If eswitch is at hypervisor level,
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:10.1/0
>> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>> >
>> >where VF is enumerated,
>> >$ devlink port show
>> >pci/0000:05:10.1/0 eth netdev flavour hostport
>>
>> So this is how it looks in the VM, right?
>>
>Yep.
>Once the VF is mapped to the VM, only two entries are seen and the hostport can still be controlled.
>
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>
>This addresses the case for InfiniBand, where there is no eswitch, but hostports exist and should be managed.
>We shouldn't be inventing new devlink APIs or creating a fake sw eswitch object which doesn't exist in hw.
>
>>
>> >This is because an unprivileged VF doesn't have visibility into the eswitch and its links.
>> >
>> >> I think the terminology needs to be defined clearly so that we are all
>> >> on the same page.
>> >>
>> >> >
>> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
>> >> >> netdev, but I think we are trying to move away from that API and
>> >> >> do all the configuration via the port representor netdevs.
>> >> > This is fine; rep-netdev represents the eswitch port.
>> >> > You normally don't go to the switch to program host port params.
>> >> >
>> >> >> As the mac address cannot be configured using this netdev, I think
>> >> >> Jakub is suggesting creating a devlink object for each port
>> >> >> representor and using that interface to set the peer mac address.
>> >> >
>> >> > I understand, but it is a convoluted interface.
>> >> > When you program the host NIC mac address, you talk to iLO or BIOS.
>> >> > When you program the switch side mac address, you go to the switch/router/modem.
>> >> >
>> >> > Also, programming host params on the host side doesn't assume that it's connected to an eswitch.
>> >> > It also doesn't assume the same connectivity for its lifetime.
>> >> >
>> >> > If you model it on how physical devices are configured, it will almost
>> >> > never go wrong and still provides the same level of flexibility.
>> >> >
>> >> >> We should be able to use this to configure port vlan too.
>> >> >>
>> >> >> Also, instead of subport, can we call it vport and support different
>> >> >> types of vports - sr-iov, siov, vmdq etc.
>> >> >>
>> >> > At switch level there are just ports.
>> >> > sriov, siov, mdev, vmdq are their counterparts (peers) to which they are connected.
>> >> >
>> >> >>>
>> >> >>> Also eswitch is flat. There is no need for a pf/vf flavour for a port.
>> >> >>> It doesn't make sense to define the 'mdev' flavour which we are already working on.
>> >> >>> At eswitch level it is just a port; it happens to be connected to
>> >> >>> vf or pf or other objects, it doesn't matter.
>> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
>> >> >>>
>> >> >>>
>> >> >>>> (using the port ids from above)
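The port listings exchanged in this thread can be checked mechanically. What follows is a minimal Python sketch, not part of any devlink or iproute2 tooling. It assumes the flat one-line format Parav used above (a port handle followed by flavour, switch_id and peer key/value pairs), which is a proposal in this discussion rather than real devlink port show output. The Port class and the parse_line and check helpers are hypothetical. The sketch flags the two issues Jiri raises: a hostport carrying a switch_id, and peer links that are not mutual. It also shows that the VF ends up with hostport entries in two different devlink instances.

```python
from dataclasses import dataclass
from typing import List, Optional

# Keys that take a value in the (hypothetical) listing format from the thread.
VALUE_KEYS = ("flavour", "switch_id", "peer")


@dataclass
class Port:
    handle: str               # e.g. "pci/0000:05:00.0/10002"
    flavour: str              # "switchport" or "hostport" in Parav's proposal
    switch_id: Optional[str]
    peer: Optional[str]

    @property
    def instance(self) -> str:
        # The devlink instance is the handle without its trailing port index.
        return self.handle.rsplit("/", 1)[0]


def parse_line(line: str) -> Port:
    """Parse one listing line; bare words such as 'eth' or 'netdev' are skipped."""
    tokens = line.split()
    handle = tokens[0].rstrip(":")  # Jiri's format ends the handle with ':'
    attrs = {}
    i = 1
    while i < len(tokens):
        if tokens[i] in VALUE_KEYS and i + 1 < len(tokens):
            attrs[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return Port(handle, attrs.get("flavour", ""), attrs.get("switch_id"), attrs.get("peer"))


def check(ports: List[Port]) -> List[str]:
    """Report hostports that carry a switch_id and peer links that are not mutual."""
    by_handle = {p.handle: p for p in ports}
    problems = []
    for p in ports:
        if p.flavour == "hostport" and p.switch_id is not None:
            problems.append(f"{p.handle}: hostport carries switch_id")
        if p.peer is not None:
            other = by_handle.get(p.peer)
            if other is None:
                problems.append(f"{p.handle}: peer {p.peer} not visible here")
            elif other.peer != p.handle:
                problems.append(f"{p.handle}: peer {p.peer} does not link back")
    return problems


# Parav's enumeration for one VF on the host that owns the eswitch.
LISTING = """\
pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
pci/0000:05:10.1/0 eth netdev flavour hostport
"""

ports = [parse_line(line) for line in LISTING.splitlines() if line.strip()]
print(check(ports))
# ['pci/0000:05:00.0/1: hostport carries switch_id']
print(sorted({p.instance for p in ports if p.flavour == "hostport"}))
# ['pci/0000:05:00.0', 'pci/0000:05:10.1']
```

Dropping switch_id from the pci/0000:05:00.0/1 line makes check() return an empty list, while the two hostport instances remain: that is the duplication Jiri points out (the real hostport at pci/0000:05:10.1/0 and the dummy one at pci/0000:05:00.0/1).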