Date: Wed, 27 Feb 2019 10:30:00 -0800
From: Jakub Kicinski
To: Jiri Pirko
Cc: davem@davemloft.net, oss-drivers@netronome.com, netdev@vger.kernel.org, parav@mellanox.com, jgg@mellanox.com
Subject: Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI ports
Message-ID: <20190227103000.6ea6f7c0@cakuba.netronome.com>
In-Reply-To: <20190227123753.GB2240@nanopsycho>
References: <20190226182436.23811-1-jakub.kicinski@netronome.com> <20190226182436.23811-5-jakub.kicinski@netronome.com> <20190227123753.GB2240@nanopsycho>
Organization: Netronome Systems, Ltd.
X-Mailing-List: netdev@vger.kernel.org

On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:
> Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@netronome.com wrote:
> >PCI endpoint corresponds to a PCI device, but such device
> >can have one or more logical device ports associated with it.
> >We need a way to distinguish those. Add a PCI subport in the
> >dumps and print the info in phys_port_name appropriately.
> >
> >This is not equivalent to port splitting, there is no split
> >group. It's just a way of representing multiple netdevs on
> >a single PCI function.
> >
> >Note that the quality of being multiport pertains only to
> >the PCI function itself. A PF having multiple netdevs does
> >not mean that its VFs will also have multiple, or that VFs
> >are associated with any particular port of a multiport VF.
>
> We've been discussing the problem of subport (we call it "subfunction"
> or "SF") for some time internally. Turned out, this is probably a harder
> task to model. Please prove me wrong.
>
> The nature of VF makes it a logically separate entity. It has a separate
> PCI address, it should therefore have a separate devlink instance.
> You can pass it through to VM, then the same devlink instance should be
> created inside the VM and disappear from the host.

Depends what a devlink instance represents :/ On one hand you may want
to create an instance for a VF to allow it to spawn soft ports, on the
other you may want to group multiple functions together.

IOW if devlink instance is for an ASIC, there should be one per device
per host. So if we start connecting multiple functions (PFs and/or VFs)
to one host we should probably introduce the notion of devlink aliases
or some such (so that multiple bus addresses can target the same devlink
instance).
Those less pipelined NICs can forward between ports, but still want a
function per port (otherwise user space sometimes gets confused). If we
have multiple functions which are on the same "switchid" they should
have a single devlink instance if you ask me. That instance will have
all the ports of the device.

You say disappear from the host - what do you mean? Are you referring
to the VF port disappearing? But on the switch the port is still there,
and you should show the subports on the PF side IMHO. Devlink ports
should allow users to understand the topology of the switch.

Is spawning VMDq sub-instances the only thing we can think of that VMs
may want to do? Are there any other uses?

> SF (or subport) feels similar to that. Basically it is exactly the same
> thing as VF, only does reside under PF PCI function.
>
> That is why I think, for sake of consistency, it should have a separate
> devlink entity as well. The problem is correct sysfs modelling and
> devlink handle derived from that. Parav is working on a simple soft
> bus for this purpose called "subbus". There is an RFC floating around on
> Mellanox internal mailing list, looks like it is time to send it
> upstream.
>
> Then each PF driver which has SFs would register subbus devices
> according to SFs/subports and they would be properly handled by bus
> probe, devlink and devlink port and netdev instances created.
>
> Ccing Parav and Jason.

You guys come from the RDMA side of the world, with which I'm less
familiar, and the soft bus + spawning devices seems to be a popular
design there. Could you describe the advantages of that model for
the sake of the netdev-only folks? :)

Another term that gets thrown into the mix here is mediated devices,
right? If you wanna pass the sub-spawn-soft-port to a VM. Or run DPDK
on some queues. To state the obvious, AF_XDP and macvlan offload were
previous answers to some of those use cases.

What is the forwarding model for those subports?
Are we going to allow flower rules from VMs? Is it going to be dst MAC only? Or is the hypervisor going to forward as it sees appropriate (OvS + "repr"/port netdev)?
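
For illustration only, the "devlink aliases" notion mentioned earlier (multiple bus addresses targeting the same devlink instance) could be modelled roughly as below. This is a user-space toy, not kernel code and not a proposed API: DevlinkInstance, AliasTable, the handle strings and the port name "pf1" are all hypothetical names made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DevlinkInstance:
    """Hypothetical stand-in for one devlink instance (one per ASIC per host)."""
    name: str
    ports: list = field(default_factory=list)


class AliasTable:
    """Toy lookup table: several bus-address handles may map to one instance."""

    def __init__(self):
        self._by_handle = {}

    def register(self, handle, instance):
        # Each bus address may be claimed only once, but many addresses
        # may point at the same instance object.
        if handle in self._by_handle:
            raise ValueError(f"handle already in use: {handle}")
        self._by_handle[handle] = instance

    def lookup(self, handle):
        return self._by_handle[handle]

    def handles_for(self, instance):
        # All aliases currently resolving to this exact instance.
        return sorted(h for h, i in self._by_handle.items() if i is instance)


# Two PFs of the same (assumed) ASIC attached to one host.
asic = DevlinkInstance("asic0")
table = AliasTable()
table.register("pci/0000:05:00.0", asic)
table.register("pci/0000:05:00.1", asic)

# A port added through one alias is visible through the other.
table.lookup("pci/0000:05:00.1").ports.append("pf1")
```

Whichever address user space picks, it ends up talking to the same instance and sees the same set of ports, which is the property the alias idea is after.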