From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Nov 2022 15:02:40 +0100
From: Michal Swiatkowski
To: Leon Romanovsky
Cc: jiri@nvidia.com, leszek.kaliszczuk@intel.com,
 przemyslaw.kitszel@intel.com, edumazet@google.com,
 mustafa.ismail@intel.com, intel-wired-lan@lists.osuosl.org,
 netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com,
 shiraz.saleem@intel.com, davem@davemloft.net
Subject: Re: [Intel-wired-lan] [PATCH net-next 00/13] resource management using devlink reload
References: <20221114125755.13659-1-michal.swiatkowski@linux.intel.com>
 <49e2792d-7580-e066-8d4e-183a9c826e68@intel.com>
List-Id: Intel Wired Ethernet Linux Kernel Driver Development

On Tue, Nov 15, 2022 at 02:12:12PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 11:16:58AM +0100, Michal Swiatkowski wrote:
> > On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
> > > On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> > > > On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > > > > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > > > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > > > > and user can't create more VFs.
> > > > > > > > > > It is hard to set default number of
> > > > > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > > > > for subfunctions.
> > > > > > > > > >
> > > > > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > > > > commands:
> > > > > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > > > > num_cpus.
> > > > > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > > > > If yes, you have specific interface for that.
> > > > > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > > > >
> > > > > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > > > > the PF driver.
> > > > > > >
> > > > > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > > > >
> > > > > > > You shouldn't set any value through netdev as it will cause to
> > > > > > > difference in output between lspci (which doesn't require any driver)
> > > > > > > and your newly set number.
> > > > > >
> > > > > > If I understand correctly, lspci shows the MSI-X number for individual
> > > > > > VF.
> > > > > > Value set via devlink is the total number of MSI-X that can be used
> > > > > > when creating VFs.
> > > > >
> > > > > Yes and no, lspci shows how many MSI-X vectors exist from HW point of
> > > > > view. Driver can use less than that. It is exactly as your proposed
> > > > > devlink interface.
> > > > >
> > > >
> > > > Ok, I have to take a closer look at it. So, are You saying that we should
> > > > drop this devlink solution and use sysfs interface for VFs or are You
> > > > fine with having both? What with MSI-X allocation for subfunction?
> > >
> > > You should drop for VFs and PFs and keep it for SFs only.
> > >
> >
> > I understand that MSI-X for VFs can be set via sysfs interface, but what
> > with PFs?
>
> PFs are even trickier than VFs, as you are changing that number
> while driver is bound. This makes me wonder what will be lspci output,
> as you will need to show right number before driver starts to load.
>
> You need to present right value if user decided to unbind driver from PF too.
>

In case of ice driver lspci -vs shows:
	Capabilities: [70] MSI-X: Enable+ Count=1024 Masked

so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
total number of MSI-X in the devlink example from cover letter is 1024.

I see that mellanox shows:
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked

I assume that 64 is in this case MSI-X only for this one PF (it makes
sense). To be honest I don't know why we show maximum MSI-X for the
device there, but because of that the value will be the same after
changing allocation of MSI-X across features.

Aren't the MSI-X capabilities read from a HW register?

> > Should we always allow max MSI-X for PFs? So hw_max - used -
> > sfs? Is it safe to call pci_enable_msix always with max vectors
> > supported by device?
>
> I'm not sure. I think that it won't give you much if you enable
> more than num_online_cpus().
>

Oh, yes, correct, I missed that.
> >
> > I added the value for PFs, because we faced a problem with MSI-X
> > allocation on 8 port device. Our default value (num_cpus) was too big
> > (not enough vectors in hw). Changing the amount of vectors that can be
> > used on PFs was solving the issue.
>
> We had something similar for mlx5 SFs, where we don't have enough vectors.
> Our solution is simply to move to automatic shared MSI-X mode. I would
> advocate for that for you as well.
>

Thanks for proposing a solution, I will take a look at how this works in
mlx5.

> >
> > Let me write an example. As default MSI-X for PF is set to num_cpus, the
> > platform have 128 CPUs, we have 8 port device installed there and still
> > have 1024 vectors in HW (I simplified because I don't count additional
> > interrupts). We run out of vectors, there is 0 vectors that can be used
> > for VFs. Sure, it is easy to handle, we can divide PFs interrupts by 2
> > and will end with 512 vectors for VFs. I assume that with current sysfs
> > interface in this situation MSI-X for VFs can be set from 0 to 512? What
> > if user wants more? If there is a PFs MSI-X value which can be set by
> > user, user can decrease the value and use more vectors for VFs. Is it
> > possible in current VFs sysfs interface? I mean, setting VFs MSI-X
> > vectors to value that will need to decrease MSI-X for PFs.
>
> You can't do it and this limitation is because PF is bound. You can't change
> that number while driver is running. AFAIR, such change will be PCI spec
> violation.
>

As in the previous comment, we have a different value in the MSI-X
capabilities field (max number of vectors), so I won't allow the user to
change this value. I don't know why it is done like that (MSI-X amount
for the whole device in PCI caps). I will try to dig into it.

> > > > > >
> > > > > > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > > > > >
> > > > > > >
> > > > > > > Also in RDMA case, it is not clear what will you achieve by this
> > > > > > > setting too.
> > > > > > >
> > > > > >
> > > > > > We have limited number of MSI-X (1024) in the device. Because of that
> > > > > > the amount of MSI-X for each feature is set to the best values. Half for
> > > > > > ethernet, half for RDMA. This patchset allows user to change these values.
> > > > > > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> > > > >
> > > > > RDMA devices don't have PCI logic and everything is controlled through
> > > > > your main core module. It means that when you create RDMA auxiliary device,
> > > > > it will be connected to netdev (RoCE and iWARP) and that netdev should
> > > > > deal with vectors. So I still don't understand what does it mean "half
> > > > > for RDMA".
> > > > >
> > > >
> > > > Yes, it is controlled by module, but during probe, MSI-X vectors for RDMA
> > > > are reserved and can't be used by ethernet. For example I have
> > > > 64 CPUs, when loading I get 64 vectors from HW for ethernet and 64 for
> > > > RDMA. The vectors for RDMA will be consumed by irdma driver, so I won't
> > > > be able to use it in ethernet and vice versa.
> > > >
> > > > By saying it can't be used I mean that irdma driver received the MSI-X
> > > > vectors number and it is using them (connected them with RDMA interrupts).
> > > >
> > > > Devlink resource is a way to change the number of MSI-X vectors that
> > > > will be reserved for RDMA. You wrote that netdev should deal with
> > > > vectors, but how netdev will know how many vectors should go to RDMA aux
> > > > device? Is there an interface for setting the vectors amount for RDMA
> > > > device?
> > >
> > > When RDMA core adds device, it calls to irdma_init_rdma_device() and
> > > num_comp_vectors is actually the number of MSI-X vectors which you want
> > > to give to that device.
> > >
> > > I'm trying to say that probably you shouldn't reserve specific vectors
> > > for both ethernet and RDMA and simply share same vectors. RDMA applications
> > > that care about performance set comp_vector through user space verbs anyway.
> > >
> >
> > Thanks for explanation, appreciate that. In our driver num_comp_vectors for
> > RDMA is set during driver probe. Do we have any interface to change
> > num_comp_vectors while driver is working?
>
> No, as this number is indirectly exposed to the user space.
>

Ok, thanks.

> >
> > Sorry, I am not fully familiar with RDMA. Can user app for RDMA set
> > comp_vector to any value or only to max which is num_comp_vectors given
> > for RDMA while creating aux device?
>
> comp_vector is logically equal to IRQ number and this is how RDMA
> applications control to which interrupt deliver completions.
>
> " The CQ will use the completion vector comp_vector for signaling
>   completion events; it must be at least zero and less than
>   context->num_comp_vectors. "
> https://man7.org/linux/man-pages/man3/ibv_create_cq.3.html
>

Thank You

> >
> > Assuming that I gave 64 MSI-X for RDMA by setting num_comp_vectors to
> > 64, how I will know if I can or can't use these vectors in ethernet?
>
> Why should you need to know? Vectors are not exclusive and they can be
> used by many applications at the same time. The thing is that it is
> far-fetched to expect that high performance RDMA applications and high
> performance ethernet can coexist on same device at the same time.
>

Yes, but after loading the aux device, part of the vectors
(num_comp_vectors) are reserved for the RDMA use case only (at least in
ice driver). We thought that the devlink resource interface can be a good
way to change num_comp_vectors in this case.

Maybe I am wrong, but I think that vectors that we reserved for RDMA
can't be used by ethernet (in ice driver). We sent specific MSI-X entries
to the rdma driver, so I don't think that the same entries can be reused
by ethernet.
I am talking only about the ice + irdma case, am I wrong (probably a
question to Intel devs)?

Huge thanks for Your comments here, I understand more now.

> Thanks > > > > > Thanks > > > > > Thanks > > > > > > > > > > > Thanks > > > > > > > > > Thanks
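
For reference, the vector budget from the 8-port example in this thread can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not driver code: the function name `vectors_left_for_vfs` is made up here, and, as in the example, misc/additional interrupts are not counted.

```python
def vectors_left_for_vfs(hw_total: int, num_pfs: int, per_pf: int) -> int:
    """Return how many MSI-X vectors remain for VFs after the PFs take theirs.

    Hypothetical helper for illustration only; it ignores misc interrupts,
    RDMA and subfunctions, exactly like the simplified example in the thread.
    """
    used_by_pfs = num_pfs * per_pf
    if used_by_pfs > hw_total:
        # More vectors requested than the device has (the "not enough
        # vectors in hw" situation mentioned above).
        raise ValueError(
            f"PFs need {used_by_pfs} vectors but HW has only {hw_total}"
        )
    return hw_total - used_by_pfs


# Default policy: one vector per CPU on each of the 8 PFs, 128 CPUs.
print(vectors_left_for_vfs(hw_total=1024, num_pfs=8, per_pf=128))  # 0

# Dividing the PF interrupts by 2 frees half of the device for VFs.
print(vectors_left_for_vfs(hw_total=1024, num_pfs=8, per_pf=64))   # 512
```

With a user-settable per-PF value, lowering `per_pf` further is what would free more than 512 vectors for VFs; whether that is allowed while the PF driver is bound is the open question discussed above.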