From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2611D433BD for ; Tue, 9 Jul 2024 02:54:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720493643; cv=none; b=RVGL26HmIEKLzivuGxw9YwKPHyj2NTgIPkOQ9HVZBusAX8E4/hRybogec+ppFqMEOkbTFwoam28cGL1gP4lA4TMCl6dxGfarhoWwGf9smkSb0EduRuDEP8S3TxJvuDu+SUPmuGpJWg9DuATwxAUz1hfMx9zUd41juhIlJE0C7cc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720493643; c=relaxed/simple; bh=U8KbggbN02hqceirmTibPpXOr56EKbg345vLx1qFGh8=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=H/gE+c2yjB3ILKlq2NcdXbdn7fNdaYoAlpe6RazcOed61ZgtwOS5IBGjx+UdzOQjOO9ev4nkkZnxLnpIveAZf/BPe8/Myyl07KMzHC/DZucELR/zTE8sEMaIfIpceA2U/nYR3gROwHbBLIM5W4urb7a0iEkGtkFRTi3z48P8Zs0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Y0HBKLJC; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Y0HBKLJC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 512F1C116B1; Tue, 9 Jul 2024 02:54:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720493642; bh=U8KbggbN02hqceirmTibPpXOr56EKbg345vLx1qFGh8=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=Y0HBKLJCc+PIWzf5sIPpgUBussg1wAYumxx86/FK9hB4qGSnIgIrbt19ca8ZqfgHV I+KupeZ0e7rIOKGaVN/FsLekqEDoeeb8yXSQbuAGVOS81s7IN/wUBXNLkrSy7EjaP8 v6fD/OGI4JqvvUS4HRLg51HMgVVvjSl3fCrtsWpHlWNcuLWXxa3Ap177Zoc4vJ/v6i x30GRhSHzRNrhFBN7UInIGyhSicGa+CyFSanno4j0Vc1zPaojcRVTgall+KK2ig8PD xP0/GA3NYMXDQQ4XY44BX+n6CqMyzZW1NWZTtlZePVzpuEms/dGtveJxWOJdORBPRw 03ScrA/w9z4Dg== Date: Mon, 8 Jul 2024 19:54:01 -0700 From: Jakub Kicinski To: Paolo Abeni Cc: netdev@vger.kernel.org, Jiri Pirko , Madhu Chittim , Sridhar Samudrala , Simon Horman , John Fastabend , Sunil Kovvuri Goutham , Jamal Hadi Salim Subject: Re: [PATCH net-next 1/5] netlink: spec: add shaper YAML spec Message-ID: <20240708195401.5ef3f016@kernel.org> In-Reply-To: <390717c3688956d6da04b7de00da3dc57ff9c7a9.camel@redhat.com> References: <75cb77aa91040829e55c5cae73e79349f3988e06.1719518113.git.pabeni@redhat.com> <20240628191230.138c66d7@kernel.org> <4df85437379ae1d7f449fe2c362af8145b1512a5.camel@redhat.com> <20240701195418.5b465d9c@kernel.org> <20240702080452.06e363ae@kernel.org> <20240702140830.2890f77b@kernel.org> <2e4bf0dcffe51a7bc4d427e33f132a99ceac8d8a.camel@redhat.com> <20240703142017.346ed0c1@kernel.org> <390717c3688956d6da04b7de00da3dc57ff9c7a9.camel@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Mon, 08 Jul 2024 21:42:00 +0200 Paolo Abeni wrote: > > To judge whether it's an over-design we'd need to know what the user > > scenarios are, those are not included in the series AFAICS. =20 >=20 > My bad, in the cover I referred to the prior discussions without > explicitly quoting the contents. >=20 > The initial goal here was to allow the user-space to configure per- > queue, H/W offloaded, TX shaping. Per-queue Tx shaping is already supported. I mean "user scenarios" in the agile programming sense as in something that gets closer to production use cases. "Set rate limit on a queue" is a suggested solution not a statement of a problem. > That later evolved in introducing an in-kernel H/W offload TX shaping > API capable of replacing the many existing similar in-kernel APIs and > supporting the foreseeable H/W capabilities. >=20 > > To be blunt - what I'm getting at is that the API mirrors Intel's FW > > API with an extra kludge to support the DSA H/W - in the end matching > > neither what the DSA wants nor what Intel can do. =20 >=20 > The API is similar to Intel=E2=80=99s FW API because to my understanding = the > underlying design - an arbitrary tree - is the most complete > representation possible for shaping H/W. AFAICT is also similar to what > other NIC vendors=E2=80=99 offer. >=20 > I don=E2=80=99t see why the APIs don=E2=80=99t match what Intel can do, c= ould you > please elaborate on that? That's not the main point I was making, I was complaining about how=20 the "extension" to support DSA HW was bolted onto this API. Undeniably the implementation must be stored as a tree (with some max depth). That doesn't imply that, for example, arbitrary re-parenting of non-leaf nodes is an operation that makes sense for all devices. =46rom memory devlink rate doesn't allow mixing some node types after one parent, too. > > IOW I'm trying to explore whether we can find a language of > > transformations which will be more complex than single micro-operations > > on the tree, but sufficiently expressive to provide atomic > > transformations without transactions of micro-ops. =20 >=20 > I personally find it straight-forward describing the scenario you > proposed in terms of the simple operations as allowed by this series. I > also think it=E2=80=99s easier to build arbitrarily complex scenarios in = terms > of simple operations instead of trying to put enough complexity in the > language to describe everything. It will easily lose flexibility or > increase complexity for unclear gain. Why do you think that would be a > better approach? I strongly dislike the operation grouping/batching as it stands in v1. Do you think that part of the design is clean? I also disagree with the assertion that having a language of transformations more advanced that "add/move/delete" increases complexity. Language gives you properties you can reason about. That's why rbtree, B-tree and other algos define a language of transformations. Naive ops make arbitrary trees easy but I put it to you that allowing arbitrary trees without any enforced=20 invariants will breed far more complexity and bugs than a properly designed language :) The primary operation of interest is, in fact: given a set of resources (queues or netdevs) which currently feed one mux node - build=20 a sub-hierarchy. Use case 1 - container b/w sharing. Container manager will want to group a set of queues, and feed a higher layer RR node (so that number of queues doesn't impact load sharing). Say we have two containers (c1, c2 represent queues assigned to them); before container 3 starts: c1 - \ c1 - >RR c1 - / > RR(root) c2 - \ RR c2 - / allocate 2 queues: c1 - \ c1 - ->RR c1 - / > RR(root) c2 - \ RR c2 - / ' qX / qY / hierarchize ("group([qX, qY], type=3D"rr")): c1 - \ c1 - >RR c1 - / \=20 c2 - \ RR -> RR(root) c2 - / / c3 - \ RR=20 c3 - / The container manager just wants so say "take the new queues (X, Y), put them under an RR node". If the language is build around creating mux nodes - that's a single call. Note that the RR(root) node is implicit (in your API it's not visible but it is implied). Users case 2 - delegation - the neat thing about using such construct is that as you can see we never referred to output, i.e. RR(root). The output can be implied by whatever node the queues already output to. So say container 1 (c1) wants to set a b/w limit on two of it's queues (let's call its queues A B C): rr_id =3D group([B C], type=3D"rr") rate_limit(rr_id, 1Gbps) and end up with: cA ----- \ cB - \RR*/ RR cC - / \=20 c2 - \ RR -> RR(root) c2 - / / c3 - \ RR=20 c3 - / * new node, also has rate limit set You don't have to worry about parentage permissions. Container can only add nodes (which is always safe) or delete nodes (and we can trivially enforce it only deletes node it has created itself). > Also the simple building blocks approach is IMHO closer to the original > use-case. >=20 > Are there any other reasons for atomic operations, beyond addressing > low-end H/W? I think atomic changes are convenient to match what the user wants to do. And the second use case is that I do believe there's a real need to allow uncoordinated agents to modify sections on the hierarchy. Neither of those are hard requirements, but I think any application driven requirement should come before "that's the FW API for vendor X" I hope this we can agree on. Thinking about it (for longer than I care to admit), one concern I have about the "mux creation" API I described above is that it forces existence of leaf and non-leaf nodes at the same parent, at least transiently. Can we go back to an API with an explicit create/modify/delete? All we need for Andrew's use case, I believe, is to be able to "somewhat atomically" move leaf nodes.