From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B61F11CFB9 for ; Fri, 31 May 2024 16:00:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717171258; cv=none; b=r1N6nSzahipV2yrlXMeh2c2i8Tt9DQgemKiTDvmtTFQo/W+vYZcSzl8WAASiEJpzigHZPRJ3L8FNVSXVWLWvjQbIgtp5gvzZ0btJ5G+EjpHuhsbDXp0Y946jmKIqqqXE3ZMbPwX+HuM6tYjnqup13pUI5noCph2I427k3Gv1V5A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717171258; c=relaxed/simple; bh=TV02sWb1rtx0nWwZkRpN0Jorl0N3AhObSpgLL9VfzFE=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qOw1LSQoFiDOxUijzoxCe6Bnm6Q6KGyrkxv73Kndsc5zEnWbwIdsXr0MbUsJs/6Iw8ka0NYnqh7/gPtldZW1gr92gbK0Fr7jIkNPB+ZlARY8evMtmln/5OhDinlyOg1CgfSkaZOftMYH/K7zfIvwstXeoFH66RgtJLwPlkqW5r8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=L46iF7zo; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="L46iF7zo" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EA987C116B1; Fri, 31 May 2024 16:00:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1717171258; bh=TV02sWb1rtx0nWwZkRpN0Jorl0N3AhObSpgLL9VfzFE=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=L46iF7zoMvZKHATaKXXvDOP9eDPtvLqjX4RuMc50eyqVd/QM31jwHkQfaPMln958x DTNw1ogSB3SqXKDqrTNvNmpftEeANJHWZJ2db790Ct+tlJ8ze7Nc0JRmHTwvxNTkPP 4ejqvkuG0pZ1unn3WbAf4Qt3tJyBM3itDU3xUM90Fcu+usmRStVYpHjhAnFvUp7Jkt EOe2UXTDpOUzxEiBTWHLE9HzuqmAPkXdwUnRMqjq8ChminsjLb6S7I+bXZsqgFi4Rt bJ1FJuCOqQ4whaNEwk6mjSdwLOs6ACh7DVmE/nX8tlYOWUj6I+mLDsjwZaoSNpZu1K 1NAHLO+W/0Zpw== Date: Fri, 31 May 2024 09:00:57 -0700 From: Jakub Kicinski To: Paolo Abeni Cc: netdev@vger.kernel.org, Jiri Pirko , Madhu Chittim , Sridhar Samudrala , Simon Horman , John Fastabend , Sunil Kovvuri Goutham , Jamal Hadi Salim Subject: Re: [RFC PATCH] net: introduce HW Rate Limiting Driver API Message-ID: <20240531090057.02fb8616@kernel.org> In-Reply-To: <16d7b761c3c1b4c4bd327d4486d958682a5f33dd.camel@redhat.com> References: <3d1e2d945904a0fb55258559eb7322d7e11066b6.1715199358.git.pabeni@redhat.com> <20240528101845.414cff22@kernel.org> <16d7b761c3c1b4c4bd327d4486d958682a5f33dd.camel@redhat.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Fri, 31 May 2024 11:22:46 +0200 Paolo Abeni wrote: > On Tue, 2024-05-28 at 10:18 -0700, Jakub Kicinski wrote: > > > + u32 priority; /* scheduling strict priority */ > > > + u32 weight; /* scheduling WRR weight*/ > > > > I wonder if we should somehow more clearly specify what a node can do. > > Like Andrew pointed out, if we have a WRR node, presumably the weights > > go on the children, since there's only one weigh param. But then the > > WRR node itself is either empty (no params) or has rate params... which > > is odd. > > > > Maybe shaping nodes and RR nodes should be separate node classes, > > somehow? > > Possibly clarifying the meaning of 'weight' field would help? > It means: this node is scheduled WRR according to the specified weight > among the sibling shapers with the same priority. > > I think it's quite simpler than introducing different node classes with > separate handling. My understanding is that the latter would help only > to workaround some H/W limitation and will make the implementation more > difficult for more capable H/W. > > What kind of problems do you foresee with the above definition? The problem Andrew mentioned, basically. There may not be a path to transition one fully offloaded hierarchy to another (e.g. switching strict prio to WRR). TBH I haven't really grasped your proposal in: https://lore.kernel.org/all/db51b7ccff835dd5a96293fb84d527be081de062.camel@redhat.com/ > > > + * enum net_shaper_scope - the different scopes where a shaper could be attached > > > + * @NET_SHAPER_SCOPE_PORT: The root shaper for the whole H/W. > > > + * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device. > > > + * @NET_SHAPER_SCOPE_VF: The shaper is attached to the given virtual > > > + * function. > > > + * @NET_SHAPER_SCOPE_QUEUE_GROUP: The shaper groups multiple queues under the > > > + * same device. > > > + * @NET_SHAPER_SCOPE_QUEUE: The shaper is attached to the given device queue. > > > > I wonder if we need traffic class? Some devices may have two schedulers, > > one from the host interfaces (PCIe) into the device buffer. And then > > from the device buffer independently into the wire. > > I feel like I'm really missing your point here. How would you use > traffic class? And how the 2 schedulers come into play here? Each of > them will be tied to one or more of the scopes above, why exposing H/W > details that will not expand user visible features? I was just thinking aloud, I'm not sure anyone would ever use a single host queue to service (ingress) traffic from multiple TCs. Or if any HW actually supports that. We can add it later. > > > + * NET_SHAPER_SCOPE_PORT and NET_SHAPER_SCOPE_VF are only available on > > > + * PF devices, usually inside the host/hypervisor. > > > + * NET_SHAPER_SCOPE_NETDEV, NET_SHAPER_SCOPE_QUEUE_GROUP and > > > + * NET_SHAPER_SCOPE_QUEUE are available on both PFs and VFs devices. > > > + */ > > > +enum net_shaper_scope { > > > + NET_SHAPER_SCOPE_PORT, > > > + NET_SHAPER_SCOPE_NETDEV, > > > + NET_SHAPER_SCOPE_VF, > > > > I realized now that we do indeed need this VF node (if we want to > > internally express the legacy SRIOV NDOs in this API), as much > > as I hate it. Could you annotate somehow my nack on ever exposing > > the ability to hook on the VF to user space? > > This work sparked from the need to allow configuring a shaper on > specific queues of some VF from the host. I hope this is not what you > are nacking here? Could you please elaborate a bit what concern you > with 'hook on the VF to user space'? Would the ability of attaching a > shaper to the VF from the host hit your nack? Queue configuration, for the VF, from the hypervisor? I thought it was from the VF. In any case, hypervisor has the representors. Use the representor's NETDEV scope? > > > + NET_SHAPER_SCOPE_QUEUE_GROUP, > > > > We may need a definition for a queue group. Did I suggest this? > > I think this was mentioned separately by you, John Fastabend and Intel. Oh ugh, I can't type. I think I meant to say "Why do we need..." > > Isn't queue group just a bunch of queues feeding a trivial RR node? > > Why does it need to be a "scope"? > > The goal is allowing arbitrary manipulation at the queue group level. > e.g. you can have different queue groups with different priority, or > weigh or shaping, and below them arbitrary shaping at the queue level. > > Note that a similar concept could be introduced for device (or VFs) > groups. > > Why would you like to constraint the features avail at the queue > groups? Wait! You don't have a way to create pure RR nodes other than queue group now? Perhaps that's what I'm missing... > > > + NET_SHAPER_SCOPE_QUEUE, > > > +}; > > > + > > > +/** > > > + * struct net_shaper_ops - Operations on device H/W shapers > > > + * @add: Creates a new shaper in the specified scope. > > > > "in a scope"? Isn't the scope just defining the ingress and egress > > points of the scheduling hierarchy? > > This is purely lexical matter, right? The scope, and more specifically > the full 'handle' comprising more scoped-related information, specifies > 'where' the shaper is located. Do you have a suggested alternative > wording? ... and also what confused me here. How are you going to do 2 layers of grouping with arbitrary shaping? We need arbitrary inner nodes. Unless I'm missing a trick. > > Also your example moves schedulers from queue scope to queue group > > scope. > > In the example, this part creates/enables a shaper at the queue level: > > u32 ghandle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE_GROUP, 0, 0); > dev->shaper_ops->add(dev, ghandle, &ginfo); > > and this: > > u32 handle = shaper_make_handle(NET_SHAPER_SCOPE_QUEUE, 0, queue_id); > //... > dev->netshaper_ops->move(dev, qhandle, ghandle, NULL); > > changes the _parent_ of qhandle setting it to the previously creates > queue group shaper. qhandle initial/implicit/default parent was the > device scope shaper. > > The scope of the queue shaper remains unchanged. An I misunderstanding > your point? > > > Why "if it exists" ? Core should make sure it exists, no? > > > > In addition to ops and state, the device will likely need to express > > capabilities of some sort. So that the core can do some work for the > > drivers and in due course we can expose them to user space for > > discoverability. > > What kind of capabilities are you thinking about? supported scopes? > supported metrics? What else? I feel like there is a lot of > mixed/partial kind of support which is hard to express in a formal way > but would fit nicely an extended ack for a failing op - as the SP/WRR > constrains Andrew reported. > > Do we need to introduce this introspection support from the start? I > think that having a few H/W implementations around would help (at least > me) understanding which properties could relevant here. Try to write good tests which can run on HW for more than one vendor. The introspection and capabilities will become apparent.