Date: Mon, 14 Nov 2022 16:22:47 +0700
From: Bagas Sanjaya
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, Vincent Guittot,
 linux-kernel@vger.kernel.org, Lukasz Luba, Xuewen Yan, Wei Wang,
 Jonathan JMChen, Hank, Paul Bone
Subject: Re: [PATCH] Documentation: sched: Add a new sched-util-clamp.rst
References: <20221105232343.887199-1-qyousef@layalina.io>
In-Reply-To: <20221105232343.887199-1-qyousef@layalina.io>

On Sat, Nov 05, 2022 at 11:23:43PM +0000, Qais Yousef wrote:
> +2. DESIGN:
> +==========

Why ALLCAPS and a trailing colon in the section title?

> +When a task is attached to a CPU controller, its uclamp values will be impacted
> +as follows:
> +
> +* cpu.uclamp.min is a protection as described in section 3-3 in
> +  Documentation/admin-guide/cgroup-v2.rst.
> ...
> +* cpu.uclamp.max is a limit as described in section 3-2 in
> +  Documentation/admin-guide/cgroup-v2.rst.
> +

Exactly which sections of the cgroup doc are you referring to? I don't see
any section numbers there. Did you mean this?:

---- >8 ----
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index dc254a3cb95686..fd448069c11562 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -619,6 +619,8 @@ process migrations.
   and is an example of this type.
 
 
+.. _cgroupv2-limits-distributor:
+
 Limits
 ------
 
@@ -635,6 +637,7 @@ process migrations.
   "io.max" limits the maximum BPS and/or IOPS that a cgroup can consume
   on an IO device and is an example of this type.
 
+.. _cgroupv2-protections-distributor:
 
 Protections
 -----------
diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
index 6601bda176d16e..5741acb35b7db2 100644
--- a/Documentation/scheduler/sched-util-clamp.rst
+++ b/Documentation/scheduler/sched-util-clamp.rst
@@ -364,8 +364,8 @@ There are two uclamp related values in the CPU cgroup controller:
 When a task is attached to a CPU controller, its uclamp values will be impacted
 as follows:
 
-* cpu.uclamp.min is a protection as described in section 3-3 in
-  Documentation/admin-guide/cgroup-v2.rst.
+* cpu.uclamp.min is a protection as described in :ref:`section 3-3 of cgroup
+  v2 documentation <cgroupv2-protections-distributor>`.
 
   If a task uclamp_min value is lower than cpu.uclamp.min, then the task will
   inherit the cgroup cpu.uclamp.min value.
@@ -373,8 +373,8 @@ as follows:
 
   In a cgroup hierarchy, effective cpu.uclamp.min is the max of (child,
   parent).
 
-* cpu.uclamp.max is a limit as described in section 3-2 in
-  Documentation/admin-guide/cgroup-v2.rst.
+* cpu.uclamp.max is a limit as described in :ref:`section 3-2 of cgroup v2
+  documentation <cgroupv2-limits-distributor>`.
 
   If a task uclamp_max value is higher than cpu.uclamp.max, then the task will
   inherit the cgroup cpu.uclamp.max value.

IMO, the doc wording can be improved (applied on top of your fixup [1]):

---- >8 ----
diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
index 728ffa364fc7ad..6601bda176d16e 100644
--- a/Documentation/scheduler/sched-util-clamp.rst
+++ b/Documentation/scheduler/sched-util-clamp.rst
@@ -2,31 +2,29 @@
 Utilization Clamping
 ====================
 
-1. INTRODUCTION
-===============
+1. Introduction
+===============
 
-Utilization clamping is a scheduler feature that allows user space to help in
-managing the performance requirement of tasks. It was introduced in v5.3
-release. The CGroup support was merged in v5.4.
-
-It is often referred to as util clamp and uclamp. You'll find all variations
-used interchangeably in this documentation and in the source code.
+Utilization clamping, also known as util clamp or uclamp, is a scheduler
+feature that allows user space to help in managing the performance requirement
+of tasks. It was introduced in v5.3 release. The CGroup support was merged in
+v5.4.
 
 Uclamp is a hinting mechanism that allows the scheduler to understand the
-performance requirements and restrictions of the tasks. Hence help it make
-a better placement decision. And when schedutil cpufreq governor is used, util
-clamp will influence the frequency selection as well.
+performance requirements and restrictions of the tasks, thus it helps the
+scheduler to make a better decision. And when schedutil cpufreq governor is
+used, util clamp will influence the frequency selection as well.
 
 Since scheduler and schedutil are both driven by PELT (util_avg) signals, util
 clamp acts on that to achieve its goal by clamping the signal to a certain
-point; hence the name. I.e: by clamping utilization we are making the system
-run at a certain performance point.
+point; hence the name. That is, by clamping utilization we are making the
+system run at a certain performance point.
 
-The right way to view util clamp is as a mechanism to make performance
-constraints request/hint. It consists of two components:
+The right way to view util clamp is as a mechanism to make request or hint on
+performance constraints. It consists of two tunables:
 
- * UCLAMP_MIN, which sets a lower bound.
- * UCLAMP_MAX, which sets an upper bound.
+ * UCLAMP_MIN, which sets the lower bound.
+ * UCLAMP_MAX, which sets the upper bound.
 
 These two bounds will ensure a task will operate within this performance range
 of the system. UCLAMP_MIN implies boosting a task, while UCLAMP_MAX implies
@@ -35,18 +33,18 @@ capping a task.
 
 One can tell the system (scheduler) that some tasks require a minimum
 performance point to operate at to deliver the desired user experience. Or one
 can tell the system that some tasks should be restricted from consuming too
-much resources and should NOT go above a specific performance point. Viewing
+much resources and should not go above a specific performance point. Viewing
 the uclamp values as performance points rather than utilization is a better
 abstraction from user space point of view.
 
 As an example, a game can use util clamp to form a feedback loop with its
 perceived FPS. It can dynamically increase the minimum performance point
 required by its display pipeline to ensure no frame is dropped. It can also
-dynamically 'prime' up these tasks if it knows in the coming few 100ms
-a computationally intensive scene is about to happen.
+dynamically 'prime' up these tasks if it knows in the coming few hundred
+milliseconds a computationally intensive scene is about to happen.
 
 On mobile hardware where the capability of the devices varies a lot, this
-dynamic feedback loop offers a great flexibility in ensuring best user
+dynamic feedback loop offers a great flexibility to ensure best user
 experience given the capabilities of any system.
 
 Of course a static configuration is possible too. The exact usage will depend
@@ -68,17 +66,17 @@ stay on the little cores which will ensure that:
 
 are CPU intensive tasks.
 
 By making these uclamp performance requests, or rather hints, user space can
-ensure system resources are used optimally to deliver the best user experience
-the system is capable of.
+ensure system resources are used optimally to deliver the best possible user
+experience.
 
 Another use case is to help with overcoming the ramp up latency inherit in how
 scheduler utilization signal is calculated.
 
-A busy task for instance that requires to run at maximum performance point will
-suffer a delay of ~200ms (PELT HALFIFE = 32ms) for the scheduler to realize
-that. This is known to affect workloads like gaming on mobile devices where
-frames will drop due to slow response time to select the higher frequency
-required for the tasks to finish their work in time.
+On the other hand, a busy task for instance that requires to run at maximum
+performance point will suffer a delay of ~200ms (PELT HALFIFE = 32ms) for the
+scheduler to realize that. This is known to affect workloads like gaming on
+mobile devices where frames will drop due to slow response time to select the
+higher frequency required for the tasks to finish their work in time.
 
 The overall visible effect goes beyond better perceived user
 experience/performance and stretches to help achieve a better overall
@@ -101,11 +99,12 @@ when an RT task wakes up. This cost is unchanged by using uclamp. Uclamp only
 helps picking what frequency to request instead of schedutil always requesting
 MAX for all RT tasks.
 
-See section 3.4 for default values and 3.4.1 on how to change RT tasks default
-value.
+See :ref:`section 3.4 <uclamp-default-values>` for default values and
+:ref:`3.4.1 <sched-util-clamp-min-rt-default>` on how to change RT tasks
+default value.
 
-2. DESIGN:
-==========
+2. Design
+=========
 
 Util clamp is a property of every task in the system. It sets the boundaries of
 its utilization signal; acting as a bias mechanism that influences certain
@@ -123,10 +122,10 @@ which have implications on the utilization value at rq level, which brings us
 to the main design challenge.
 
 When a task wakes up on an rq, the utilization signal of the rq will be
-impacted by the uclamp settings of all the tasks enqueued on it. For example if
+affected by the uclamp settings of all the tasks enqueued on it. For example if
 a task requests to run at UTIL_MIN = 512, then the util signal of the rq needs
-to respect this request as well as all other requests from all of the enqueued
-tasks.
+to respect to this request as well as all other requests from all of the
+enqueued tasks.
 
 To be able to aggregate the util clamp value of all the tasks attached to the
 rq, uclamp must do some housekeeping at every enqueue/dequeue, which is the
@@ -138,19 +137,21 @@ The way this is handled is by dividing the utilization range into buckets
 (struct uclamp_bucket) which allows us to reduce the search space from every
 task on the rq to only a subset of tasks on the top-most bucket.
 
-When a task is enqueued, we increment a counter in the matching bucket. And on
-dequeue we decrement it. This makes keeping track of the effective uclamp value
-at rq level a lot easier.
+When a task is enqueued, the counter in the matching bucket is incremented,
+and on dequeue it is decremented. This makes keeping track of the effective
+uclamp value at rq level a lot easier.
 
-As we enqueue and dequeue tasks we keep track of the current effective uclamp
-value of the rq. See section 2.1 for details on how this works.
+As tasks are enqueued and dequeued, we keep track of the current effective
+uclamp value of the rq. See :ref:`section 2.1 <uclamp-buckets>` for details on
+how this works.
 
 Later at any path that wants to identify the effective uclamp value of the rq,
 it will simply need to read this effective uclamp value of the rq at that exact
 moment of time it needs to take a decision.
 
 For task placement case, only Energy Aware and Capacity Aware Scheduling
-(EAS/CAS) make use of uclamp for now. This implies heterogeneous systems only.
+(EAS/CAS) make use of uclamp for now, which implies that it is applied on
+heterogeneous systems only.
 When a task wakes up, the scheduler will look at the current effective uclamp
 value of every rq and compare it with the potential new value if the task were
 to be enqueued there. Favoring the rq that will end up with the most energy
@@ -159,17 +160,19 @@ efficient combination.
 
 Similarly in schedutil, when it needs to make a frequency update it will look
 at the current effective uclamp value of the rq which is influenced by the set
 of tasks currently enqueued there and select the appropriate frequency that
-will honour uclamp requests.
+will satisfy constraints from requests.
 
 Other paths like setting overutilization state (which effectively disables EAS)
 make use of uclamp as well. Such cases are considered necessary housekeeping to
 allow the 2 main use cases above and will not be covered in detail here as they
 could change with implementation details.
 
-2.1 BUCKETS:
-------------
+.. _uclamp-buckets:
 
-.. code-block:: c
+2.1. Buckets
+------------
+
+.. code-block::
 
    [struct rq]
 
@@ -189,7 +192,6 @@ could change with implementation details.
 
 
 .. note::
-   DISCLAMER:
    The diagram above is an illustration rather than a true depiction of the
    internal data structure.
 
@@ -198,12 +200,11 @@ an rq as tasks are enqueued/dequeued, the whole utilization range is divided
 into N buckets where N is configured at compile time by setting
 CONFIG_UCLAMP_BUCKETS_COUNT. By default it is set to 5.
 
-The rq has a bucket for each uclamp_id: [UCLAMP_MIN, UCLAMP_MAX].
+The rq has a bucket for each uclamp_id tunables: [UCLAMP_MIN, UCLAMP_MAX].
 
-The range of each bucket is 1024/N. For example for the default value of 5 we
-will have 5 buckets, each of which will cover the following range:
+The range of each bucket is 1024/N. For example, for the default value of 5
+there will be 5 buckets, each of which will cover the following range:
 
-.. code-block:: c
+.. code-block::
 
    DELTA = round_closest(1024/5) = 204.8 = 205
 
@@ -213,21 +214,21 @@ will have 5 buckets, each of which will cover the following range:
    Bucket 3: [615:819]
    Bucket 4: [820:1024]
 
-When a task p
+When a task p with following tunable parameters
 
 .. code-block:: c
 
    p->uclamp[UCLAMP_MIN] = 300
    p->uclamp[UCLAMP_MAX] = 1024
 
-is enqueued into the rq, Bucket 1 will be incremented for UCLAMP_MIN and Bucket
+is enqueued into the rq, bucket 1 will be incremented for UCLAMP_MIN and bucket
 4 will be incremented for UCLAMP_MAX to reflect the fact the rq has a task in
 this range.
 
 The rq then keeps track of its current effective uclamp value for each
 uclamp_id.
 
-When a task p is enqueued, the rq value changes as follows:
+When a task p is enqueued, the rq value changes to:
 
 .. code-block:: c
 
@@ -235,7 +236,7 @@ When a task p is enqueued, the rq value changes as follows:
    rq->uclamp[UCLAMP_MIN] = max(rq->uclamp[UCLAMP_MIN], p->uclamp[UCLAMP_MIN])
    // repeat for UCLAMP_MAX
 
-When a task is p dequeued the rq value changes as follows:
+Similarly, when p is dequeued the rq value changes to:
 
 .. code-block:: c
 
@@ -244,11 +245,11 @@ When a task is p dequeued the rq value changes as follows:
    // repeat for UCLAMP_MAX
 
 When all buckets are empty, the rq uclamp values are reset to system defaults.
-See section 3.4 for default values.
+See :ref:`section 3.4 <uclamp-default-values>` for details on default values.
 
 
-2.2 MAX AGGREGATION:
---------------------
+2.2. Max aggregation
+--------------------
 
 Util clamp is tuned to honour the request for the task that requires the
 highest performance point.
@@ -268,19 +269,20 @@ values:
 
    p1->uclamp[UCLAMP_MIN] = 500
    p1->uclamp[UCLAMP_MAX] = 500
 
-then assuming both p0 and p1 are enqueued to the same rq
+then assuming both p0 and p1 are enqueued to the same rq, both UCLAMP_MIN
+and UCLAMP_MAX become:
 
 .. code-block:: c
 
    rq->uclamp[UCLAMP_MIN] = max(300, 500) = 500
    rq->uclamp[UCLAMP_MAX] = max(900, 500) = 900
 
-As we shall see in section 5.1, this max aggregation is the cause of one of the
-limitations when using util clamp. Particularly for UCLAMP_MAX hint when user
-space would like to save power.
+As we shall see in :ref:`section 5.1 <uclamp-capping-fail>`, this max
+aggregation is the cause of one of the limitations when using util clamp, in
+particular for UCLAMP_MAX hint when user space would like to save power.
 
-2.3 HIERARCHICAL AGGREGATION:
------------------------------
+2.3. Hierarchical aggregation
+-----------------------------
 
 As stated earlier, util clamp is a property of every task in the system. But
 the actual applied (effective) value can be influenced by more than just the
@@ -293,80 +295,66 @@ The effective util clamp value of any task is restricted as follows:
 
 2. The restricted value in (1) is then further restricted by the system wide
    uclamp settings.
 
-Section 3 discusses the interfaces and will expand further on that.
+:ref:`Section 3 <uclamp-interfaces>` discusses the interfaces and will expand
+further on that.
 
 For now suffice to say that if a task makes a request, its actual effective
 value will have to adhere to some restrictions imposed by cgroup and system
 wide settings.
 
-The system will still accept the request even if effectively will look
-different; but as soon as the task moves to a different cgroup or a sysadmin
-modifies the system settings, it'll be able to get what it wants if the new
-settings allows it.
+The system will still accept the request even if effectively will be
+beyond the constraints, but as soon as the task moves to a different cgroup
+or a sysadmin modifies the system settings, the request will be satisfied
+only if it is within new constraints.
 
 In other words, this aggregation will not cause an error when a task changes
-its uclamp values. It just might not be able to achieve it based on those
-factors.
+its uclamp values, but rather the system may not be able to satisfy requests
+based on those factors.
 
 2.4 Range:
 ----------
 
-Uclamp performance request follow the utilization range: [0:1024] inclusive.
+Uclamp performance request has the range of 0 to 1024 inclusive.
 
-For cgroup interface percentage is used: [0:100] inclusive.
-You can use 'max' instead of 100 like other cgroup interfaces.
+For cgroup interface percentage is used (that is 0 to 100 inclusive).
+Just like other cgroup interfaces, you can use 'max' instead of 100.
 
-3. INTERFACES:
-==============
+.. _uclamp-interfaces:
 
-3.1 PER TASK INTERFACE:
------------------------
+3. Interfaces
+=============
+
+3.1 Per-task interface
+----------------------
 
 sched_setattr() syscall was extended to accept two new fields:
 
 * sched_util_min: requests the minimum performance point the system should run
-  at when this task is running. Or lower performance bound.
+   at when this task is running. Or lower performance bound.
 * sched_util_max: requests the maximum performance point the system should run
-  at when this task is running. Or upper performance bound.
+   at when this task is running. Or upper performance bound.
 
-For example:
+For example, the following scenario have 40% to 80% utilization constraints:
 
 .. code-block:: c
 
    attr->sched_util_min = 40% * 1024;
   attr->sched_util_max = 80% * 1024;
 
-Will tell the system that when task @p is running, it should try its best to
-ensure it starts at a performance point no less than 40% of maximum system's
-capability.
-
-And if the task runs for a long enough time so that its actual utilization goes
-above 80%, then it should not cause the system to operate at a performance
-point higher than that.
+When task @p is running, the scheduler should try its best to ensure it starts
+at 40% utilization. If the task runs for a long enough time so that its actual
+utilization goes above 80%, the utilization will be capped.
 
 The special value -1 is used to reset the uclamp settings to the system
 default.
 
 Note that resetting the uclamp value to system default using -1 is not the same
-as setting the uclamp value to system default.
+as manually setting uclamp value to system default. This distinction is
+important because as we shall see in system interfaces, the default value for
+RT could be changed. SCHED_NORMAL/OTHER might gain similar knobs too in the
+future.
 
-.. code-block:: c
-
-   attr->sched_util_min = -1 // p0 is reset to system default e.g: 0
-
-not the same as
-
-.. code-block:: c
-
-   attr->sched_util_min = 0 // p0 is set to 0, the fact it is the same
-                            // as system default is irrelevant
-
-This distinction is important because as we shall see in system interfaces, the
-default value for RT could be changed. SCHED_NORMAL/OTHER might gain similar
-knobs too in the future.
-
-3.2 CGROUP INTERFACE:
----------------------
+3.2. cgroup interface
+---------------------
 
 There are two uclamp related values in the CPU cgroup controller:
 
@@ -394,7 +382,7 @@ as follows:
 
   In a cgroup hierarchy, effective cpu.uclamp.max is the min of (child,
   parent).
 
-For example:
+For example, given following parameters:
 
 .. code-block:: c
 
@@ -410,7 +398,7 @@ For example:
    cgroup1->cpu.uclamp.min = 60% * 1024;
   cgroup1->cpu.uclamp.max = 100% * 1024;
 
-when p0 and p1 are attached to cgroup0
+when p0 and p1 are attached to cgroup0, the values become:
 
 .. code-block:: c
 
@@ -420,7 +408,7 @@ when p0 and p1 are attached to cgroup0
    p1->uclamp[UCLAMP_MIN] = 40% * 1024; // intact
   p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact
 
-when p0 and p1 are attached to cgroup1
+when p0 and p1 are attached to cgroup1, these instead become:
 
 .. code-block:: c
 
@@ -433,49 +421,46 @@ when p0 and p1 are attached to cgroup1
 Note that cgroup interfaces allows cpu.uclamp.max value to be lower than
 cpu.uclamp.min. Other interfaces don't allow that.
 
-3.3 SYSTEM INTERFACE:
----------------------
+3.3. System interface
+---------------------
 
-3.3.1 sched_util_clamp_min:
----------------------------
+3.3.1 sched_util_clamp_min
+--------------------------
 
-System wide limit of allowed UCLAMP_MIN range. By default set to 1024, which
-means tasks are allowed to reach an effective UCLAMP_MIN value in the range of
-[0:1024].
+System wide limit of allowed UCLAMP_MIN range. By default it is set to 1024,
+which means that permitted effective UCLAMP_MIN range for tasks is [0:1024].
+By changing it to 512 for example the range reduces to [0:512]. This is useful
+to restrict how much boosting tasks are allowed to acquire.
 
-By changing it to 512 for example the effective allowed range reduces to
-[0:512].
-
-This is useful to restrict how much boosting tasks are allowed to acquire.
-
-Requests from tasks to go above this point will still succeed, but effectively
-they won't be achieved until this value is >= p->uclamp[UCLAMP_MIN].
+Requests from tasks to go above this knob value will still succeed, but
+they won't be satisfied until it is more than p->uclamp[UCLAMP_MIN].
 
 The value must be smaller than or equal to sched_util_clamp_max.
 
-3.3.2 sched_util_clamp_max:
----------------------------
+3.3.2 sched_util_clamp_max
+--------------------------
 
-System wide limit of allowed UCLAMP_MAX range. By default set to 1024, which
-means tasks are allowed to reach an effective UCLAMP_MAX value in the range of
-[0:1024].
+System wide limit of allowed UCLAMP_MAX range. By default it is set to 1024,
+which means that permitted effective UCLAMP_MAX range for tasks is [0:1024].
 
 By changing it to 512 for example the effective allowed range reduces to
-[0:512]. The visible impact of this is that no task can run above 512, which in
-return means that all rqs are restricted too. IOW, the whole system is capped
-to half its performance capacity.
+[0:512]. This means that no task can run above 512, which implies that all
+rqs are restricted too. IOW, the whole system is capped to half its performance
+capacity.
 
-This is useful to restrict the overall maximum performance point of the system.
+This is useful to restrict the overall maximum performance point of the
+system. For example, it can be handy to limit performance when running low on
+battery.
 
-Can be handy to limit performance when running low on battery.
-
-Requests from tasks to go above this point will still succeed, but effectively
-they won't be achieved until this value is >= p->uclamp[UCLAMP_MAX].
+Requests from tasks to go above this knob value will still succeed, but
+they won't be satisfied until it is more than p->uclamp[UCLAMP_MAX].
 
 The value must be greater than or equal to sched_util_clamp_min.
 
-3.4 DEFAULT VALUES:
--------------------
+.. _uclamp-default-values:
+
+3.4. Default values
+-------------------
 
 By default all SCHED_NORMAL/SCHED_OTHER tasks are initialized to:
 
@@ -484,7 +469,7 @@ By default all SCHED_NORMAL/SCHED_OTHER tasks are initialized to:
 
    p_fair->uclamp[UCLAMP_MIN] = 0
    p_fair->uclamp[UCLAMP_MAX] = 1024
 
-That is no boosting or restriction on any task. These default values can't be
+That is, no boosting or restriction on any task. These default values can't be
 changed at boot or runtime. No argument was made yet as to why we should
 provide this, but can be added in the future.
 
@@ -495,33 +480,35 @@ For SCHED_FIFO/SCHED_RR tasks:
 
    p_rt->uclamp[UCLAMP_MIN] = 1024
    p_rt->uclamp[UCLAMP_MAX] = 1024
 
-That is by default they're boosted to run at the maximum performance point of
+That is, by default they're boosted to run at the maximum performance point of
 the system which retains the historical behavior of the RT tasks.
 
 RT tasks default uclamp_min value can be modified at boot or runtime via
-sysctl. See section 3.4.1.
+sysctl. See below section.
+
+.. _sched-util-clamp-min-rt-default:
 
 3.4.1 sched_util_clamp_min_rt_default:
 --------------------------------------
 
 Running RT tasks at maximum performance point is expensive on battery powered
-devices and not necessary. To allow system designers to offer good performance
-guarantees for RT tasks without pushing it all the way to maximum performance
+devices and not necessary. To allow system developers to offer good performance
+guarantees for these tasks without pushing it all the way to maximum performance
 point, this sysctl knob allows tuning the best boost value to address the
 system requirement without burning power running at maximum performance point
 all the time.
 
-Application designers are encouraged to use the per task util clamp interface
+Application developers are encouraged to use the per task util clamp interface
 to ensure they are performance and power aware. Ideally this knob should be set
 to 0 by system designers and leave the task of managing performance
-requirements to the apps themselves.
+requirements to the apps.
 
-4. HOW TO USE UTIL CLAMP:
-=========================
+4. How to use util clamp
+========================
 
 Util clamp promotes the concept of user space assisted power and performance
-management. At the scheduler level the info required to make the best decision
-are non existent. But with util clamp user space can hint to the scheduler to
+management. At the scheduler level there is no info required to make the best
+decision. However, with util clamp user space can hint to the scheduler to
 make better decision about task placement and frequency selection.
 
 Best results are achieved by not making any assumptions about the system the
@@ -530,41 +517,41 @@ dynamically monitor and adjust. Ultimately this will allow for a better user
 experience at a better perf/watt.
 
 For some systems and use cases, static setup will help to achieve good results.
-Portability will be a problem in this case. After all how much work one can do
-at 100, 200 or 1024 is unknown and a special property of every system. Unless
-there's a specific target system, static setup should be avoided.
+Portability will be a problem in this case. How much work one can do at 100,
+200 or 1024 is different for each system. Unless there's a specific target
+system, static setup should be avoided.
 
-All in all there are enough possibilities to create a whole framework based on
+There are enough possibilities to create a whole framework based on
 util clamp or self contained app that makes use of it directly.
 
-4.1 BOOST IMPORTANT AND DVFS-LATENCY-SENSITIVE TASKS:
------------------------------------------------------
+4.1. Boost important and DVFS-latency-sensitive tasks
+-----------------------------------------------------
 
 A GUI task might not be busy to warrant driving the frequency high when it
-wakes up. But it requires to finish its work within a specific period of time
+wakes up. However, it requires to finish its work within a specific time window
 to deliver the desired user experience. The right frequency it requires at
 wakeup will be system dependent. On some underpowered systems it will be high,
-on other overpowered ones, it will be low or 0.
+on other overpowered ones it will be low or 0.
 
-This task can increase its UCLAMP_MIN value every time it misses a deadline to
-ensure on next wake up it runs at a higher performance point. It should try to
-approach the lowest UCLAMP_MIN value that allows to meet its deadline on any
+This task can increase its UCLAMP_MIN value every time it misses the deadline
+to ensure on next wake up it runs at a higher performance point. It should try
+to approach the lowest UCLAMP_MIN value that allows to meet its deadline on any
 particular system to achieve the best possible perf/watt for that system.
 
 On heterogeneous systems, it might be important for this task to run on
-a bigger CPU.
+a faster CPU.
 
 Generally it is advised to perceive the input as performance level or point
 which will imply both task placement and frequency selection.
 
-4.2 CAP BACKGROUND TASKS:
--------------------------
+4.2. Cap background tasks
+-------------------------
 
 Like explained for Android case in the introduction. Any app can lower
 UCLAMP_MAX for some background tasks that don't care about performance but
 could end up being busy and consume unnecessary system resources on the system.
 
-4.3 POWERSAVE MODE:
+4.3. Powersave mode
 --------------------
 
 sched_util_clamp_max system wide interface can be used to limit all tasks from
@@ -575,8 +562,8 @@ This is not unique to uclamp as one can achieve the same by reducing max
 frequency of the cpufreq governor. It can be considered a more convenient
 alternative interface.
 
-4.4 PER APP PERFORMANCE RESTRICTIONS:
--------------------------------------
+4.4. Per-app performance restriction
+------------------------------------
 
 Middleware/Utility can provide the user an option to set UCLAMP_MIN/MAX for an
 app every time it is executed to guarantee a minimum performance point and/or
@@ -585,28 +572,31 @@ these apps.
 
 If you want to prevent your laptop from heating up while on the go from
 compiling the kernel and happy to sacrifice performance to save power, but
-still would like to keep your browser performance intact; uclamp enables that.
+still would like to keep your browser performance intact, uclamp makes it
+possible.
 
-5. LIMITATIONS:
-===============
+5. Limitations
+==============
 
-5.1 CAPPING FREQUENCY WITH UCLAMP_MAX FAILS UNDER CERTAIN CONDITIONS:
----------------------------------------------------------------------
+.. _uclamp-capping-fail:
 
-If task p0 is capped to run at 512
+5.1. Capping frequency with uclamp_max fails under certain conditions
+---------------------------------------------------------------------
+
+If task p0 is capped to run at 512:
 
 .. code-block:: c
 
   p0->uclamp[UCLAMP_MAX] = 512
 
-is sharing the rq with p1 which is free to run at any performance point
+and it shares the rq with p1 which is free to run at any performance point:
 
 .. code-block:: c
 
   p1->uclamp[UCLAMP_MAX] = 1024
 
 then due to max aggregation the rq will be allowed to reach max performance
-point
+point:
 
 .. code-block:: c
 
@@ -620,19 +610,19 @@ both are running at the same rq, p1 will cause the frequency capping to be left
 from the rq although p1, which is allowed to run at any performance point,
 doesn't actually need to run at that frequency.
 
-5.2 UCLAMP_MAX CAN BREAK PELT (UTIL_AVG) SIGNAL
------------------------------------------------
+5.2. UCLAMP_MAX can break PELT (util_avg) signal
+------------------------------------------------
 
 PELT assumes that frequency will always increase as the signals grow to ensure
-there's always some idle time on the CPU. But with UCLAMP_MAX, we will prevent
-this frequency increase which can lead to no idle time in some circumstances.
-When there's no idle time, then a task will look like a busy loop, which would
-result in util_avg being 1024.
+there's always some idle time on the CPU. But with UCLAMP_MAX, this frequency
+increase will be prevented which can lead to no idle time in some
+circumstances. When there's no idle time, a task will be stuck in a busy loop,
+which would result in util_avg being 1024.
 
-Combing with issue described in 5.2, this an lead to unwanted frequency spikes
+Combining with the issue described below, this can lead to unwanted frequency
+spikes
 when severely capped tasks share the rq with a small non capped task.
 
-As an example if task p
+As an example if task p, which have:
 
 .. code-block:: c
 
@@ -646,35 +636,35 @@ of.
 
   rq->uclamp[UCLAMP_MAX] = 0
 
-If the ratio of Fmax/Fmin is 3, then
+If the ratio of Fmax/Fmin is 3, then maximum value will be:
 
 ..
code-block:: c =20 300 * (Fmax/Fmin) =3D 900 =20 -Which indicates the CPU will still see idle time since 900 is < 1024. The -_actual_ util_avg will NOT be 900 though. It will be higher than 300, but = won't -approach 900. As long as there's idle time, p->util_avg updates will be of= f by -a some margin, but not proportional to Fmax/Fmin. +which indicates the CPU will still see idle time since 900 is < 1024. The +_actual_ util_avg will not be 900 though, but somewhere between 300 and 90= 0. As +long as there's idle time, p->util_avg updates will be off by a some margi= n, +but not proportional to Fmax/Fmin. =20 .. code-block:: c =20 p0->util_avg =3D 300 + small_error =20 -Now if the ratio of Fmax/Fmin is 4, then +Now if the ratio of Fmax/Fmin is 4, the maximum value becomes: =20 .. code-block:: c =20 300 * (Fmax/Fmin) =3D 1200 =20 which is higher than 1024 and indicates that the CPU has no idle time. When -this happens, then the _actual_ util_avg will become 1024. +this happens, then the _actual_ util_avg will become: =20 .. code-block:: c =20 p0->util_avg =3D 1024 =20 -If task p1 wakes up on this CPU +If task p1 wakes up on this CPU, which have: =20 .. code-block:: c =20 @@ -683,7 +673,7 @@ If task p1 wakes up on this CPU =20 then the effective UCLAMP_MAX for the CPU will be 1024 according to max aggregation rule. But since the capped p0 task was running and throttled -severely, then the rq->util_avg will be 1024. +severely, then the rq->util_avg will be: =20 .. code-block:: c =20 @@ -693,7 +683,7 @@ severely, then the rq->util_avg will be 1024. rq->util_avg =3D 1024 rq->uclamp[UCLAMP_MAX] =3D 1024 =20 -Hence lead to a frequency spike since if p0 wasn't throttled we should get +Hence lead to a frequency spike since if p0 wasn't throttled we should get: =20 .. code-block:: c =20 Thanks. [1]: https://lore.kernel.org/lkml/20221113152629.3wbyeejsj5v33rvu@airbuntu/ --=20 An old man doll... just what I always wanted! 
- Clara