* [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: David Vernet @ 2024-01-26 21:59 UTC (permalink / raw)
To: lsf-pc
Cc: bpf, joel, htejun, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

Hello,

A few more use cases have emerged for sched_ext that are not yet supported
that I wanted to discuss in the BPF track. Specifically:

- EAS: Energy Aware Scheduling

  While firmware ultimately controls the frequency of a core, the kernel does
  provide frequency scaling knobs such as EPP. It could be useful for BPF
  schedulers to have control over these knobs to e.g. hint that certain cores
  should keep a lower frequency and operate as E cores. This could have
  applications in battery-aware devices, or in other contexts where
  applications have e.g. latency-sensitive, compute-intensive workloads.

- Componentized schedulers

  Scheduler implementations today largely have to reinvent the wheel. For
  example, if you want to implement a load balancer in Rust, you need to add
  the necessary fields to the BPF program for tracking load / duty cycle, and
  then parse and consume them from the Rust side. That's pretty suboptimal,
  as the actual load balancing algorithm itself is essentially the same in
  each case. The challenge here is that the feature requires both BPF and
  user space components to work together. It's not enough to ship a Rust
  crate -- you also need to ship a BPF object file that your program can link
  against. And what should the API look like on both ends? Should Rust / BPF
  have to call into functions to get load balancing? Or should it be
  automatically packaged and implemented?

  There are a lot of ways that we can approach this, and it probably warrants
  discussing in some more detail.
If anybody else has ideas on things they'd like to discuss, whether
sched_ext features that are missing or scheduling ideas that we could try to
implement but just haven't yet, please feel free to share.

Thanks,
David

^ permalink raw reply [flat|nested] 9+ messages in thread
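To make the shape of such a reusable component concrete, here is a rough sketch of a load-balancing core that could be factored out and shared between schedulers. Everything here is invented for illustration (the function name, the load units, the greedy strategy); it is not an existing scx API, and a real version would be split across a BPF object file and a user-space crate:

```python
# Hypothetical reusable load balancer. All names and the strategy below
# are invented for illustration; nothing here is an existing scx API.

def balance(domain_loads, tolerance=0.1):
    """Suggest migrations that bring per-domain load within `tolerance`
    of the mean. Returns a list of (src_domain, dst_domain, amount)."""
    avg = sum(domain_loads.values()) / len(domain_loads)
    # Domains above/below the tolerance band around the mean.
    overloaded = {d: l for d, l in domain_loads.items() if l > avg * (1 + tolerance)}
    underloaded = {d: l for d, l in domain_loads.items() if l < avg * (1 - tolerance)}
    migrations = []
    for src, src_load in sorted(overloaded.items(), key=lambda kv: -kv[1]):
        for dst in sorted(underloaded, key=lambda d: underloaded[d]):
            # Move just enough load to pull both domains toward the mean.
            amount = min(src_load - avg, avg - underloaded[dst])
            if amount <= 0:
                continue
            migrations.append((src, dst, amount))
            src_load -= amount
            underloaded[dst] += amount
    return migrations
```

A scheduler-specific front end would feed this with whatever load metric it tracks, and translate the returned migrations into writes to its shared BPF maps.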
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Joel Fernandes @ 2024-01-29 22:41 UTC (permalink / raw)
To: David Vernet, lsf-pc
Cc: bpf, htejun, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

On 1/26/2024 4:59 PM, David Vernet wrote:
> Hello,
>
> A few more use cases have emerged for sched_ext that are not yet
> supported that I wanted to discuss in the BPF track. Specifically:
>
> - EAS: Energy Aware Scheduling
>
> While firmware ultimately controls the frequency of a core, the kernel
> does provide frequency scaling knobs such as EPP. It could be useful for
> BPF schedulers to have control over these knobs to e.g. hint that
> certain cores should keep a lower frequency and operate as E cores.
> This could have applications in battery-aware devices, or in other
> contexts where applications have e.g. latency-sensitive
> compute-intensive workloads.

This is a great topic. I think integrating/merging such a mechanism with the
NEST scheduler could be useful too? You mentioned there is a sched_ext
implementation of NEST already? One reason that's interesting to me is that
task-packing and less-spreading may have power benefits; this is exactly
what EAS on ARM does, but it also uses an energy model to know when packing
is a bad idea. Since we don't have fine-grained control of frequency on
Intel, I wonder what else we can do to know when the scheduler should pack
and when to spread. Maybe something simple which does not need an energy
model but packs based on some other signal/heuristic would be great in the
short term.
Maybe a signal can be the "quality of service" (QoS) approach, where tasks
with lower QoS are packed more aggressively and tasks with higher QoS are
spread more (?).

> - Componentized schedulers
>
> Scheduler implementations today largely have to reinvent the wheel. For
> example, if you want to implement a load balancer in rust, you need to
> add the necessary fields to the BPF program for tracking load / duty
> cycle, and then parse and consume them from the rust side. That's pretty
> suboptimal though, as the actual load balancing algorithm itself is
> essentially the exact same. The challenge here is that the feature
> requires both BPF and user space components to work together. It's not
> enough to ship a rust crate -- you need to also ship a BPF object file

Maybe I am confused, but why does Rust userspace code need to link to BPF
objects? The BPF object is loaded into the kernel, right?

> that your program can link against. And what should the API look like on
> both ends? Should rust / BPF have to call into functions to get load
> balancing? Or should it be automatically packaged and implemented?
>
> There are a lot of ways that we can approach this, and it probably
> warrants discussing in some more detail

But I get the gist of the issue; it would be interesting to discuss.

thanks,

- Joel
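A toy model of that QoS signal might look like the following. To be clear, this is a sketch added for illustration, not code from any scx scheduler; the 0.5 QoS cutoff and the 0.8 saturation threshold are arbitrary values chosen for the example:

```python
def pick_cpu(cpu_util, qos, pack_threshold=0.8):
    """Toy QoS-biased CPU selection; qos is in [0.0, 1.0].

    Low-QoS tasks are packed onto the most-utilized core that still has
    headroom; high-QoS tasks are spread onto the least-utilized core.
    The 0.5 cutoff and the threshold are invented for illustration."""
    if qos < 0.5:
        # Pack: busiest core still below the saturation threshold, if any.
        candidates = [c for c, u in enumerate(cpu_util) if u < pack_threshold]
        if candidates:
            return max(candidates, key=lambda c: cpu_util[c])
    # Spread (also the fallback when every core is saturated).
    return min(range(len(cpu_util)), key=lambda c: cpu_util[c])
```

The interesting question a real implementation would have to answer is where the QoS value comes from (cgroup attribute, nice value, an explicit per-task hint), which is exactly the kind of signal being discussed here.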
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Joel Fernandes @ 2024-01-29 22:42 UTC (permalink / raw)
To: David Vernet, lsf-pc, Tejun Heo
Cc: bpf, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

Tejun's address bounced, so I am adding the correct one. Thanks.

On 1/29/2024 5:41 PM, Joel Fernandes wrote:
> [full verbatim quote of the previous message trimmed]
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: David Vernet @ 2024-01-30 0:15 UTC (permalink / raw)
To: Joel Fernandes
Cc: lsf-pc, Tejun Heo, bpf, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

On Mon, Jan 29, 2024 at 05:42:54PM -0500, Joel Fernandes wrote:
> Tejun's address bounced so I am adding the correct one. Thanks.

Ah, thanks, my mistake.

> [...]
> This is a great topic. I think integrating/merging such mechanism with the NEST
> scheduler could be useful too? You mentioned there is sched_ext implementation
> of NEST already? One reason that's interesting to me is the task-packing and

Correct -- it's called scx_nest [0].

[0]: https://github.com/sched-ext/scx/blob/main/scheds/c/scx_nest.bpf.c

> less-spreading may have power benefits, this is exactly what EAS on ARM does,
> but it also uses an energy model to know when packing is a bad idea.
> Since we
> don't have fine grained control of frequency on Intel, I wonder what else can we
> do to know when the scheduler should pack and when to spread. Maybe something
> simple which does not need an energy model but packs based on some other
> signal/heuristic would be great in the short term.

Makes sense. What kinds of signals were you thinking? We can have user space
query for whatever we'd need, and then communicate that to the kernel via
shared maps. Or, probably even more ideal, if we could get the information we
need from tracepoints or kprobes, then we could possibly avoid having to deal
with that and just keep everything in the kernel.

Note that we don't necessarily have to track just public APIs if we did all
of this in the kernel. If we can access a struct in a tracepoint or a kprobe,
we can read from it, and use that in the scheduler however we want. Of
course, none of this comes with any kind of ABI stability guarantees, but
that's one of the features of sched_ext: because the actual scheduler itself
is a _kernel_ program that runs in kernel space, we can experiment with and
implement things without tying anyone's hands to fully supporting it in the
kernel forever.

The user space portion communicates with the BPF scheduler over maps that
are UAPI (part of BPF UAPI), but the actual scheduler itself is just a
kernel program, and therefore is free to interact with the rest of the
system without making anything UAPI or adding ABI stability requirements.
The contents of what's passed over those maps are not UAPI, in the same
manner that the contents sent over the communication channels set up by KVM
per your other thread [1] would not be UAPI.

[1]: https://lore.kernel.org/all/653c2448-614e-48d6-af31-c5920d688f3e@joelfernandes.org/

> Maybe a signal can be the "Quality of service (QoS)" approach where tasks
> with lower QoS are packed more aggressively and higher QoS are spread more
> (?).
> >> - Componentized schedulers
> >> [...]
> >> It's not
> >> enough to ship a rust crate -- you need to also ship a BPF object file
>
> Maybe I am confused but why does rust userspace code need to link to BPF
> objects? The BPF object is loaded into the kernel right?

So there are a few pieces at play here:

1. You're correct that the BPF program is loaded into kernel space, but the
   actual BPF bytecode itself is linked statically into the application, and
   the application is what actually makes the syscalls (via libbpf) to load
   the BPF program into the kernel.

   Here's a high-level overview of the workflow for loading a scheduler:

   - Open the scheduler: this involves libbpf parsing the BPF object file
     passed by the application, and discovering its maps, progs, etc. which
     should be created. At this phase user space can still update any maps
     in the program, including e.g. read-only maps such as .rodata. This
     allows user space to do things like set the max # of CPUs on the
     system, set debug flags if they were requested by the user, etc.

   - Load the scheduler: libbpf creates BPF maps, does relocations for CO-RE
     [2], and verifies and loads the scheduler into the kernel. At this
     point, the program is loaded into the kernel, but the scheduler is not
     actively running yet. User space can no longer write read-only maps in
     the BPF program, but it can still read and write _writeable_ maps, and
     it can in fact do so indefinitely throughout the runtime of the
     scheduler.
     As described below, this is why we need both a user space portion and a
     BPF object file portion for such features.

   - Attach the scheduler: this actually calls into ext.c to update the
     currently running scheduler to use the BPF sched_ext scheduler.

   [2]: https://nakryiko.com/posts/bpf-core-reference-guide/

2. As alluded to above, the user space program that loaded the scheduler can
   interact with the scheduler in real time by reading and writing to its
   writeable maps. This allows user space to e.g. read some procfs values to
   determine utilization for each core in the system, do some load balancing
   math with floating point numbers based on that data and on task weight /
   duty cycle, and then notify the BPF scheduler that it should migrate
   tasks by writing to shared maps.

   This is exactly what we do in scx_rusty [3]. We track duty cycles and
   load in kernel space (soon we'll only track duty cycles and do all load
   scaling in user space), and then periodically we'll do a load balancing
   pass in the user-space portion of the scheduler where we read those
   values, use floats, and then signal to the kernel if and where it should
   migrate tasks by writing to maps. This is all done async from the
   perspective of the kernel, so the kernel will check the maps to see if
   there's an update on e.g. enqueue paths.

   [3]: https://github.com/sched-ext/scx/tree/main/scheds/rust/scx_rusty/src

So to summarize -- the Rust portion isn't running in the kernel, but it is
influencing the kernel scheduler's decisions by communicating with it via
these shared maps (and the kernel can similarly communicate with user space
in the opposite direction). That's the reason that it needs to have both the
user space portion and the kernel portion available to implement these
features. Neither makes sense without the other.

Note that not every scheduler we've implemented has a robust user space
portion, but every scheduler does have _some_ user space counterpart which
is responsible for loading it.
scx_nest.c [4], for example, doesn't really do anything in user space other
than periodically print out some data that's exported to it from the kernel
scheduler via a shared map. If we wanted to add user-space load balancing to
scx_nest, the same requirements would apply as for schedulers with a Rust
user-space component: we'd need both a user space portion and a kernel-space
portion.

[4]: https://github.com/sched-ext/scx/blob/main/scheds/c/scx_nest.c#L195

> >> that your program can link against. And what should the API look like on
> >> both ends? Should rust / BPF have to call into functions to get load
> >> balancing? Or should it be automatically packaged and implemented?
> >>
> >> There are a lot of ways that we can approach this, and it probably
> >> warrants discussing in some more detail
> >
> > But I get the gist of the issue, would be interesting to discuss.

Sounds great, thanks for reading this over.

- David
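The division of labor described above (the BPF side exports per-task duty cycles over a shared map; user space scales them into loads with floating point and writes migration decisions back) can be pictured with a small stand-in. This is not scx_rusty's actual code, just a simplified illustration with invented names:

```python
def scale_loads(duty_cycles, weights):
    """Scale per-task duty cycle (fraction of time spent runnable) by
    scheduler weight (100 = default), the floating-point math a
    user-space balancer can do but BPF cannot easily."""
    return {tid: duty_cycles[tid] * (weights[tid] / 100.0) for tid in duty_cycles}

def pick_migrations(task_domain, loads, n_domains):
    """Greedily move the heaviest task from the most-loaded domain to
    the least-loaded one. The returned {tid: new_domain} dict stands in
    for what would be written into a shared BPF map, for the kernel side
    to apply on e.g. the enqueue path."""
    dom_load = [0.0] * n_domains
    for tid, dom in task_domain.items():
        dom_load[dom] += loads[tid]
    src = max(range(n_domains), key=lambda d: dom_load[d])
    dst = min(range(n_domains), key=lambda d: dom_load[d])
    if src == dst:
        return {}
    movable = [t for t, d in task_domain.items() if d == src]
    victim = max(movable, key=lambda t: loads[t])
    return {victim: dst}
```

In the real split, `scale_loads` and `pick_migrations` would live in the user-space (Rust) half, and only the final dict would cross the map boundary into the kernel.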
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Tejun Heo @ 2024-01-30 1:50 UTC (permalink / raw)
To: Joel Fernandes
Cc: David Vernet, lsf-pc, bpf, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

Hello, Joel.

On Mon, Jan 29, 2024 at 05:42:54PM -0500, Joel Fernandes wrote:
> > [...]
> > Maybe a signal can be the "Quality of service (QoS)" approach where tasks
> > with lower QoS are packed more aggressively and higher QoS are spread
> > more (?).

This was done for a different purpose (improving tail latencies on a
latency-critical workload) but it uses soft-affinity based packing which
maybe can translate to power-aware scheduling:

https://github.com/sched-ext/scx/blob/case-studies/case-studies/scx_layered.md

I have a raptor lake-H laptop which has E and P cores, and by default the
threads are being spread across all CPUs, which probably isn't best for
power consumption.
I was thinking about writing a scheduler which uses a similar strategy to
scx_layered - pack the cores one by one, overflowing to the next core from E
to P when the average utilization crosses a set threshold. Most of the logic
is already in scx_layered, so maybe it can just be a part of that. I'm
curious whether and how much power can be saved with a generic approach like
that.

Thanks.

--
tejun
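The overflow-packing strategy described here reduces to a small utilization calculation. The following is only a sketch of the idea, independent of scx_layered's actual implementation; the 0.75 threshold and the core names are made up, and real utilization tracking is of course dynamic rather than a one-shot assignment:

```python
def pack_cores(task_utils, core_order, threshold=0.75):
    """Assign tasks (given as utilization fractions) to cores one by
    one, overflowing to the next core in `core_order` (e.g. E cores
    first, then P cores) once the current core's summed utilization
    would cross `threshold`. The last core absorbs any remainder.

    Returns {core: [task utilizations]}; purely illustrative."""
    assignment = {c: [] for c in core_order}
    idx = 0
    for util in task_utils:
        # Overflow to the next core when this one would cross the threshold.
        while idx < len(core_order) - 1 and \
                sum(assignment[core_order[idx]]) + util > threshold:
            idx += 1
        assignment[core_order[idx]].append(util)
    return assignment
```

Measuring how much power such a policy actually saves versus spreading is the open question raised above; the heuristic itself is cheap.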
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Joel Fernandes @ 2024-02-19 9:25 UTC (permalink / raw)
To: Tejun Heo
Cc: David Vernet, lsf-pc, bpf, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

On 1/29/2024 8:50 PM, Tejun Heo wrote:
> On Mon, Jan 29, 2024 at 05:42:54PM -0500, Joel Fernandes wrote:
>> [...]
>
> This was done for a different purpose (improving tail latencies on latency
> critical workload) but it uses soft-affinity based packing which maybe can
> translate to power-aware scheduling:
>
> https://github.com/sched-ext/scx/blob/case-studies/case-studies/scx_layered.md

Thanks! I am looking more into this (scx_layered) for the latency benefits
as well. David kindly gave me an introduction to it last week. It seems
quite similar to our approach of using RT (round-robin) for the higher tier
(that is, having a higher tier of tasks that is scheduled ahead of a lower,
fair-scheduled one).
There is the issue of starvation though (a higher tier/layer starves a lower
one), so we're incorporating the DL server to help with that:

https://lore.kernel.org/all/cover.1699095159.git.bristot@kernel.org/
https://lore.kernel.org/all/20240216183108.1564958-1-joel@joelfernandes.org/

Interesting note on the soft-affinity feature; yeah, that could help save
power and might be a better approach than, say, our usage of RT.

> I have a raptor lake-H laptop which has E and P cores and by default the
> threads are being spread across all CPUs which probably isn't best for power
> consumption. I was thinking about writing a scheduler which uses a similar
> strategy as scx_layered - pack the cores one by one overflowing to the next
> core from E to P when the average utilization crosses a set threshold. Most
> of the logic is already in scx_layered, so maybe it can just be a part of
> that. I'm curious whether and how much power can be saved with a generic
> approach like that.

Can the scx NEST scheduler be reused for this? AFAIR, it does similar task
packing, though that is more to keep cores idle than to pack tasks onto a
certain type of core, if I remember Julia's presentation correctly.

thanks,

- Joel
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Muhammad Usama Anjum @ 2024-02-19 8:48 UTC (permalink / raw)
To: David Vernet, lsf-pc
Cc: bpf, joel, htejun, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

On Fri, 2024-01-26 at 15:59 -0600, David Vernet wrote:
> Hello,
>
> A few more use cases have emerged for sched_ext that are not yet
> supported that I wanted to discuss in the BPF track. Specifically:
>
> - EAS: Energy Aware Scheduling
>
> While firmware ultimately controls the frequency of a core, the kernel
> does provide frequency scaling knobs such as EPP. It could be useful for
> BPF schedulers to have control over these knobs to e.g. hint that
> certain cores should keep a lower frequency and operate as E cores.
> This could have applications in battery-aware devices, or in other
> contexts where applications have e.g. latency-sensitive
> compute-intensive workloads.

The current scheduler must already be using the frequency scaling knobs. Can
sched_ext use those knobs directly, with hints from userspace, easily?

> - Componentized schedulers
>
> Scheduler implementations today largely have to reinvent the wheel. For
> example, if you want to implement a load balancer in rust, you need to
> add the necessary fields to the BPF program for tracking load / duty
> cycle, and then parse and consume them from the rust side. That's pretty
> suboptimal though, as the actual load balancing algorithm itself is
> essentially the exact same. The challenge here is that the feature
> requires both BPF and user space components to work together. It's not
> enough to ship a rust crate -- you need to also ship a BPF object file
> that your program can link against.
> And what should the API look like on
> both ends? Should rust / BPF have to call into functions to get load
> balancing? Or should it be automatically packaged and implemented?

This seems like a really nice idea. If we build a kind of library where the
different components of a scheduler are already available, researchers can
just focus on one component and improve it. This could bring long-term
benefits to schedulers based on sched_ext. This flexibility wasn't possible
for the scheduler before.

> There are a lot of ways that we can approach this, and it probably
> warrants discussing in some more detail.
>
> If anybody else has ideas on things they'd like to discuss; either
> sched_ext features that are missing, or scheduling ideas that we could
> try to implement but just haven't yet, please feel free to share.
>
> Thanks,
> David
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Joel Fernandes @ 2024-02-19 9:11 UTC (permalink / raw)
To: Muhammad Usama Anjum, David Vernet, lsf-pc
Cc: bpf, htejun, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

On 2/19/2024 3:48 AM, Muhammad Usama Anjum wrote:
> On Fri, 2024-01-26 at 15:59 -0600, David Vernet wrote:
>> [...]
> The current scheduler must already be using the frequency scaling
> knobs. Can sched_ext use those knobs directly with hint from userspace
> easily?

With regard to the current way of doing things, it depends. On Intel
platforms, if HWP (Hardware-Controlled Performance States) is enabled, which
it is on almost all Intel platforms I've seen, then the selection of the
individual performance states (P-states) is done by the hardware, not the
OS. My understanding is that the benefit of HWP is the responsiveness of the
state selection. So the only things the OS can control are Turbo boost and
EPP. Unfortunately, this hinders using an energy model and doing energy
calculations (e.g.,
if I place a task on this core instead of that one, then the total system
power is such and such, because the P-state on this core is this) the way
EAS on ARM does. But maybe we can do something simple with what is available
and reap some benefits.

On ARM platforms, there is finer-grained OS control of the different
operating performance points (what they call OPPs).

Thanks.
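For context, the kind of calculation an EAS-style energy model enables can be caricatured in a few lines: estimate each performance domain's energy from its summed utilization and the power cost of the lowest operating point able to serve it, then place the task where the energy increase is smallest. This is a heavy simplification of what the kernel's EAS actually computes, with invented numbers:

```python
def domain_energy(util_sum, opps):
    """Energy estimate for one performance domain. `opps` is a list of
    (capacity_at_opp, power_at_opp) pairs sorted by capacity. Pick the
    lowest OPP able to serve the summed utilization, then scale its
    power by how busy the domain would be at that OPP."""
    for cap, power in opps:
        if cap >= util_sum:
            return power * util_sum / cap
    cap, power = opps[-1]  # saturated: stay at the highest OPP
    return power * util_sum / cap

def cheapest_cpu(task_util, domains):
    """`domains`: {cpu: (current_util, opps)}, one CPU per domain for
    simplicity. Return the CPU whose domain energy increases least if
    the task is placed there -- the core of an EAS-style decision."""
    def delta(cpu):
        util, opps = domains[cpu]
        return domain_energy(util + task_util, opps) - domain_energy(util, opps)
    return min(domains, key=delta)
```

On platforms where HWP hides the P-state choice, the `opps` table above is exactly the information the OS lacks, which is why only simpler heuristics are on the table there.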
* Re: [LSF/MM/BPF TOPIC] Discuss more features + use cases for sched_ext
From: Joel Fernandes @ 2024-02-19 9:14 UTC (permalink / raw)
To: Muhammad Usama Anjum, David Vernet, lsf-pc, Tejun Heo
Cc: bpf, schatzberg.dan, andrea.righi, davemarchevsky, changwoo, julia.lawall, himadrispandya

Fixing with Tejun's correct email address again. ;-)

On 2/19/2024 4:11 AM, Joel Fernandes wrote:
> [full verbatim quote of the previous message trimmed]