* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
From: Cristian Prundeanu @ 2025-01-28 23:09 UTC
To: Peter Zijlstra
Cc: cpru, kprateek.nayak, abuehaze, alisaidi, benh, blakgeof, csabac,
    doebel, gautham.shenoy, joseph.salisbury, dietmar.eggemann,
    linux-arm-kernel, linux-kernel, linux-tip-commits, mingo, x86,
    torvalds, bp

Peter,

Thank you for the recent scheduler rework which went into kernel 6.13.
Here are the latest test results using mysql+hammerdb, using a standalone
reproducer (details and instructions below).

Kernel | Runtime      | Throughput | P50 latency
aarch64| parameters   | (NOPM)     | (larger is worse)
-------+--------------+------------+------------------
6.5    | default      |  baseline  |  baseline
-------+--------------+------------+------------------
6.8    | default      |  -6.9%     |  +7.9%
       | NO_PL NO_RTP |  -1%       |  +1%
       | SCHED_BATCH  |  -9%       |  +10.7%
-------+--------------+------------+------------------
6.12   | default      |  -5.5%     |  +6.2%
       | NO_PL NO_RTP |  -0.4%     |  +0.1%
       | SCHED_BATCH  |  -4.1%     |  +4.9%
-------+--------------+------------+------------------
6.13   | default      |  -4.8%     |  +5.4%
       | NO_PL NO_RTP |  -0.3%     |  +0.01%
       | SCHED_BATCH  |  -4.8%     |  +5.4%
-------+--------------+------------+------------------

A performance improvement is noticeable in kernel 6.13 over 6.12, both in
latency and throughput. At the same time, SCHED_BATCH no longer has the
same positive effect it had in 6.12.

Disabling PLACE_LAG and RUN_TO_PARITY is still as effective as before.
For this reason, I'd like to ask once again that this patch set be
considered for merging and for backporting to kernels 6.6+.

> This patchset disables the scheduler features PLACE_LAG and RUN_TO_PARITY
> and moves them to sysctl.
>
> Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
> significant performance degradation in multiple database-oriented
> workloads. This degradation manifests in all kernel versions using EEVDF,
> across multiple Linux distributions, hardware architectures (x86_64,
> aarch64, amd64), and CPU generations.

When weighing the relevance of various testing approaches, please keep in
mind that mysql is a real-life workload, while the test which prompted the
introduction of PLACE_LAG is much closer to a synthetic benchmark.


Instructions for reproducing the above tests:

1. Code: The repro scenario that was used for this round of testing can be
found here: https://github.com/aws/repro-collection

2. Setup: I used a 16 vCPU / 32G RAM / 1TB RAID0 SSD instance as SUT,
running Ubuntu 22.04 with the latest updates. All kernels were compiled
from source, preserving the same config (as much as possible) to minimize
noise - in particular, CONFIG_HZ=250 was used everywhere.

3. Running: To run the repro, set up a SUT machine and a LDG (loadgen)
machine on the same network, clone the git repo on both, and run:

(on the SUT) ./repro.sh repro-mysql-EEVDF-regression SUT --ldg=<loadgen_IP>

(on the LDG) ./repro.sh repro-mysql-EEVDF-regression LDG --sut=<SUT_IP>

The repro will build and test multiple combinations of kernel versions and
scheduler settings, and will prompt you when to reboot the SUT and rerun
the same command to continue the process.

More instructions can be found both in the repo's README and by running
'repro.sh --help'.
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
From: K Prateek Nayak @ 2025-02-11 3:27 UTC
To: Cristian Prundeanu, Peter Zijlstra
Cc: abuehaze, alisaidi, benh, blakgeof, csabac, doebel, gautham.shenoy,
    joseph.salisbury, dietmar.eggemann, linux-arm-kernel, linux-kernel,
    linux-tip-commits, mingo, x86, torvalds, bp

Hello Cristian,

Sorry for the delay in response. I'll leave some analysis from my side
below.

On 1/29/2025 4:39 AM, Cristian Prundeanu wrote:
> Peter,
>
> Thank you for the recent scheduler rework which went into kernel 6.13.
> Here are the latest test results using mysql+hammerdb, using a standalone
> reproducer (details and instructions below).
>
> Kernel | Runtime      | Throughput | P50 latency
> aarch64| parameters   | (NOPM)     | (larger is worse)
> -------+--------------+------------+------------------
> 6.5    | default      |  baseline  |  baseline
> -------+--------------+------------+------------------
> 6.8    | default      |  -6.9%     |  +7.9%
>        | NO_PL NO_RTP |  -1%       |  +1%
>        | SCHED_BATCH  |  -9%       |  +10.7%
> -------+--------------+------------+------------------
> 6.12   | default      |  -5.5%     |  +6.2%
>        | NO_PL NO_RTP |  -0.4%     |  +0.1%
>        | SCHED_BATCH  |  -4.1%     |  +4.9%
> -------+--------------+------------+------------------
> 6.13   | default      |  -4.8%     |  +5.4%
>        | NO_PL NO_RTP |  -0.3%     |  +0.01%
>        | SCHED_BATCH  |  -4.8%     |  +5.4%
> -------+--------------+------------+------------------

Thank you for the reproducer. I haven't tried it yet (in part due
to the slightly scary "Assumptions" section) but I managed to find a
HammerDB test bench internally that I modified to match the
configuration from the repro you shared.

Testing methodology is slightly different - the script pins mysqld to
the CPUs on the first socket and the HammerDB clients on the second and
measures the throughput. (It only reports throughput out of the box;
I'll see if I can get it to report latency numbers as well.)

With that out of the way, these were the preliminary results:

                                   %diff
    v6.14-rc1                     baseline
    v6.5.0 (pre-EEVDF)             -0.95%
    v6.14-rc1 + NO_PL + NO_RTP     +6.06%

So I had myself a reproducer.

Looking at the data from "perf sched stats" [1] (modified to support
reporting with the new schedstats v17) I could see the difference on the
mainline kernel (v6.14-rc1) default vs NO_PL + NO_RTP:

----------------------------------------------------------------------------------------------------
Time elapsed (in jiffies)                                   :      109316,     109338
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
CPU <ALL CPUS SUMMARY>
----------------------------------------------------------------------------------------------------
DESC                                                               COUNT1      COUNT2   PCT_CHANGE   PCT_CHANGE1 PCT_CHANGE2
----------------------------------------------------------------------------------------------------
sched_yield() count                                         :       27349,       5785  |  -78.85% |
Legacy counter can be ignored                               :           0,          0  |    0.00% |
schedule() called                                           :      289265,     210475  |  -27.24% |
schedule() left the processor idle                          :       73316,      73993  |    0.92% |  (   25.35%,   35.16% )
try_to_wake_up() was called                                 :      154198,     125239  |  -18.78% |
try_to_wake_up() was called to wake up the local cpu        :       32858,      13927  |  -57.61% |  (   21.31%,   11.12% )
total runtime by tasks on this processor (in jiffies)       : 27003017867,27700849334  |    2.58% |
total waittime by tasks on this processor (in jiffies)      : 64285802345,80525026945  |   25.26% |  (  238.07%,  290.70% )
total timeslices run on this cpu                            :      190952,     132092  |  -30.82% |
----------------------------------------------------------------------------------------------------

[1] https://lore.kernel.org/lkml/20241122084452.1064968-1-swapnil.sapkal@amd.com/

The trend is as follows:

- Lower number of schedule() calls [-27.24%]
- Longer wait times [+25.26%]
- Slightly higher runtime across all CPUs

This is very similar to the situation with other database workloads we
had highlighted earlier that prompted Peter to recommend SCHED_BATCH.

Using the dump_python.py from [2], modifying it to only return pids for
tasks with "comm=mysqld" and running:

    python3 dump_python.py | while read i; do chrt -v -b --pid 0 $i; done

before starting the workload, I was able to match the performance of
SCHED_BATCH with the NO_PL + NO_RTP variant.

[2] https://lore.kernel.org/all/d3306655-c4e7-20ab-9656-b1b01417983c@amd.com/

So it was back to the drawing board on why the setting on your
reproducer might not be working.

>
> A performance improvement is noticeable in kernel 6.13 over 6.12, both in
> latency and throughput. At the same time, SCHED_BATCH no longer has the
> same positive effect it had in 6.12.
>
> Disabling PLACE_LAG and RUN_TO_PARITY is still as effective as before.
> For this reason, I'd like to ask once again that this patch set be
> considered for merging and for backporting to kernels 6.6+.
>
>> This patchset disables the scheduler features PLACE_LAG and RUN_TO_PARITY
>> and moves them to sysctl.
>>
>> Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
>> significant performance degradation in multiple database-oriented
>> workloads. This degradation manifests in all kernel versions using EEVDF,
>> across multiple Linux distributions, hardware architectures (x86_64,
>> aarch64, amd64), and CPU generations.
>
> When weighing the relevance of various testing approaches, please keep in
> mind that mysql is a real-life workload, while the test which prompted the
> introduction of PLACE_LAG is much closer to a synthetic benchmark.
>
>
> Instructions for reproducing the above tests:
>
> 1. Code: The repro scenario that was used for this round of testing can be
> found here: https://github.com/aws/repro-collection

Digging through the scripts, I found that the SCHED_BATCH setting is done
via systemd in [3] via the "CPUSchedulingPolicy" parameter.

[3] https://github.com/aws/repro-collection/blob/main/workloads/mysql/files/mysqld.service.tmpl

Going back to my setup, the script does not daemonize mysqld for
reasons of portability. It runs the following:

    <root>/bin/mysqld ...
    numactl $server_numactl_param /bin/sh <root>/bin/mysqld_safe ...&
    export BENCHMARK_PID=$!
    ...

$server_numactl_param are the CPU and memory affinity for mysqld_safe. Now
interestingly, if I do (version 1):

    /bin/chrt -v -b 0 <root>/bin/mysqld ...
    numactl $server_numactl_param /bin/sh <root>/bin/mysqld_safe ...&
    export BENCHMARK_PID=$!
    ...

I more or less get the same results as baseline v6.14-rc1 (Weird!)
But then if I do (version 2):

    <root>/bin/mysqld ...
    numactl $server_numactl_param /bin/sh <root>/bin/mysqld_safe ...&
    export BENCHMARK_PID=$!
    /bin/chrt -v -b --pid 0 $BENCHMARK_PID;
    ...

I see the performance reach the same level as that with NO_PL +
NO_RTP. Following are the improvements:

                                           %diff
    v6.14-rc1                             baseline
    v6.5.0 (pre-EEVDF)                     -0.95%
    v6.14-rc1 + NO_PL + NO_RTP             +6.06%
    v6.14-rc1 + (SCHED_BATCH version 1)    +1.42%
    v6.14-rc1 + (SCHED_BATCH version 2)    +6.96%

I'm no database guy but it looks like running mysqld_safe under
SCHED_BATCH, which later does a bunch of setup and forks, leads to better
performance.

I see there are mysqld_safe references in your mysql config [4] but I'm
not sure how it works when running with daemonize.

Could you log in to your SUT and check if you have a mysqld_safe running
and, just as a precautionary measure, run all "mysqld*" tasks / threads
under SCHED_BATCH before starting the load gen? Thank you.

[4] https://github.com/aws/repro-collection/blob/main/workloads/mysql/files/my.cnf.tmpl

I'll keep digging to see if I find anything interesting but in my case,
on a dual socket 3rd Generation EPYC system (2 x 64C/128T) with mysqld*
pinned to one CCX (16CPUs) on one socket and running HammerDB with 64
virtual users, I see the above trends.

If you need any other information or the preliminary changes for perf
sched stats for the new schedstats version, please do let me know. The
series will be refreshed soon with the added support and some more
features.

>
> 2. Setup: I used a 16 vCPU / 32G RAM / 1TB RAID0 SSD instance as SUT,
> running Ubuntu 22.04 with the latest updates. All kernels were compiled
> from source, preserving the same config (as much as possible) to minimize
> noise - in particular, CONFIG_HZ=250 was used everywhere.
>
> 3. Running: To run the repro, set up a SUT machine and a LDG (loadgen)
> machine on the same network, clone the git repo on both, and run:
>
> (on the SUT) ./repro.sh repro-mysql-EEVDF-regression SUT --ldg=<loadgen_IP>
>
> (on the LDG) ./repro.sh repro-mysql-EEVDF-regression LDG --sut=<SUT_IP>
>
> The repro will build and test multiple combinations of kernel versions and
> scheduler settings, and will prompt you when to reboot the SUT and rerun
> the same command to continue the process.
>
> More instructions can be found both in the repo's README and by running
> 'repro.sh --help'.

--
Thanks and Regards,
Prateek
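
The precautionary step requested above (put every "mysqld*" task and all
of its threads under SCHED_BATCH before starting the load generator) could
be sketched in Python roughly as follows. This is only an illustration and
not the dump_python.py script referenced in [2]: the function names are
made up for this example, and matching on the process-level comm is an
assumption, chosen because individual mysqld threads may carry other names.

```python
import os
from pathlib import Path


def find_thread_ids(comm_prefix, proc_root="/proc"):
    """Return sorted thread IDs of every process whose comm starts with comm_prefix.

    Hypothetical helper: it matches on /proc/<pid>/comm (the main thread's
    name) and then collects every entry under /proc/<pid>/task.
    """
    tids = []
    for proc_dir in Path(proc_root).iterdir():
        if not proc_dir.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            comm = (proc_dir / "comm").read_text().strip()
            if comm.startswith(comm_prefix):
                tids.extend(int(t.name) for t in (proc_dir / "task").iterdir())
        except OSError:
            # The process exited between listing and reading; ignore it.
            continue
    return sorted(tids)


def set_sched_batch(tids):
    """Switch each thread to SCHED_BATCH; this policy requires priority 0."""
    for tid in tids:
        try:
            os.sched_setscheduler(tid, os.SCHED_BATCH, os.sched_param(0))
        except ProcessLookupError:
            # The thread went away before we could change its policy.
            continue
```

Calling set_sched_batch(find_thread_ids("mysqld")) as root, after mysqld
has started and before the load generator connects, would cover both
mysqld and mysqld_safe. Threads created later inherit the policy of the
thread that creates them unless reset-on-fork is in effect.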
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
From: Cristian Prundeanu @ 2025-02-12 5:41 UTC
To: K Prateek Nayak
Cc: Cristian Prundeanu, Hazem Mohamed Abuelfotoh, Ali Saidi,
    Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma, Bjoern Doebel,
    Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann, Ingo Molnar,
    Peter Zijlstra, Linus Torvalds, Borislav Petkov, linux-arm-kernel,
    linux-kernel, linux-tip-commits, x86

Hi Prateek,

Thank you for the analysis details!

> Thank you for the reproducer. I haven't tried it yet (in part due
> to the slightly scary "Assumptions" section)

It wasn't meant to be scary, my apologies. It is meant to say that the
reproducer will only perform testing-related tasks (which you'd normally
do manually), without touching the infrastructure (firewall, networking,
instance management, etc). As long as you set all that up the same way you
do when you test manually, you will be fine. I'll clarify the README.
Should you run into any questions, please do not hesitate to contact me
directly, and I'll help clear the path.

>     v6.14-rc1                     baseline
>     v6.5.0 (pre-EEVDF)             -0.95%
>     v6.14-rc1 + NO_PL + NO_RTP     +6.06%

This is interesting. While you do reproduce the benefits of NO_PL+NO_RTP,
your result shows no regression compared to the baseline CFS. I'm only
speculating, but running both SUT and loadgen on the same machine is a
large variation of the test setup, and can lead to result differences like
this one.

> Digging through the scripts, I found that SCHED_BATCH setting is done
> via systemd in [3] via the "CPUSchedulingPolicy" parameter.
> [3] https://github.com/aws/repro-collection/blob/main/workloads/mysql/files/mysqld.service.tmpl

That is correct, the reproducer uses systemd to set the scheduler policy
for mysqld.

> interestingly, if I do (version 1): [...]
> I more or less get the same results as baseline v6.14-rc1 (Weird!)
> But then if I do (version 2): [...]
> I see the performance reach to the same level as that with NO_PL +
> NO_RTP.

That's a good find. I will check on my setup whether performance changes
when manually setting all mysqld tasks to SCHED_BATCH. And I haven't yet
run perf sched stats on the reproducer, but it may hold useful insight.
I'll follow up with more details as I gather them.

Your find also helps to point out that even when it works, SCHED_BATCH is
a more complex and error-prone mitigation than just disabling PL and RTP.
The same reproducer setup that uses systemd to set SCHED_BATCH does show
improvement in 6.12, but not in 6.13+. There may not even be a single
approach that works well on both.

Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot
time, and does not require further user effort. It's even simpler if those
two features are exposed via sysctl, making it trivial to persist and
query with standard Linux commands as needed.

Peter, I've renewed my initial patch so it applies to the current
sched/core, and removed the dependency on changing the default values
first. I'd appreciate you considering it for merging [1].

[1] https://lore.kernel.org/20250212053644.14787-1-cpru@amazon.com

-Cristian
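
For reference, the boot-time setting mentioned above goes through the
scheduler's debugfs feature file today. The sketch below shows one way to
query and flip those flags from Python. It assumes debugfs is mounted at
/sys/kernel/debug, that the file lists one token per feature with a "NO_"
prefix on disabled ones, and that writes are done as root; the function
names are invented for this example.

```python
FEATURES_PATH = "/sys/kernel/debug/sched/features"  # assumes debugfs is mounted here


def parse_sched_features(text):
    """Map each feature name to True (enabled) or False (listed as NO_<name>)."""
    state = {}
    for token in text.split():
        if token.startswith("NO_"):
            state[token[3:]] = False
        else:
            state[token] = True
    return state


def read_sched_features(path=FEATURES_PATH):
    """Read and parse the current feature flags (typically needs root)."""
    with open(path) as f:
        return parse_sched_features(f.read())


def disable_features(names, path=FEATURES_PATH):
    """Disable each named feature by writing NO_<name>, one write per feature."""
    for name in names:
        with open(path, "w") as f:
            f.write("NO_" + name)
```

With this, disable_features(["PLACE_LAG", "RUN_TO_PARITY"]) followed by
read_sched_features() would be expected to report both flags as False. The
setting does not survive a reboot, so it has to be reapplied by whatever
runs early at boot.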
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
From: Peter Zijlstra @ 2025-02-12 9:43 UTC
To: Cristian Prundeanu
Cc: K Prateek Nayak, Hazem Mohamed Abuelfotoh, Ali Saidi,
    Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma, Bjoern Doebel,
    Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann, Ingo Molnar,
    Linus Torvalds, Borislav Petkov, linux-arm-kernel, linux-kernel,
    linux-tip-commits, x86

On Tue, Feb 11, 2025 at 11:41:13PM -0600, Cristian Prundeanu wrote:

> Your find also helps to point out that even when it works, SCHED_BATCH is
> a more complex and error prone mitigation than just disabling PL and RTP.
> The same reproducer setup that uses systemd to set SCHED_BATCH does show
> improvement in 6.12, but not in 6.13+. There may not even be a single
> approach that works well on both.
>
> Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot
> time, and does not require further user effort.

For your workload. It will wreck other workloads.

Yes, SCHED_BATCH might be more fiddly, but it allows for composition.
You can run multiple workloads together and they all behave.

Maybe the right thing here is to get mysql patched; so that it will
request BATCH itself for the threads that need it.
* [PATCH v2] [tip: sched/core] sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
From: Cristian Prundeanu @ 2025-02-12 5:36 UTC
To: Peter Zijlstra
Cc: Cristian Prundeanu, K Prateek Nayak, Hazem Mohamed Abuelfotoh,
    Ali Saidi, Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma,
    Bjoern Doebel, Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann,
    Ingo Molnar, Linus Torvalds, Borislav Petkov, linux-arm-kernel,
    linux-kernel, linux-tip-commits, x86

Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
significant performance degradation in multiple database-oriented
workloads. This degradation manifests in all kernel versions using EEVDF,
across multiple Linux distributions, hardware architectures (x86_64,
aarch64, amd64), and CPU generations.

Testing combinations of available scheduler features showed that the
largest improvement (short of disabling all EEVDF features) came from
disabling both PLACE_LAG and RUN_TO_PARITY.

Moving PLACE_LAG and RUN_TO_PARITY to sysctl will allow users to override
their default values and persist them with established mechanisms.

Link: https://lore.kernel.org/20241017052000.99200-1-cpru@amazon.com
Signed-off-by: Cristian Prundeanu <cpru@amazon.com>
---
v2: use latest sched/core; defer default value change to a follow-up patch

 include/linux/sched/sysctl.h |  8 ++++++++
 kernel/sched/core.c          | 13 +++++++++++++
 kernel/sched/fair.c          |  7 ++++---
 kernel/sched/features.h      | 10 ----------
 kernel/sysctl.c              | 20 ++++++++++++++++++++
 5 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 5a64582b086b..a899398bc1c4 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -29,4 +29,12 @@ extern int sysctl_numa_balancing_mode;
 #define sysctl_numa_balancing_mode	0
 #endif
 
+#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
+extern unsigned int sysctl_sched_place_lag_enabled;
+extern unsigned int sysctl_sched_run_to_parity_enabled;
+#else
+#define sysctl_sched_place_lag_enabled 1
+#define sysctl_sched_run_to_parity_enabled 1
+#endif
+
 #endif /* _LINUX_SCHED_SYSCTL_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9142a0394d46..a379240628ea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -134,6 +134,19 @@ const_debug unsigned int sysctl_sched_features =
 	0;
 #undef SCHED_FEAT
 
+#ifdef CONFIG_SYSCTL
+/*
+ * Using the avg_vruntime, do the right thing and preserve lag across
+ * sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
+ */
+__read_mostly unsigned int sysctl_sched_place_lag_enabled = 1;
+/*
+ * Inhibit (wakeup) preemption until the current task has either matched the
+ * 0-lag point or until it has exhausted its slice.
+ */
+__read_mostly unsigned int sysctl_sched_run_to_parity_enabled = 1;
+#endif
+
 /*
  * Print a warning if need_resched is set for the given duration (if
  * LATENCY_WARN is enabled).
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e78caa21436..c87fd1accd54 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -923,7 +923,8 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * Once selected, run a task until it either becomes non-eligible or
 	 * until it gets a new slice. See the HACK in set_next_entity().
 	 */
-	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
+	if (sysctl_sched_run_to_parity_enabled && curr &&
+	    curr->vlag == curr->deadline)
 		return curr;
 
 	/* Pick the leftmost entity if it's eligible */
@@ -5199,7 +5200,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *
 	 * EEVDF: placement strategy #1 / #2
 	 */
-	if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag) {
+	if (sysctl_sched_place_lag_enabled && cfs_rq->nr_queued && se->vlag) {
 		struct sched_entity *curr = cfs_rq->curr;
 		unsigned long load;
 
@@ -9327,7 +9328,7 @@ static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_
 #else
 	dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
 #endif
-	if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
+	if (sysctl_sched_place_lag_enabled && dst_cfs_rq->nr_queued &&
 	    !entity_eligible(task_cfs_rq(p), &p->se))
 		return 1;
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..b98ec31ef2c4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -1,10 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-/*
- * Using the avg_vruntime, do the right thing and preserve lag across
- * sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
- */
-SCHED_FEAT(PLACE_LAG, true)
 /*
  * Give new tasks half a slice to ease into the competition.
  */
@@ -13,11 +8,6 @@ SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
  * Preserve relative virtual deadline on 'migration'.
  */
 SCHED_FEAT(PLACE_REL_DEADLINE, true)
-/*
- * Inhibit (wakeup) preemption until the current task has either matched the
- * 0-lag point or until is has exhausted it's slice.
- */
-SCHED_FEAT(RUN_TO_PARITY, true)
 /*
  * Allow wakeup of tasks with a shorter slice to cancel RUN_TO_PARITY for
  * current.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 7ae7a4136855..11651d87f6d4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2019,6 +2019,26 @@ static struct ctl_table kern_table[] = {
 		.extra2		= SYSCTL_INT_MAX,
 	},
 #endif
+#ifdef CONFIG_SCHED_DEBUG
+	{
+		.procname	= "sched_place_lag_enabled",
+		.data		= &sysctl_sched_place_lag_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "sched_run_to_parity_enabled",
+		.data		= &sysctl_sched_run_to_parity_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif
 };
 
 static struct ctl_table vm_table[] = {

base-commit: 05dbaf8dd8bf537d4b4eb3115ab42a5fb40ff1f5
-- 
2.48.1
* Re: [PATCH v2] [tip: sched/core] sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
From: Peter Zijlstra @ 2025-02-12 9:17 UTC
To: Cristian Prundeanu
Cc: K Prateek Nayak, Hazem Mohamed Abuelfotoh, Ali Saidi,
    Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma, Bjoern Doebel,
    Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann, Ingo Molnar,
    Linus Torvalds, Borislav Petkov, linux-arm-kernel, linux-kernel,
    linux-tip-commits, x86

On Tue, Feb 11, 2025 at 11:36:44PM -0600, Cristian Prundeanu wrote:
> Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
> significant performance degradation in multiple database-oriented
> workloads. This degradation manifests in all kernel versions using EEVDF,
> across multiple Linux distributions, hardware architectures (x86_64,
> aarch64, amd64), and CPU generations.
>
> Testing combinations of available scheduler features showed that the
> largest improvement (short of disabling all EEVDF features) came from
> disabling both PLACE_LAG and RUN_TO_PARITY.
>
> Moving PLACE_LAG and RUN_TO_PARITY to sysctl will allow users to override
> their default values and persist them with established mechanisms.

Nope -- you have knobs in debugfs, and that's where they'll stay. Esp.
PLACE_LAG is super dodgy and should not get elevated to anything
remotely official.

Also, FYI, by keeping these emails threaded in the old thread I nearly
missed them again. I'm not sure where this nonsense of keeping
everything in one thread came from, but it is bloody stupid.
* Re: [PATCH v2] [tip: sched/core] sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
From: Peter Zijlstra @ 2025-02-12 9:37 UTC
To: Cristian Prundeanu
Cc: K Prateek Nayak, Hazem Mohamed Abuelfotoh, Ali Saidi,
    Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma, Bjoern Doebel,
    Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann, Ingo Molnar,
    Linus Torvalds, Borislav Petkov, linux-arm-kernel, linux-kernel,
    linux-tip-commits, x86

On Wed, Feb 12, 2025 at 10:17:11AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 11, 2025 at 11:36:44PM -0600, Cristian Prundeanu wrote:
> > Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
> > significant performance degradation in multiple database-oriented
> > workloads. This degradation manifests in all kernel versions using EEVDF,
> > across multiple Linux distributions, hardware architectures (x86_64,
> > aarch64, amd64), and CPU generations.
> >
> > Testing combinations of available scheduler features showed that the
> > largest improvement (short of disabling all EEVDF features) came from
> > disabling both PLACE_LAG and RUN_TO_PARITY.
> >
> > Moving PLACE_LAG and RUN_TO_PARITY to sysctl will allow users to override
> > their default values and persist them with established mechanisms.
>
> Nope -- you have knobs in debugfs, and that's where they'll stay. Esp.
> PLACE_LAG is super dodgy and should not get elevated to anything
> remotely official.

Just to clarify, the problem with NO_PLACE_LAG is that by discarding
lag, a task can game the system to 'gain' time. It fundamentally breaks
fairness, and the only reason I implemented it at all was because it is
one of the 'official' placement strategies in the original paper.

But ideally, it should just go, it is not a sound strategy and relies on
tasks behaving themselves.

That is, assuming your tasks behave like the traditional periodic or
sporadic tasks, then it works, but only because the tasks are limited by
the constraints of the task model.

If the tasks are unconstrained / aperiodic, this goes out the window and
the placement strategy becomes unsound. And given we must assume
userspace to be malicious / hostile / unbehaved, the whole thing is just
not good.

It is for this same reason that SCHED_DEADLINE has a constant bandwidth
server on top of the earliest deadline first policy. Pure EDF is only
sound for periodic / sporadic tasks, but we cannot assume userspace will
behave itself, so we have to put in guard-rails, CBS in this case.
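
The point about discarded lag can be made concrete with a toy model. The
Python sketch below is a deliberately simplified illustration, not the
kernel's place_entity(): it assumes lag is defined as w_i * (V - v_i) with
V the load-weighted average vruntime, it ignores the adjustment the kernel
applies for the waking task's own weight, and all function names are
invented here.

```python
def avg_vruntime(entities):
    """Load-weighted average vruntime V = sum(w_i * v_i) / sum(w_i).

    entities is a list of (weight, vruntime) pairs.
    """
    total_weight = sum(w for w, _ in entities)
    return sum(w * v for w, v in entities) / total_weight


def lags(entities):
    """lag_i = w_i * (V - v_i): positive means the task is owed service,
    negative means it has already received more than its share."""
    v_avg = avg_vruntime(entities)
    return [w * (v_avg - v) for w, v in entities]


def place_vruntime(v_avg, saved_vlag, preserve_lag):
    """Pick the vruntime for a waking task in this toy model.

    Preserving lag puts the task back at V - vlag, so a task that left
    with negative lag wakes up behind V and still owes that time.
    Discarding lag puts it at V, which silently forgives the debt.
    """
    return v_avg - saved_vlag if preserve_lag else v_avg
```

For a runqueue of [(1, 10.0), (2, 16.0)], V is 14.0 and the lags are +4.0
and -4.0, which sum to zero. A task that went to sleep with vlag = -4.0 is
placed at 18.0 when lag is preserved and at 14.0 when it is discarded, so
in the second case it regains eligibility without repaying the time it
overran.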
* Re: [PATCH v2] [tip: sched/core] sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
From: Cristian Prundeanu @ 2025-02-12 23:00 UTC
To: Peter Zijlstra
Cc: Cristian Prundeanu, K Prateek Nayak, Hazem Mohamed Abuelfotoh,
    Ali Saidi, Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma,
    Bjoern Doebel, Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann,
    Ingo Molnar, Linus Torvalds, Borislav Petkov, linux-arm-kernel,
    linux-kernel, linux-tip-commits, x86

>>> Moving PLACE_LAG and RUN_TO_PARITY to sysctl will allow users to override
>>> their default values and persist them with established mechanisms.
>>
>> Nope -- you have knobs in debugfs, and that's where they'll stay. Esp.
>> PLACE_LAG is super dodgy and should not get elevated to anything
>> remotely official.
>
> Just to clarify, the problem with NO_PLACE_LAG is that by discarding
> lag, a task can game the system to 'gain' time. It fundamentally breaks
> fairness, and the only reason I implemented it at all was because it is
> one of the 'official' placement strategies in the original paper.

Wouldn't this be an argument in favor of more official positioning of this
knob? It may be dodgy, but it's currently the best mitigation option,
until something better comes along.

> If the tasks are unconstrained / aperiodic, this goes out the window and
> the placement strategy becomes unsound. And given we must assume
> userspace to be malicious / hostile / unbehaved, the whole thing is just
> not good.

Userspace in general, absolutely. User intent should be king though, and
impairing the ability to do precisely what you want with your machine
feels like it stands against what Linux is best known (and often feared)
for: configurability. There is _another_ OS which has made a habit of
dictating how users should want to do something. We're not there of
course, but it's a strong cautionary tale.

To ask more specifically, isn't a strong point of EEVDF the fact that it
considers _more_ user needs and use cases than CFS (for instance, task
lag/latency)?

>> Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot
>> time, and does not require further user effort.
>
> For your workload. It will wreck other workloads.

I'd like to invite you to name one real-life workload that would be
wrecked by allowing PL and RTP override in sysctl. I can name three that
are currently impacted (mysql, postgres, and wordpress), with only poor
means (increased effort, non-standard persistence leading to higher
maintenance cost, requirement for debugfs) to mitigate the regression.

> Yes, SCHED_BATCH might be more fiddly, but it allows for composition.
> You can run multiple workloads together and they all behave.

Shouldn't we leave that to the user to decide, though? Forcing a new
default configuration that only works well with multiple workloads cannot
be the right thing for everyone - especially for large scale providers,
where servers and corresponding images are intended to run one main
workload. Importantly, these are things that used to run well and now
don't.

> Maybe the right thing here is to get mysql patched; so that it will
> request BATCH itself for the threads that need it.

For mysql in particular, it's a possible avenue (though I still object to
the idea that individual users and vendors now need to put in additional
effort to maintain the same performance as before).

But in the larger picture, this reproducer is only meant as a simplified
illustration of the performance issues. It is not a single occurrence.
There are far more complex workloads where tuning at thread level is at
best impractical, or even downright impossible. Think of managed clusters
where the load distribution and corresponding task density are not user
controlled, or JVM workloads where individual threads are not even
designed to be managed externally, or containers built from external
dependencies where tuning a service is anything but trivial.

Are we really saying that everyone just needs to swallow the cost of this
change, or put up with the lower performance level? Even if the Linux
Kernel doesn't concern itself with business cost, surely at least the time
burned on this by both commercial and non-commercial projects cannot be
lost on you.

> Also, FYI, by keeping these emails threaded in the old thread I nearly
> missed them again. I'm not sure where this nonsense of keeping
> everything in one thread came from, but it is bloody stupid.

Thank you. This is a great opportunity for both of us to relate to the
opposing stance on this patch, and I hope you too will see the parallel:

My reason for threading was well intended. I value your time and wanted to
avoid you wasting it by having to search for the previous patch or older
threads on the same topic.

However, I ended up inadvertently creating an issue for your use case. It,
arguably, doesn't have a noticeable impact on my side, and it could be
avoided by you, the user, by configuring your email client to always
highlight messages directly addressed to you; assuming that your email
client supports it, and you are able and willing to invest the effort to
do it.

Nevertheless, this doesn't make it right.

I do apologize for the annoyance; it was not my intent to put additional
burden on you, only to have the same experience or efficiency that you are
used to having. I did consolidate the two recent threads into this one
though, because I believe that it's easier to follow by everyone else.

It may be a silly parallel, but please consider that similar frustration
is happening to many users who now are asked to put effort towards
bringing performance back to previous levels - if at all possible and
feasible - and at the same time are denied the right tools to do so.

Please consider that it took years for EEVDF commit messages to go from
"horribly messes up things" to "isn't perfect yet, but much closer", and
it may take years still until it's as stable, performant and vetted across
varied scenarios as CFS was in kernel 6.5.

Please consider that along this journey are countless users and groups who
would rather not wait for perfection, but have easy means to at least get
the same performance they were getting before.

-Cristian
[parent not found: <C0E39DE3-EEEB-4A08-850F-A4B7EC809E3A@amazon.com>]
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl [not found] <C0E39DE3-EEEB-4A08-850F-A4B7EC809E3A@amazon.com> @ 2024-10-24 8:12 ` Benjamin Herrenschmidt 2024-10-25 14:43 ` Gautham R. Shenoy 0 siblings, 1 reply; 27+ messages in thread From: Benjamin Herrenschmidt @ 2024-10-24 8:12 UTC (permalink / raw) To: Prundeanu, Cristian, K Prateek Nayak, Peter Zijlstra Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar, x86@kernel.org, linux-arm-kernel@lists.infradead.org, Doebel, Bjoern, Mohamed Abuelfotoh, Hazem, Blake, Geoff, Saidi, Ali, Csoma, Csaba, gautham.shenoy@amd.com On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: > > The hammerdb test is a bit more complex than sysbench. It uses two > independent physical machines to perform a TPC-C derived test [1], aiming > to simulate a real-world database workload. The machines are allocated as > an AWS EC2 instance pair on the same cluster placement group [2], to avoid > measuring network bottlenecks instead of server performance. The SUT > instance runs mysql configured to use 2 worker threads per vCPU (32 > total); the load generator instance runs hammerdb configured with 64 > virtual users and 24 warehouses [3]. Each test consists of multiple > 20-minute rounds, run consecutively on multiple independent instance > pairs. Would it be possible to produce something that Prateek and Gautham (Hi Gautham btw !) can easily consume to reproduce ? Maybe a container image or a pair of container images hammering each other ? (the simpler the better). Cheers, Ben. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-24 8:12 ` [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them " Benjamin Herrenschmidt @ 2024-10-25 14:43 ` Gautham R. Shenoy 2024-10-29 4:57 ` Cristian Prundeanu 0 siblings, 1 reply; 27+ messages in thread From: Gautham R. Shenoy @ 2024-10-25 14:43 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Prundeanu, Cristian, K Prateek Nayak, Peter Zijlstra, linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar, x86@kernel.org, linux-arm-kernel@lists.infradead.org, Doebel, Bjoern, Mohamed Abuelfotoh, Hazem, Blake, Geoff, Saidi, Ali, Csoma, Csaba Hello Cristian, Ben, On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote: > On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: > > > > The hammerdb test is a bit more complex than sysbench. It uses two > > independent physical machines to perform a TPC-C derived test [1], aiming > > to simulate a real-world database workload. The machines are allocated as > > an AWS EC2 instance pair on the same cluster placement group [2], to avoid > > measuring network bottlenecks instead of server performance. The SUT > > instance runs mysql configured to use 2 worker threads per vCPU (32 > > total); the load generator instance runs hammerdb configured with 64 > > virtual users and 24 warehouses [3]. Each test consists of multiple > > 20-minute rounds, run consecutively on multiple independent instance > > pairs. > > Would it be possible to produce something that Prateek and Gautham > (Hi Gautham btw !) can easily consume to reproduce ? > > Maybe a container image or a pair of container images hammering each > other ? (the simpler the better). Yes, that would be useful. Please share your recipe. We will try and reproduce it at our end. 
In our testing from a few months ago (some of which was presented at
OSPM 2024), most of the database-related regressions that we observed
with EEVDF went away after running the server threads under
SCHED_BATCH.

>
> Cheers,
> Ben.

--
Thanks and Regards
gautham.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-25 14:43 ` Gautham R. Shenoy @ 2024-10-29 4:57 ` Cristian Prundeanu 2024-10-30 10:21 ` Dietmar Eggemann ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Cristian Prundeanu @ 2024-10-29 4:57 UTC (permalink / raw) To: Gautham R. Shenoy Cc: linux-tip-commits, linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma, Benjamin Herrenschmidt, K Prateek Nayak Hi Gautham, On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@amd.com <mailto:gautham.shenoy@amd.com>> wrote: > On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote: > > On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: > > > > > > The hammerdb test is a bit more complex than sysbench. It uses two > > > independent physical machines to perform a TPC-C derived test [1], aiming > > > to simulate a real-world database workload. The machines are allocated as > > > an AWS EC2 instance pair on the same cluster placement group [2], to avoid > > > measuring network bottlenecks instead of server performance. The SUT > > > instance runs mysql configured to use 2 worker threads per vCPU (32 > > > total); the load generator instance runs hammerdb configured with 64 > > > virtual users and 24 warehouses [3]. Each test consists of multiple > > > 20-minute rounds, run consecutively on multiple independent instance > > > pairs. > > > > Would it be possible to produce something that Prateek and Gautham > > (Hi Gautham btw !) can easily consume to reproduce ? > > > > Maybe a container image or a pair of container images hammering each > > other ? (the simpler the better). > > Yes, that would be useful. Please share your recipe. We will try and > reproduce it at our end. 
In our testing from a few months ago (some of > which was presented at OSPM 2024), most of the database related > regressions that we observed with EEVDF went away after running these > the server threads under SCHED_BATCH. I am working on a repro package that is self contained and as simple to share as possible. My testing with SCHED_BATCH is meanwhile concluded. It did reduce the regression to less than half - but only with WAKEUP_PREEMPTION enabled. When using NO_WAKEUP_PREEMPTION, there was no performance change compared to SCHED_OTHER. (At the risk of stating the obvious, using SCHED_BATCH only to get back to the default CFS performance is still only a workaround, just as disabling PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the root cause in EEVDF, but shouldn't be seen as viable alternate solutions.) Do you have more detail on the database regressions you saw a few months ago? What was the magnitude, and which workloads did it manifest on? -Cristian ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-29 4:57 ` Cristian Prundeanu @ 2024-10-30 10:21 ` Dietmar Eggemann 2024-11-01 13:05 ` Peter Zijlstra 2024-11-04 10:19 ` Gautham R. Shenoy 2 siblings, 0 replies; 27+ messages in thread From: Dietmar Eggemann @ 2024-10-30 10:21 UTC (permalink / raw) To: Cristian Prundeanu, Gautham R. Shenoy Cc: linux-tip-commits, linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma, Benjamin Herrenschmidt, K Prateek Nayak Hi Christian, On 29/10/2024 05:57, Cristian Prundeanu wrote: > Hi Gautham, > > On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@amd.com <mailto:gautham.shenoy@amd.com>> wrote: > >> On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote: >>> On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: >>>> >>>> The hammerdb test is a bit more complex than sysbench. It uses two >>>> independent physical machines to perform a TPC-C derived test [1], aiming >>>> to simulate a real-world database workload. The machines are allocated as >>>> an AWS EC2 instance pair on the same cluster placement group [2], to avoid >>>> measuring network bottlenecks instead of server performance. The SUT >>>> instance runs mysql configured to use 2 worker threads per vCPU (32 >>>> total); the load generator instance runs hammerdb configured with 64 >>>> virtual users and 24 warehouses [3]. Each test consists of multiple >>>> 20-minute rounds, run consecutively on multiple independent instance >>>> pairs. >>> >>> Would it be possible to produce something that Prateek and Gautham >>> (Hi Gautham btw !) can easily consume to reproduce ? >>> >>> Maybe a container image or a pair of container images hammering each >>> other ? (the simpler the better). >> >> Yes, that would be useful. Please share your recipe. We will try and >> reproduce it at our end. 
In our testing from a few months ago (some of >> which was presented at OSPM 2024), most of the database related >> regressions that we observed with EEVDF went away after running these >> the server threads under SCHED_BATCH. > > I am working on a repro package that is self contained and as simple to > share as possible. > > My testing with SCHED_BATCH is meanwhile concluded. It did reduce the > regression to less than half - but only with WAKEUP_PREEMPTION enabled. > When using NO_WAKEUP_PREEMPTION, there was no performance change compared > to SCHED_OTHER. Which tasks did you set SCHED_BATCH here? I'm assuming the mysql 'connection' tasks on the SUT (1 task for each virtual user I guess). I did this and see that the regression goes away. I'm using a similar test setup (hammerdb - mysql on AWS EC2 instances). I'm not sure yet how reliable my results are. The big unknown is the host system when I use AWS EC2 instances for hammerdb (Load Gen) and mysql (server). In case I gather test results over multiple days, the host system might have changed? I also tried the (not-mainlined) RESPECT_SLICE (NO_RUN_TO_PARITY) features which shows similar results compared to SCHED_BATCH for those threads. IIRC, RESPECT_SLICE was also helping Gautham to get the performance back for his 'sysbench + mysql' workload: OSPM 24 link to his presentation: https://youtu.be/jrEN4pJiRWU?t=1115 > (At the risk of stating the obvious, using SCHED_BATCH only to get back to > the default CFS performance is still only a workaround, just as disabling > PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the > root cause in EEVDF, but shouldn't be seen as viable alternate solutions.) > > Do you have more detail on the database regressions you saw a few months > ago? What was the magnitude, and which workloads did it manifest on? > > -Cristian > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-29 4:57 ` Cristian Prundeanu 2024-10-30 10:21 ` Dietmar Eggemann @ 2024-11-01 13:05 ` Peter Zijlstra 2024-11-04 10:19 ` Gautham R. Shenoy 2 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2024-11-01 13:05 UTC (permalink / raw) To: Cristian Prundeanu Cc: Gautham R. Shenoy, linux-tip-commits, linux-kernel, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma, Benjamin Herrenschmidt, K Prateek Nayak On Mon, Oct 28, 2024 at 11:57:49PM -0500, Cristian Prundeanu wrote: > My testing with SCHED_BATCH is meanwhile concluded. It did reduce the > regression to less than half - but only with WAKEUP_PREEMPTION enabled. > When using NO_WAKEUP_PREEMPTION, there was no performance change compared > to SCHED_OTHER. Because BATCH affects wakeup-preemption, and if there isn't any ever, it makes no difference. A BATCH task will not preempt another BATCH task, the only thing driving preemption is slice exhaustion -- and we can now set slice per task to match with the 'work' cycle. > (At the risk of stating the obvious, using SCHED_BATCH only to get back to > the default CFS performance is still only a workaround, It is not really -- it is impossible to schedule all the various workloads without them telling us what they really like. The quest is to find interfaces that make sense and are implementable. But fundamentally tasks will have to start telling us what they need. We've long since ran out of crystal balls. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-29 4:57 ` Cristian Prundeanu 2024-10-30 10:21 ` Dietmar Eggemann 2024-11-01 13:05 ` Peter Zijlstra @ 2024-11-04 10:19 ` Gautham R. Shenoy 2024-11-04 10:34 ` K Prateek Nayak 2 siblings, 1 reply; 27+ messages in thread From: Gautham R. Shenoy @ 2024-11-04 10:19 UTC (permalink / raw) To: Cristian Prundeanu Cc: linux-tip-commits, linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma, Benjamin Herrenschmidt, K Prateek Nayak On Mon, Oct 28, 2024 at 11:57:49PM -0500, Cristian Prundeanu wrote: > Hi Gautham, > > On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@amd.com <mailto:gautham.shenoy@amd.com>> wrote: > > > On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote: > > > On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: > > > > > > > > The hammerdb test is a bit more complex than sysbench. It uses two > > > > independent physical machines to perform a TPC-C derived test [1], aiming > > > > to simulate a real-world database workload. The machines are allocated as > > > > an AWS EC2 instance pair on the same cluster placement group [2], to avoid > > > > measuring network bottlenecks instead of server performance. The SUT > > > > instance runs mysql configured to use 2 worker threads per vCPU (32 > > > > total); the load generator instance runs hammerdb configured with 64 > > > > virtual users and 24 warehouses [3]. Each test consists of multiple > > > > 20-minute rounds, run consecutively on multiple independent instance > > > > pairs. > > > > > > Would it be possible to produce something that Prateek and Gautham > > > (Hi Gautham btw !) can easily consume to reproduce ? > > > > > > Maybe a container image or a pair of container images hammering each > > > other ? (the simpler the better). > > > > Yes, that would be useful. 
Please share your recipe. We will try and > > reproduce it at our end. In our testing from a few months ago (some of > > which was presented at OSPM 2024), most of the database related > > regressions that we observed with EEVDF went away after running these > > the server threads under SCHED_BATCH. > > I am working on a repro package that is self contained and as simple to > share as possible. Sorry for the delay in response. I was away for the Diwali festival. Thank you for working on the repro package. > > My testing with SCHED_BATCH is meanwhile concluded. It did reduce the > regression to less than half - but only with WAKEUP_PREEMPTION enabled. > When using NO_WAKEUP_PREEMPTION, there was no performance change compared > to SCHED_OTHER. > > (At the risk of stating the obvious, using SCHED_BATCH only to get back to > the default CFS performance is still only a workaround, just as disabling > PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the > root cause in EEVDF, but shouldn't be seen as viable alternate solutions.) > > Do you have more detail on the database regressions you saw a few months > ago? What was the magnitude, and which workloads did it manifest on? There were three variants of sysbench + MySQL which showed regression with EEVDF. 1. 1 Table, 10M Rows, read-only queries. 2. 3 Tables, 10M Rows each, read-only queries. 3. 1 Segmented Table, 10M Rows, read-only queries. These saw regressions in the range of 9-12%. The other database workload which showed regression was MongoDB + YCSB workload c. There the magnitude of the regression was around 17%. As mentioned by Dietmar, we observed these regressions to go away with the original EEVDF complete patches which had a feature called RESPECT_SLICE which allowed a running task to run till its slice gets over without being preempted by a newly woken up task. However, Peter suggested exploring SCHED_BATCH which fixed the regression even without EEVDF complete patchset. 
> > -Cristian -- Thanks and Regards gautham. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-11-04 10:19 ` Gautham R. Shenoy @ 2024-11-04 10:34 ` K Prateek Nayak 0 siblings, 0 replies; 27+ messages in thread From: K Prateek Nayak @ 2024-11-04 10:34 UTC (permalink / raw) To: Cristian Prundeanu, Gautham R. Shenoy Cc: linux-tip-commits, linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma, Benjamin Herrenschmidt Hello Cristian, Gautham, On 11/4/2024 3:49 PM, Gautham R. Shenoy wrote: > On Mon, Oct 28, 2024 at 11:57:49PM -0500, Cristian Prundeanu wrote: >> Hi Gautham, >> >> On 2024-10-25, 09:44, "Gautham R. Shenoy" <gautham.shenoy@amd.com <mailto:gautham.shenoy@amd.com>> wrote: >> >>> On Thu, Oct 24, 2024 at 07:12:49PM +1100, Benjamin Herrenschmidt wrote: >>>> On Sat, 2024-10-19 at 02:30 +0000, Prundeanu, Cristian wrote: >>>>> >>>>> The hammerdb test is a bit more complex than sysbench. It uses two >>>>> independent physical machines to perform a TPC-C derived test [1], aiming >>>>> to simulate a real-world database workload. The machines are allocated as >>>>> an AWS EC2 instance pair on the same cluster placement group [2], to avoid >>>>> measuring network bottlenecks instead of server performance. The SUT >>>>> instance runs mysql configured to use 2 worker threads per vCPU (32 >>>>> total); the load generator instance runs hammerdb configured with 64 >>>>> virtual users and 24 warehouses [3]. Each test consists of multiple >>>>> 20-minute rounds, run consecutively on multiple independent instance >>>>> pairs. >>>> >>>> Would it be possible to produce something that Prateek and Gautham >>>> (Hi Gautham btw !) can easily consume to reproduce ? >>>> >>>> Maybe a container image or a pair of container images hammering each >>>> other ? (the simpler the better). >>> >>> Yes, that would be useful. Please share your recipe. We will try and >>> reproduce it at our end. 
In our testing from a few months ago (some of >>> which was presented at OSPM 2024), most of the database related >>> regressions that we observed with EEVDF went away after running these >>> the server threads under SCHED_BATCH. >> >> I am working on a repro package that is self contained and as simple to >> share as possible. > > Sorry for the delay in response. I was away for the Diwali festival. > Thank you for working on the repro package. > > >> >> My testing with SCHED_BATCH is meanwhile concluded. It did reduce the >> regression to less than half - but only with WAKEUP_PREEMPTION enabled. >> When using NO_WAKEUP_PREEMPTION, there was no performance change compared >> to SCHED_OTHER. >> >> (At the risk of stating the obvious, using SCHED_BATCH only to get back to >> the default CFS performance is still only a workaround, just as disabling >> PLACE_LAG+RUN_TO_PARITY is; these give us more room to investigate the >> root cause in EEVDF, but shouldn't be seen as viable alternate solutions.) >> >> Do you have more detail on the database regressions you saw a few months >> ago? What was the magnitude, and which workloads did it manifest on? > > > There were three variants of sysbench + MySQL which showed regression > with EEVDF. > > 1. 1 Table, 10M Rows, read-only queries. > 2. 3 Tables, 10M Rows each, read-only queries. > 3. 1 Segmented Table, 10M Rows, read-only queries. > > These saw regressions in the range of 9-12%. > > The other database workload which showed regression was MongoDB + YCSB > workload c. There the magnitude of the regression was around 17%. > > As mentioned by Dietmar, we observed these regressions to go away with > the original EEVDF complete patches which had a feature called > RESPECT_SLICE which allowed a running task to run till its slice gets > over without being preempted by a newly woken up task. > > However, Peter suggested exploring SCHED_BATCH which fixed the > regression even without EEVDF complete patchset. 
Adding to that, since we had to test a variety of workloads, often
where the number of threads autoscales, we used the following
methodology to check if using SCHED_BATCH solves the regressions
observed:

    # echo 1 > /sys/kernel/tracing/events/task/enable
    # cat dump_python.py
    import sys

    with open("/sys/kernel/tracing/trace_pipe") as tf:
        for l in tf:
            # Skip comment lines and events generated by the bash loop below.
            if not l.startswith("#") and "comm=bash" not in l:
                pid_start = l.index("pid=") + 4
                pid = int(l[pid_start: l.index(" ", pid_start)])
                print(pid)
                sys.stdout.flush()
    # watch 'python3 dump_python.py | while read i; do chrt -v -b --pid 0 $i; done'

After running the above, we launch the benchmark. It is not pretty, but
it has worked for the various kinds of benchmarks we've tested.

On an additional note, since EEVDF got rid of both
"wakeup_granularity_ns" and "latency_ns", and SCHED_BATCH helps with
the absence of the former, have you tested using larger values of
"base_slice_ns" in tandem with SCHED_BATCH / NO_WAKEUP_PREEMPTION?

>
>>
>> -Cristian
>
> --
> Thanks and Regards
> gautham.

--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 27+ messages in thread
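The trace_pipe + chrt loop above can also be sketched in a single
Python process, without the shell pipeline. This is only an
illustration, not something posted in the thread: the function names
(extract_pid, make_batch, follow) are hypothetical, it assumes the task
tracepoints have already been enabled as shown above, and it assumes
the caller has enough privilege to change the policy of other tasks.

```python
import os

TRACE_PIPE = "/sys/kernel/tracing/trace_pipe"  # same path as in the script above


def extract_pid(line):
    """Return the pid= value from a task trace event, or None to skip the line."""
    # Skip comment lines and anything spawned by a bash helper loop.
    if line.startswith("#") or "comm=bash" in line:
        return None
    start = line.find("pid=")
    if start < 0:
        return None
    start += len("pid=")
    end = start
    while end < len(line) and line[end].isdigit():
        end += 1
    return int(line[start:end]) if end > start else None


def make_batch(pid):
    """Switch one thread to SCHED_BATCH; False if it vanished or we lack rights."""
    try:
        # SCHED_BATCH is a non-realtime policy, so the static priority must be 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
        return True
    except (ProcessLookupError, PermissionError):
        return False


def follow(trace_path=TRACE_PIPE):
    """Hypothetical driver: block on the trace pipe and convert every new task."""
    with open(trace_path) as trace:
        for line in trace:
            pid = extract_pid(line)
            if pid is not None:
                make_batch(pid)
```

Calling follow() before launching the benchmark would play the same
role as the watch/chrt pipeline; like that pipeline, it only catches
tasks created after it starts.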
* [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
@ 2024-10-17 5:19 Cristian Prundeanu
2024-10-17 9:10 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Cristian Prundeanu @ 2024-10-17 5:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel,
Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi,
Csaba Csoma, Cristian Prundeanu
This patchset disables the scheduler features PLACE_LAG and RUN_TO_PARITY
and moves them to sysctl.
Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced
significant performance degradation in multiple database-oriented
workloads. This degradation manifests in all kernel versions using EEVDF,
across multiple Linux distributions, hardware architectures (x86_64,
aarch64, amd64), and CPU generations.
For example, running mysql+hammerdb results in a 12-17% throughput
reduction and 12-18% latency increase compared to kernel 6.5 (using
default scheduler settings everywhere). The magnitude of this performance
impact is comparable to the average performance difference of a CPU
generation over its predecessor.
Testing combinations of available scheduler features showed that the
largest improvement (short of disabling all EEVDF features) came from
disabling both PLACE_LAG and RUN_TO_PARITY:
Kernel   | default  | NO_PLACE_LAG and
aarch64  | config   | NO_RUN_TO_PARITY
---------+----------+-----------------
6.5      | baseline | N/A
6.6      | -13.2%   | -6.8%
6.7      | -13.1%   | -6.0%
6.8      | -12.3%   | -6.5%
6.9      | -12.7%   | -6.9%
6.10     | -13.5%   | -5.8%
6.11     | -12.6%   | -5.8%
6.12-rc2 | -12.2%   | -8.9%
---------+----------+-----------------
Kernel   | default  | NO_PLACE_LAG and
x86_64   | config   | NO_RUN_TO_PARITY
---------+----------+-----------------
6.5      | baseline | N/A
6.6      | -16.8%   | -10.8%
6.7      | -16.4%   | -9.9%
6.8      | -17.2%   | -9.5%
6.9      | -17.4%   | -9.7%
6.10     | -16.5%   | -9.0%
6.11     | -15.0%   | -8.5%
6.12-rc2 | -12.7%   | -10.9%
---------+----------+-----------------
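One way to read the two tables is to ask what share of the
default-config regression is won back by disabling the two features.
The small sketch below does that arithmetic for three rows copied from
the tables; the helper name recovered_fraction is illustrative and not
part of the patch set.

```python
def recovered_fraction(default_pct, tuned_pct):
    """Share of the default-config regression that the tuned config wins back."""
    return (default_pct - tuned_pct) / default_pct


# (default config, NO_PLACE_LAG + NO_RUN_TO_PARITY), copied from the tables above
samples = {
    "aarch64 6.6": (-13.2, -6.8),
    "x86_64 6.6": (-16.8, -10.8),
    "x86_64 6.12-rc2": (-12.7, -10.9),
}

for label, (default_pct, tuned_pct) in samples.items():
    share = recovered_fraction(default_pct, tuned_pct)
    print(f"{label}: {share:.0%} of the regression recovered")
```

Read this way, the benefit of the two flags is noticeably smaller on
6.12-rc2 than on the earlier kernels, on both architectures.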
While the long term approach is debugging and fixing the scheduler
behavior, algorithm changes to address performance issues of this nature
are specialized (and likely prolonged or open-ended) research. Until a
change is identified which fixes the performance degradation, in the
interest of a better out-of-the-box performance: (1) disable these
features by default, and (2) expose these values in sysctl instead of
debugfs, so they can be more easily persisted across reboots.
Cristian Prundeanu (2):
sched: Disable PLACE_LAG and RUN_TO_PARITY
sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
include/linux/sched/sysctl.h | 8 ++++++++
kernel/sched/core.c | 13 +++++++++++++
kernel/sched/fair.c | 5 +++--
kernel/sched/features.h | 10 ----------
kernel/sysctl.c | 20 ++++++++++++++++++++
5 files changed, 44 insertions(+), 12 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
2024-10-17 5:19 Cristian Prundeanu
@ 2024-10-17 9:10 ` Peter Zijlstra
2024-10-17 18:19 ` Prundeanu, Cristian
2024-11-14 20:10 ` Joseph Salisbury
2024-11-25 11:35 ` Cristian Prundeanu
2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2024-10-17 9:10 UTC (permalink / raw)
To: Cristian Prundeanu
Cc: linux-tip-commits, linux-kernel, Ingo Molnar, x86, linux-arm-kernel,
Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi,
Csaba Csoma, gautham.shenoy

On Thu, Oct 17, 2024 at 12:19:58AM -0500, Cristian Prundeanu wrote:

> For example, running mysql+hammerdb results in a 12-17% throughput

Gautham, is this a benchmark you're running?

> Testing combinations of available scheduler features showed that the
> largest improvement (short of disabling all EEVDF features) came from
> disabling both PLACE_LAG and RUN_TO_PARITY:

How does using SCHED_BATCH compare?

> While the long term approach is debugging and fixing the scheduler
> behavior, algorithm changes to address performance issues of this nature
> are specialized (and likely prolonged or open-ended) research. Until a
> change is identified which fixes the performance degradation, in the
> interest of a better out-of-the-box performance: (1) disable these
> features by default, and (2) expose these values in sysctl instead of
> debugfs, so they can be more easily persisted across reboots.

So disabling them by default will undoubtedly affect a ton of other
workloads.

And sysctl is arguably more of an ABI than debugfs, which
doesn't really sound suitable for workaround.

And I don't see how adding a line to /etc/rc.local is harder than adding
a line to /etc/sysctl.conf
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-17 9:10 ` Peter Zijlstra @ 2024-10-17 18:19 ` Prundeanu, Cristian 2024-10-18 7:07 ` K Prateek Nayak 2024-10-18 9:54 ` Mohamed Abuelfotoh, Hazem 0 siblings, 2 replies; 27+ messages in thread From: Prundeanu, Cristian @ 2024-10-17 18:19 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar, x86@kernel.org, linux-arm-kernel@lists.infradead.org, Doebel, Bjoern, Mohamed Abuelfotoh, Hazem, Blake, Geoff, Saidi, Ali, Csoma, Csaba, gautham.shenoy@amd.com On 2024-10-17, 04:11, "Peter Zijlstra" <peterz@infradead.org> wrote: >> For example, running mysql+hammerdb results in a 12-17% throughput > Gautham, is this a benchmark you're running? Most of my testing for this investigation is on mysql+hammerdb because it simplifies differentiating statistically meaningful results, but performance impact (and improvement from disabling the two features) also shows on workloads based on postgresql and on wordpress+nginx. > How does using SCHED_BATCH compare? I haven't tested with SCHED_BATCH yet, will update the thread with results as they accumulate (each variation of the test takes multiple hours, not counting result processing and evaluation). Looking at man sched for SCHED_BATCH: "the scheduler will apply a small scheduling penalty with respect to wakeup behavior, so that this thread is mildly disfavored in scheduling decisions". Would this correctly translate to "the thread will run more deterministically, but be scheduled less frequently than other threads", i.e. expectedly lower performance in exchange for less variability? > So disabling them by default will undoubtedly affect a ton of other > workloads. That's very likely either way, as the testing space is near infinite, but it seems more practical to first address the issue we already know about. 
At this time, I don't have any data points to indicate a negative impact of disabling them for popular production workloads (as opposed to the flip case). More testing is in progress (looking at the major areas: workloads heavy on CPU, RAM, disk, and networking); so far, the results show no downside. > And sysctl is arguably more of an ABI than debugfs, which > doesn't really sound suitable for workaround. > > And I don't see how adding a line to /etc/rc.local is harder than adding > a line to /etc/sysctl.conf Adding a line is equally difficult both ways, you're right. But aren't most distros better equipped to manage (persist, modify, automate) sysctl parameters in a standardized manner? Whereas rc.local seems more "individual need / edge case" oriented. For instance: changes are done by editing the file, which is poorly scriptable (unlike the sysctl command, which is a unified interface that reconciles changes); the load order is also typically late in the boot stage, making it not an ideal place for settings that affect system processes. ^ permalink raw reply [flat|nested] 27+ messages in thread
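As a rough illustration of the scriptability point, the sketch below
wraps the debugfs features file in a few helpers that a boot-time
script could call. It is not taken from the patch set: the helper names
are hypothetical, and it assumes the file lists feature names separated
by whitespace, with disabled features carrying a NO_ prefix, and that
writing a single name (with or without the prefix) toggles that
feature.

```python
FEATURES_PATH = "/sys/kernel/debug/sched/features"  # needs debugfs mounted, root access


def feature_token(name, enabled):
    """Build the string to write: 'PLACE_LAG' to enable, 'NO_PLACE_LAG' to disable."""
    return name if enabled else "NO_" + name


def parse_features(text):
    """Turn the contents of the features file into {feature_name: enabled}."""
    state = {}
    for token in text.split():
        if token.startswith("NO_"):
            state[token[3:]] = False
        else:
            state[token] = True
    return state


def set_feature(name, enabled, path=FEATURES_PATH):
    """Write one toggle to the features file (assumed interface, see above)."""
    with open(path, "w") as features:
        features.write(feature_token(name, enabled))


def current_features(path=FEATURES_PATH):
    """Read back the current state, e.g. to verify a toggle took effect."""
    with open(path) as features:
        return parse_features(features.read())
```

A boot script would call set_feature("PLACE_LAG", False) and
set_feature("RUN_TO_PARITY", False); unlike a sysctl.conf entry, the
script itself has to be ordered after debugfs is mounted.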
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
  2024-10-17 18:19 ` Prundeanu, Cristian
@ 2024-10-18 7:07 ` K Prateek Nayak
  2024-10-18 9:54 ` Mohamed Abuelfotoh, Hazem
  1 sibling, 0 replies; 27+ messages in thread
From: K Prateek Nayak @ 2024-10-18 7:07 UTC (permalink / raw)
To: Prundeanu, Cristian, Peter Zijlstra
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
    Ingo Molnar, x86@kernel.org, linux-arm-kernel@lists.infradead.org,
    Doebel, Bjoern, Mohamed Abuelfotoh, Hazem, Blake, Geoff, Saidi, Ali,
    Csoma, Csaba, gautham.shenoy@amd.com

Hello Cristian,

On 10/17/2024 11:49 PM, Prundeanu, Cristian wrote:
> On 2024-10-17, 04:11, "Peter Zijlstra" <peterz@infradead.org> wrote:
>
>>> For example, running mysql+hammerdb results in a 12-17% throughput
>> Gautham, is this a benchmark you're running?

Most of our testing used sysbench as the benchmark driver. How does
mysql+hammerdb work specifically? Are the tasks driving the requests
located on a separate server, or are they co-located with the benchmark
threads on the same server?

Most of our testing uses affinity to make sure the drivers do not run on
the same CPUs as the workload threads. If the two can run on the same
CPU, then we have observed interesting behavior with a wide amount of
deviation.

>
> Most of my testing for this investigation is on mysql+hammerdb because it
> simplifies differentiating statistically meaningful results, but
> performance impact (and improvement from disabling the two features) also
> shows on workloads based on postgresql and on wordpress+nginx.

Did you see any glaring changes in scheduler statistics with the
introduction of EEVDF in v6.6? EEVDF commits up till v6.9 were easy to
revert from my experience but I've not tried it on v6.12-rcX with the
EEVDF complete series. Is all the regression seen purely attributable to
EEVDF alone on the more recent kernels?

>
>> How does using SCHED_BATCH compare?
>
> I haven't tested with SCHED_BATCH yet, will update the thread with results
> as they accumulate (each variation of the test takes multiple hours, not
> counting result processing and evaluation).

Could you also test running with:

    echo NO_WAKEUP_PREEMPTION > /sys/kernel/debug/sched/features

In our testing, using SCHED_BATCH prevents aggressive wakeup
preemption, and those benchmarks also showed improvements with
NO_WAKEUP_PREEMPTION.

On a side note, what is the CONFIG_HZ and the preemption model on your
test kernel (most of my testing was with CONFIG_HZ=250, voluntary
preemption)?

>
> Looking at man sched for SCHED_BATCH: "the scheduler will apply a small
> scheduling penalty with respect to wakeup behavior, so that this thread is
> mildly disfavored in scheduling decisions". Would this correctly translate
> to "the thread will run more deterministically, but be scheduled less
> frequently than other threads", i.e. expectedly lower performance in
> exchange for less variability?
>
>> So disabling them by default will undoubtedly affect a ton of other
>> workloads.
>
> That's very likely either way, as the testing space is near infinite, but
> it seems more practical to first address the issue we already know about.

RUN_TO_PARITY was introduced when Chenyu discovered that a large
regression in blogbench reported by Intel Test Robot
(https://lore.kernel.org/all/202308101628.7af4631a-oliver.sang@intel.com/)
was the result of very aggressive wakeup preemption
(https://lore.kernel.org/all/ZNWgAeN%2FEVS%2FvOLi@chenyu5-mobl2.bbrouter/).
The data in the latter link helped root-cause the actual issue with the
algorithm that the benchmark disliked. Similar information for the
database benchmarks you are running can help narrow down the issue.

>
> At this time, I don't have any data points to indicate a negative
> impact of disabling them for popular production workloads (as opposed to
> the flip case).
> More testing is in progress (looking at the major areas:
> workloads heavy on CPU, RAM, disk, and networking); so far, the results
> show no downside.

Analyzing your approach, what you are essentially doing with the two
sched features is as follows:

o NO_PLACE_LAG - Without place lag, a newly enqueued entity will always
  start from the avg_vruntime point in the task timeline i.e., it will
  always be eligible at the time of enqueue.

o NO_RUN_TO_PARITY - Do not run the current task until the vruntime
  meets its deadline after the first pick. Instead, preempt the current
  running task if it is found to be ineligible at the time of wakeup.

From what I can tell, your benchmark has a set of threads that like to
get cpu time as fast as possible. With EEVDF Complete (I would recommend
using current tip:sched/urgent branch to test them out) setting a more
aggressive nice value to these threads should enable them to negate the
effect of RUN_TO_PARITY thanks to PREEMPT_SHORT.

As for NO_PLACE_LAG, the DELAY_DEQUEUE feature should help a task shed
any lag it has built up, and it should very likely start from the
zero-lag point unless it is a very short sleeper.

>
>> And sysctl is arguably more of an ABI than debugfs, which
>> doesn't really sound suitable for workaround.
>>
>> And I don't see how adding a line to /etc/rc.local is harder than adding
>> a line to /etc/sysctl.conf
>
> Adding a line is equally difficult both ways, you're right. But aren't
> most distros better equipped to manage (persist, modify, automate) sysctl
> parameters in a standardized manner?
> Whereas rc.local seems more "individual need / edge case" oriented. For
> instance: changes are done by editing the file, which is poorly scriptable
> (unlike the sysctl command, which is a unified interface that reconciles
> changes); the load order is also typically late in the boot stage,

Is there any reason to flip it very early into the boot? Have you seen
anything go awry with system processes during boot with EEVDF?
> making
> it not an ideal place for settings that affect system processes.
>

--
Thanks and Regards,
Prateek
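The CONFIG_HZ and preemption-model question raised in the message above can usually be answered from the kernel config file. A minimal sketch, assuming the distro ships the config as a plain-text file (often /boot/config-$(uname -r), which is an assumption here); the function name show_hz_and_preempt is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the CONFIG_HZ value and the build-time
# preemption model found in a kernel config file.
show_hz_and_preempt() {
    config_file=$1
    # Matches CONFIG_HZ=250 but not CONFIG_HZ_250=y.
    grep -E '^CONFIG_HZ=[0-9]+$' "$config_file"
    # Matches the three classic models, but not CONFIG_PREEMPT_BUILD etc.
    grep -E '^CONFIG_PREEMPT(_NONE|_VOLUNTARY)?=y$' "$config_file"
}

# Example:
#   show_hz_and_preempt "/boot/config-$(uname -r)"
```

On kernels built with CONFIG_PREEMPT_DYNAMIC the model in effect at runtime may differ from the build-time default, so the output is only a starting point.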
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-17 18:19 ` Prundeanu, Cristian 2024-10-18 7:07 ` K Prateek Nayak @ 2024-10-18 9:54 ` Mohamed Abuelfotoh, Hazem 1 sibling, 0 replies; 27+ messages in thread From: Mohamed Abuelfotoh, Hazem @ 2024-10-18 9:54 UTC (permalink / raw) To: Prundeanu, Cristian, Peter Zijlstra Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar, x86@kernel.org, linux-arm-kernel@lists.infradead.org, Doebel, Bjoern, Blake, Geoff, Saidi, Ali, Csoma, Csaba, gautham.shenoy@amd.com >> And sysctl is arguably more of an ABI than debugfs, which >> doesn't really sound suitable for workaround. >> >> And I don't see how adding a line to /etc/rc.local is harder than adding >> a line to /etc/sysctl.conf > > Adding a line is equally difficult both ways, you're right. But aren't > most distros better equipped to manage (persist, modify, automate) sysctl > parameters in a standardized manner? > Whereas rc.local seems more "individual need / edge case" oriented. For > instance: changes are done by editing the file, which is poorly scriptable > (unlike the sysctl command, which is a unified interface that reconciles > changes); the load order is also typically late in the boot stage, making > it not an ideal place for settings that affect system processes. > I'd add to what Cristian mentioned is that having these tunables as sysctls will make them more detectable to the end users because checking output of sysctl -a is usually one of the first steps during performance troubleshooting vs checking /sys/kernel/debug/sched/ files so it's easier for people to spot these configurations as sysctls if they notice performance difference after upgrading the kernel. Hazem ^ permalink raw reply [flat|nested] 27+ messages in thread
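As a point of comparison with the "sysctl -a" habit mentioned in the message above, the debugfs features file can also be inspected, though it takes a little parsing. A minimal sketch, assuming the file holds whitespace-separated tokens in which a disabled feature carries a NO_ prefix; the function name feature_state is hypothetical, and the path is a parameter so the helper can be tried on a scratch file:

```shell
#!/bin/sh
# Hypothetical helper: report whether a scheduler feature is enabled,
# given a features file (normally /sys/kernel/debug/sched/features).
feature_state() {
    features_file=$1
    name=$2
    for token in $(cat "$features_file"); do
        case $token in
            "$name")    echo enabled;  return 0 ;;
            "NO_$name") echo disabled; return 0 ;;
        esac
    done
    echo unknown
    return 1
}

# Example (as root):
#   feature_state /sys/kernel/debug/sched/features PLACE_LAG
```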
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-10-17 5:19 Cristian Prundeanu 2024-10-17 9:10 ` Peter Zijlstra @ 2024-11-14 20:10 ` Joseph Salisbury 2024-11-19 10:29 ` Dietmar Eggemann 2024-11-25 11:35 ` Cristian Prundeanu 2 siblings, 1 reply; 27+ messages in thread From: Joseph Salisbury @ 2024-11-14 20:10 UTC (permalink / raw) To: Cristian Prundeanu, linux-tip-commits Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel, Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi, Csaba Csoma On 10/17/24 01:19, Cristian Prundeanu wrote: > This patchset disables the scheduler features PLACE_LAG and RUN_TO_PARITY > and moves them to sysctl. > > Replacing CFS with the EEVDF scheduler in kernel 6.6 introduced > significant performance degradation in multiple database-oriented > workloads. This degradation manifests in all kernel versions using EEVDF, > across multiple Linux distributions, hardware architectures (x86_64, > aarm64, amd64), and CPU generations. > > For example, running mysql+hammerdb results in a 12-17% throughput > reduction and 12-18% latency increase compared to kernel 6.5 (using > default scheduler settings everywhere). The magnitude of this performance > impact is comparable to the average performance difference of a CPU > generation over its predecessor. 
>
> Testing combinations of available scheduler features showed that the
> largest improvement (short of disabling all EEVDF features) came from
> disabling both PLACE_LAG and RUN_TO_PARITY:
>
> Kernel   | default  | NO_PLACE_LAG and
> aarm64   | config   | NO_RUN_TO_PARITY
> ---------+----------+-----------------
> 6.5      | baseline | N/A
> 6.6      | -13.2%   | -6.8%
> 6.7      | -13.1%   | -6.0%
> 6.8      | -12.3%   | -6.5%
> 6.9      | -12.7%   | -6.9%
> 6.10     | -13.5%   | -5.8%
> 6.11     | -12.6%   | -5.8%
> 6.12-rc2 | -12.2%   | -8.9%
> ---------+----------+-----------------
>
> Kernel   | default  | NO_PLACE_LAG and
> x86_64   | config   | NO_RUN_TO_PARITY
> ---------+----------+-----------------
> 6.5      | baseline | N/A
> 6.6      | -16.8%   | -10.8%
> 6.7      | -16.4%   | -9.9%
> 6.8      | -17.2%   | -9.5%
> 6.9      | -17.4%   | -9.7%
> 6.10     | -16.5%   | -9.0%
> 6.11     | -15.0%   | -8.5%
> 6.12-rc2 | -12.7%   | -10.9%
> ---------+----------+-----------------
>
> While the long term approach is debugging and fixing the scheduler
> behavior, algorithm changes to address performance issues of this nature
> are specialized (and likely prolonged or open-ended) research. Until a
> change is identified which fixes the performance degradation, in the
> interest of a better out-of-the-box performance: (1) disable these
> features by default, and (2) expose these values in sysctl instead of
> debugfs, so they can be more easily persisted across reboots.
>
> Cristian Prundeanu (2):
>   sched: Disable PLACE_LAG and RUN_TO_PARITY
>   sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
>
>  include/linux/sched/sysctl.h |  8 ++++++++
>  kernel/sched/core.c          | 13 +++++++++++++
>  kernel/sched/fair.c          |  5 +++--
>  kernel/sched/features.h      | 10 ----------
>  kernel/sysctl.c              | 20 ++++++++++++++++++++
>  5 files changed, 44 insertions(+), 12 deletions(-)
>

Hi Cristian,

This is a confirmation that we are also seeing a 9% performance
regression with the TPCC benchmark after v6.6-rc1.
We narrowed the regression down to commit:
86bfbb7ce4f6 ("sched/fair: Add lag based placement")

This regression was reported via this thread:
https://lore.kernel.org/lkml/1c447727-92ed-416c-bca1-a7ca0974f0df@oracle.com/

Phil Auld suggested turning off the PLACE_LAG sched feature. We
tested with NO_PLACE_LAG and can confirm it brought back 5% of the
performance loss. We do not yet know what effect NO_PLACE_LAG will have
on other benchmarks, but it indeed helps TPCC.

Thanks for the work to move PLACE_LAG and RUN_TO_PARITY to sysctl!

Joe
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
  2024-11-14 20:10 ` Joseph Salisbury
@ 2024-11-19 10:29 ` Dietmar Eggemann
  0 siblings, 0 replies; 27+ messages in thread
From: Dietmar Eggemann @ 2024-11-19 10:29 UTC (permalink / raw)
To: Joseph Salisbury, Cristian Prundeanu, linux-tip-commits
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, x86, linux-arm-kernel,
    Bjoern Doebel, Hazem Mohamed Abuelfotoh, Geoff Blake, Ali Saidi,
    Csaba Csoma

On 14/11/2024 21:10, Joseph Salisbury wrote:

Hi Joseph,

> On 10/17/24 01:19, Cristian Prundeanu wrote:

[...]

> Hi Cristian,
>
> This is a confirmation that we are also seeing a 9% performance
> regression with the TPCC benchmark after v6.6-rc1. We narrowed down the
> regression was caused due to commit:
> 86bfbb7ce4f6 ("sched/fair: Add lag based placement")
>
> This regression was reported via this thread:
> https://lore.kernel.org/lkml/1c447727-92ed-416c-bca1-a7ca0974f0df@oracle.com/
>
> Phil Auld suggested to try turning off the PLACE_LAG sched feature. We
> tested with NO_PLACE_LAG and can confirm it brought back 5% of the
> performance loss. We do not yet know what effect NO_PLACE_LAG will have
> on other benchmarks, but it indeed helps TPCC.

Can you try to run mysql in SCHED_BATCH when using EEVDF?

https://lkml.kernel.org/r/20241029045749.37257-1-cpru@amazon.com

The regression went away for me when changing mysql threads to
SCHED_BATCH.

You can either start mysql with 'CPUSchedulingPolicy=batch':

  # cat /etc/systemd/system/mysql.service

  [Service]
  CPUSchedulingPolicy=batch
  ExecStart=/usr/local/mysql/bin/mysqld_safe

  # systemctl daemon-reload
  # systemctl restart mysql

or change the policy with chrt for all mysql threads when doing
consecutive test runs, starting from the second run (the 'connection'
threads have to exist already):

  # chrt -b -a -p 0 $PID_MYSQL

  # ps -p $PID_MYSQL -To comm,pid,tid,nice,class
  COMMAND         PID   TID  NI CLS
  mysqld         4872  4872   0 B
  ib_io_ibuf     4872  4878   0 B
  ...
  xpl_accept-3   4872  4921   0 B
  connection     4872  5007   0 B
  ...
  connection     4872  5413   0 B

My hunch is that this is due to the 'connection' threads (1 per virtual
user) running in SCHED_BATCH. I have yet to confirm this by only
changing the 'connection' tasks to SCHED_BATCH.

[..]
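The experiment left open above, switching only the 'connection' threads to SCHED_BATCH, could be scripted roughly as follows. This is a sketch, not what was actually run in the thread: the function names tids_by_comm and batch_threads_by_comm are hypothetical, and it assumes procps ps and util-linux chrt, plus thread names without spaces (the 'connection' name from the ps listing above qualifies).

```shell
#!/bin/sh
# Read "TID COMM" pairs on stdin and print the TIDs whose command name
# matches the first argument exactly.
tids_by_comm() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# Hypothetical wrapper: move only the matching threads of one process to
# SCHED_BATCH (priority 0). Needs sufficient privileges.
batch_threads_by_comm() {
    pid=$1
    comm=$2
    ps -T -p "$pid" -o tid=,comm= | tids_by_comm "$comm" |
    while read -r tid; do
        chrt -b -p 0 "$tid"
    done
}

# Example:
#   batch_threads_by_comm "$PID_MYSQL" connection
```

As with the chrt -a variant, the threads must already exist, so this only makes sense once the virtual users have connected.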
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
  2024-10-17 5:19 Cristian Prundeanu
  2024-10-17 9:10 ` Peter Zijlstra
  2024-11-14 20:10 ` Joseph Salisbury
@ 2024-11-25 11:35 ` Cristian Prundeanu
  2024-11-26 3:58 ` K Prateek Nayak
  ` (2 more replies)
  2 siblings, 3 replies; 27+ messages in thread
From: Cristian Prundeanu @ 2024-11-25 11:35 UTC (permalink / raw)
To: cpru
Cc: kprateek.nayak, abuehaze, alisaidi, benh, blakgeof, csabac, doebel,
    gautham.shenoy, joseph.salisbury, dietmar.eggemann, linux-arm-kernel,
    linux-kernel, linux-tip-commits, mingo, peterz, x86

Here are more results with recent 6.12 code, and also using SCHED_BATCH.
The control tests were run anew on Ubuntu 22.04 with the current pre-built
kernels 6.5 (baseline) and 6.8 (regression out of the box).

When updating mysql from 8.0.30 to 8.4.2, the regression grew even larger.
Disabling PLACE_LAG and RUN_TO_PARITY improved the results more than
using SCHED_BATCH.

Kernel   | default  | NO_PLACE_LAG and | SCHED_BATCH | mysql
         | config   | NO_RUN_TO_PARITY |             | version
---------+----------+------------------+-------------+---------
6.8      | -15.3%   |                  |             | 8.0.30
6.12-rc7 | -11.4%   | -9.2%            | -11.6%      | 8.0.30
         |          |                  |             |
6.8      | -18.1%   |                  |             | 8.4.2
6.12-rc7 | -14.0%   | -10.2%           | -12.7%      | 8.4.2
---------+----------+------------------+-------------+---------

Confidence intervals for all tests are smaller than +/- 0.5%.

I expect to have the repro package ready by the end of the week. Thank you
for your collective patience and efforts to confirm these results.


On 2024-11-01, Peter Zijlstra wrote:

>> (At the risk of stating the obvious, using SCHED_BATCH only to get back to
>> the default CFS performance is still only a workaround,
>
> It is not really -- it is impossible to schedule all the various
> workloads without them telling us what they really like. The quest is to
> find interfaces that make sense and are implementable. But fundamentally
But fundamentally > tasks will have to start telling us what they need. We've long since ran > out of crystal balls. Completely agree that the best performance is obtained when the tasks are individually tuned to the scheduler and explicitly set running parameters. This isn't different from before. But shouldn't our gold standard for default performance be CFS? There is a significant regression out of the box when using EEVDF; how is seeking additional tuning just to recover the lost performance not a workaround? (Not to mention that this additional tuning means shifting the burden on many users who may not be familiar enough with scheduler functionality. We're essentially asking everyone to spend considerable effort to maintain status quo from kernel 6.5.) On 2024-11-14, Joseph Salisbury wrote: > This is a confirmation that we are also seeing a 9% performance > regression with the TPCC benchmark after v6.6-rc1. We narrowed down the > regression was caused due to commit: > 86bfbb7ce4f6 ("sched/fair: Add lag based placement") > > This regression was reported via this thread: > https://lore.kernel.org/lkml/1c447727-92ed-416c-bca1-a7ca0974f0df@oracle.com/ > > Phil Auld suggested to try turning off the PLACE_LAG sched feature. We > tested with NO_PLACE_LAG and can confirm it brought back 5% of the > performance loss. We do not yet know what effect NO_PLACE_LAG will have > on other benchmarks, but it indeed helps TPCC. Thank you for confirming the regression. I've been monitoring performance on the v6.12-rcX tags since this thread started, and the results have been largely constant. I've also tested other benchmarks to verify whether (1) the regression exists and (2) the patch proposed in this thread negatively affects them. On postgresql and wordpress/nginx there is a regression which is improved when applying the patch; on mongo and mariadb no regression manifested, and the patch did not make their performance worse. 
On 2024-11-19, Dietmar Eggemann wrote: > #cat /etc/systemd/system/mysql.service > > [Service] > CPUSchedulingPolicy=batch > ExecStart=/usr/local/mysql/bin/mysqld_safe This is the approach I used as well to get the results above. > My hunch is that this is due to the 'connection' threads (1 per virtual > user) running in SCHED_BATCH. I yet have to confirm this by only > changing the 'connection' tasks to SCHED_BATCH. Did you have a chance to run with this scenario? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-11-25 11:35 ` Cristian Prundeanu @ 2024-11-26 3:58 ` K Prateek Nayak 2024-11-26 15:12 ` Dietmar Eggemann 2024-11-28 10:32 ` Cristian Prundeanu 2 siblings, 0 replies; 27+ messages in thread From: K Prateek Nayak @ 2024-11-26 3:58 UTC (permalink / raw) To: Cristian Prundeanu Cc: abuehaze, alisaidi, benh, blakgeof, csabac, doebel, gautham.shenoy, joseph.salisbury, dietmar.eggemann, linux-arm-kernel, linux-kernel, linux-tip-commits, mingo, peterz, x86 Hello Cristian, On 11/25/2024 5:05 PM, Cristian Prundeanu wrote: > Here are more results with recent 6.12 code, and also using SCHED_BATCH. > The control tests were run anew on Ubuntu 22.04 with the current pre-built > kernels 6.5 (baseline) and 6.8 (regression out of the box). > > When updating mysql from 8.0.30 to 8.4.2, the regression grew even larger. > Disabling PLACE_LAG and RUN _TO_PARITY improved the results more than > using SCHED_BATCH. > > Kernel | default | NO_PLACE_LAG and | SCHED_BATCH | mysql > | config | NO_RUN_TO_PARITY | | version > ---------+----------+------------------+-------------+--------- > 6.8 | -15.3% | | | 8.0.30 > 6.12-rc7 | -11.4% | -9.2% | -11.6% | 8.0.30 > | | | | > 6.8 | -18.1% | | | 8.4.2 > 6.12-rc7 | -14.0% | -10.2% | -12.7% | 8.4.2 > ---------+----------+------------------+-------------+--------- > > Confidence intervals for all tests are smaller than +/- 0.5%. > > I expect to have the repro package ready by the end of the week. Thank you > for your collective patience and efforts to confirm these results. Thank you! 
In the meantime, there is a new enhancement to perf-tool being proposed
to use the data from /proc/schedstat to profile workloads and spot any
obvious changes in the scheduling behavior at
https://lore.kernel.org/lkml/20241122084452.1064968-1-swapnil.sapkal@amd.com/

It applies cleanly on tip:sched/core at tag "sched-core-2024-11-18".
Would it be possible to use the perf-tool built there to collect
the scheduling stats for MySQL benchmark runs on both v6.5 and v6.8 and
share the output of "perf sched stats diff" and the two perf.data files
recorded? It would help narrow down the regression if this can be linked
to a system-wide behavior.

Data from a run with NO_PLACE_LAG and NO_RUN_TO_PARITY can also help
look at metrics that are helping improve the performance compared to
the vanilla v6.8 case.

The proposed perf-tools changes are arch agnostic and should work on any
system as long as it has /proc/schedstat with version 15 and above.

>
> [..snip..]
>

--
Thanks and Regards,
Prateek
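The version requirement mentioned above can be checked before collecting any data. A minimal sketch, assuming the first line of /proc/schedstat has the form "version N"; the function names schedstat_version and schedstat_supported are hypothetical, and the path is a parameter so the check can be tried on a scratch file:

```shell
#!/bin/sh
# Print the version number from the first line of a schedstat file.
# Fails if the first line does not look like "version N".
schedstat_version() {
    awk 'NR == 1 && $1 == "version" { print $2; found = 1 }
         END { exit !found }' "$1"
}

# Succeed only if the schedstat version is 15 or newer, the minimum
# quoted in the message above.
schedstat_supported() {
    v=$(schedstat_version "$1") || return 1
    [ "$v" -ge 15 ]
}

# Example:
#   schedstat_supported /proc/schedstat && echo "schedstat is new enough"
```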
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl
  2024-11-25 11:35 ` Cristian Prundeanu
  2024-11-26 3:58 ` K Prateek Nayak
@ 2024-11-26 15:12 ` Dietmar Eggemann
  2024-11-28 10:32 ` Cristian Prundeanu
  2 siblings, 0 replies; 27+ messages in thread
From: Dietmar Eggemann @ 2024-11-26 15:12 UTC (permalink / raw)
To: Cristian Prundeanu
Cc: kprateek.nayak, abuehaze, alisaidi, benh, blakgeof, csabac, doebel,
    gautham.shenoy, joseph.salisbury, linux-arm-kernel, linux-kernel,
    linux-tip-commits, mingo, peterz, x86

On 25/11/2024 12:35, Cristian Prundeanu wrote:
> Here are more results with recent 6.12 code, and also using SCHED_BATCH.
> The control tests were run anew on Ubuntu 22.04 with the current pre-built
> kernels 6.5 (baseline) and 6.8 (regression out of the box).
>
> When updating mysql from 8.0.30 to 8.4.2, the regression grew even larger.
> Disabling PLACE_LAG and RUN _TO_PARITY improved the results more than
> using SCHED_BATCH.
>
> Kernel   | default  | NO_PLACE_LAG and | SCHED_BATCH | mysql
>          | config   | NO_RUN_TO_PARITY |             | version
> ---------+----------+------------------+-------------+---------
> 6.8      | -15.3%   |                  |             | 8.0.30
> 6.12-rc7 | -11.4%   | -9.2%            | -11.6%      | 8.0.30
>          |          |                  |             |
> 6.8      | -18.1%   |                  |             | 8.4.2
> 6.12-rc7 | -14.0%   | -10.2%           | -12.7%      | 8.4.2
> ---------+----------+------------------+-------------+---------
>
> Confidence intervals for all tests are smaller than +/- 0.5%.
>
> I expect to have the repro package ready by the end of the week. Thank you
> for your collective patience and efforts to confirm these results.

The results I got look different:

SUT kernel arm64 (mysql-8.4.0)

(1) 6.5.13       baseline
(2) 6.12.0-rc4                  -12.9%
(3) 6.12.0-rc4   NO_PLACE_LAG    +6.4%
(4) v6.12-rc4    SCHED_BATCH    +10.8%

5 test runs each: confidence level (95%) <= ±0.56%

(2) is still in sync but (3)/(4) looks way better for me.
Maybe a difference in our test setup can explain the different test
results:

I use:

HammerDB Load Generator <-> MySQL SUT
192 VCPUs               <-> 16 VCPUs

Virtual users: 256
Warehouse count: 64
3 min rampup
10 min test run time
performance data: NOPM (New Orders Per Minute)

So I have 256 'connection' tasks running on the 16 SUT VCPUS.

> On 2024-11-01, Peter Zijlstra wrote:
>
>>> (At the risk of stating the obvious, using SCHED_BATCH only to get back to
>>> the default CFS performance is still only a workaround,
>>
>> It is not really -- it is impossible to schedule all the various
>> workloads without them telling us what they really like. The quest is to
>> find interfaces that make sense and are implementable. But fundamentally
>> tasks will have to start telling us what they need. We've long since ran
>> out of crystal balls.
>
> Completely agree that the best performance is obtained when the tasks are
> individually tuned to the scheduler and explicitly set running parameters.
> This isn't different from before.
>
> But shouldn't our gold standard for default performance be CFS? There is a
> significant regression out of the box when using EEVDF; how is seeking
> additional tuning just to recover the lost performance not a workaround?
>
> (Not to mention that this additional tuning means shifting the burden on
> many users who may not be familiar enough with scheduler functionality.
> We're essentially asking everyone to spend considerable effort to maintain
> status quo from kernel 6.5.)
>
>
> On 2024-11-14, Joseph Salisbury wrote:
>
>> This is a confirmation that we are also seeing a 9% performance
>> regression with the TPCC benchmark after v6.6-rc1.
We narrowed down the >> regression was caused due to commit: >> 86bfbb7ce4f6 ("sched/fair: Add lag based placement") >> >> This regression was reported via this thread: >> https://lore.kernel.org/lkml/1c447727-92ed-416c-bca1-a7ca0974f0df@oracle.com/ >> >> Phil Auld suggested to try turning off the PLACE_LAG sched feature. We >> tested with NO_PLACE_LAG and can confirm it brought back 5% of the >> performance loss. We do not yet know what effect NO_PLACE_LAG will have >> on other benchmarks, but it indeed helps TPCC. > > Thank you for confirming the regression. I've been monitoring performance > on the v6.12-rcX tags since this thread started, and the results have been > largely constant. > > I've also tested other benchmarks to verify whether (1) the regression > exists and (2) the patch proposed in this thread negatively affects them. > On postgresql and wordpress/nginx there is a regression which is improved > when applying the patch; on mongo and mariadb no regression manifested, and > the patch did not make their performance worse. > > > On 2024-11-19, Dietmar Eggemann wrote: > >> #cat /etc/systemd/system/mysql.service >> >> [Service] >> CPUSchedulingPolicy=batch >> ExecStart=/usr/local/mysql/bin/mysqld_safe > > This is the approach I used as well to get the results above. OK. >> My hunch is that this is due to the 'connection' threads (1 per virtual >> user) running in SCHED_BATCH. I yet have to confirm this by only >> changing the 'connection' tasks to SCHED_BATCH. > > Did you have a chance to run with this scenario? Yeah, I did. The results where worse than running all mysqld threads in SCHED_BATCH but still better than the baseline. (5) v6.12-rc4 'connection' tasks in SCHED_BATCH +6.8% ^ permalink raw reply [flat|nested] 27+ messages in thread
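The percentages exchanged in this thread are throughput changes relative to the 6.5 baseline. A minimal sketch of that arithmetic, assuming two NOPM figures as input; the function name pct_change is hypothetical and the numbers in the example are placeholders, not measurements from this thread:

```shell
#!/bin/sh
# Print the signed percentage change of a value relative to a baseline,
# i.e. (value - baseline) / baseline * 100, with one decimal place.
pct_change() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

# Example with placeholder numbers (baseline 200, measured 190):
#   pct_change 200 190
```

A negative result means lower throughput than the baseline; for latency columns the sign reads the other way around (larger is worse).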
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-11-25 11:35 ` Cristian Prundeanu 2024-11-26 3:58 ` K Prateek Nayak 2024-11-26 15:12 ` Dietmar Eggemann @ 2024-11-28 10:32 ` Cristian Prundeanu 2024-11-29 10:12 ` Dietmar Eggemann 2 siblings, 1 reply; 27+ messages in thread From: Cristian Prundeanu @ 2024-11-28 10:32 UTC (permalink / raw) To: cpru Cc: abuehaze, alisaidi, benh, blakgeof, csabac, dietmar.eggemann, doebel, gautham.shenoy, joseph.salisbury, kprateek.nayak, linux-arm-kernel, linux-kernel, linux-tip-commits, mingo, peterz, x86 On 2024-11-26, K Prateek Nayak wrote: > Would it be possible to use the perf-tool built there to collect > the scheduling stats for MySQL benchmark runs on both v6.5 and v6.8 and > share the output of "perf sched stats diff" and the two perf.data files > recorded? I'll add this to the list of my next tests. Thank you for mentioning it! On 2024-11-26, Dietmar Eggemann wrote: > SUT kernel arm64 (mysql-8.4.0) > (2) 6.12.0-rc4 -12.9% > (3) 6.12.0-rc4 NO_PLACE_LAG +6.4% > (4) v6.12-rc4 SCHED_BATCH +10.8% This is very interesting; our setups are close, yet I have not seen any feature or policy combination that performs above the 6.5 CFS baseline. I look forward to seeing your results with the repro when it's ready. Did you only use NO_PLACE_LAG or was it together with NO_RUN_TO_PARITY? Was SCHED_BATCH used with the default feature set (all enabled)? Which distro/version did you use for the SUT? > Maybe a difference in our test setup can explain the different test results: > > I use: > > HammerDB Load Generator <-> MySQL SUT > 192 VCPUs <-> 16 VCPUs > > Virtual users: 256 > Warehouse count: 64 > 3 min rampup > 10 min test run time > performance data: NOPM (New Operations Per Minute) > > So I have 256 'connection' tasks running on the 16 SUT VCPUS. 
My setup:

SUT - 16 vCPUs, 32 GB RAM
Loadgen - 64 vCPU, 128 GB RAM (anything large enough to not be a
bottleneck should work)

Virtual users: 4 x vCPUs = 64
Warehouses: 24
Rampup: 5 min
Test runtime: 20 min x 10 times, each on 4 different SUT/Loadgen pairs
Value recorded: geometric_mean(NOPM)
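The "Value recorded" line above aggregates runs with a geometric mean. A minimal sketch of that aggregation, assuming one NOPM value per line on stdin; the function name geometric_mean is hypothetical, and this is not necessarily how the repro tooling computes it:

```shell
#!/bin/sh
# Print the geometric mean of the positive numbers read from stdin
# (one per line), computed as exp(mean(log(x))) to avoid overflow
# from multiplying many large values.
geometric_mean() {
    awk '$1 > 0 { sum += log($1); n++ }
         END { if (n == 0) exit 1; printf "%.2f\n", exp(sum / n) }'
}

# Example:
#   printf '2\n8\n' | geometric_mean
```

Compared with an arithmetic mean, the geometric mean is less sensitive to a single unusually high run, which fits a setup that pools results from several SUT/Loadgen pairs.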
* Re: [PATCH 0/2] [tip: sched/core] sched: Disable PLACE_LAG and RUN_TO_PARITY and move them to sysctl 2024-11-28 10:32 ` Cristian Prundeanu @ 2024-11-29 10:12 ` Dietmar Eggemann 0 siblings, 0 replies; 27+ messages in thread From: Dietmar Eggemann @ 2024-11-29 10:12 UTC (permalink / raw) To: Cristian Prundeanu Cc: abuehaze, alisaidi, benh, blakgeof, csabac, doebel, gautham.shenoy, joseph.salisbury, kprateek.nayak, linux-arm-kernel, linux-kernel, linux-tip-commits, mingo, peterz, x86 On 28/11/2024 11:32, Cristian Prundeanu wrote: [...] > On 2024-11-26, Dietmar Eggemann wrote: > >> SUT kernel arm64 (mysql-8.4.0) >> (2) 6.12.0-rc4 -12.9% >> (3) 6.12.0-rc4 NO_PLACE_LAG +6.4% >> (4) v6.12-rc4 SCHED_BATCH +10.8% > > This is very interesting; our setups are close, yet I have not seen any > feature or policy combination that performs above the 6.5 CFS baseline. > I look forward to seeing your results with the repro when it's ready. > > Did you only use NO_PLACE_LAG or was it together with NO_RUN_TO_PARITY? Only NO_PLACE_LAG. > Was SCHED_BATCH used with the default feature set (all enabled)? Yes. > Which distro/version did you use for the SUT? The default, Ubuntu 24.04 Arm64 server. >> Maybe a difference in our test setup can explain the different test results: >> >> I use: >> >> HammerDB Load Generator <-> MySQL SUT >> 192 VCPUs <-> 16 VCPUs >> >> Virtual users: 256 >> Warehouse count: 64 >> 3 min rampup >> 10 min test run time >> performance data: NOPM (New Operations Per Minute) >> >> So I have 256 'connection' tasks running on the 16 SUT VCPUS. > > My setup: > > SUT - 16 vCPUs, 32 GB RAM > Loadgen - 64 vCPU, 128 GB RAM (anything large enough to not be a > bottleneck should work) > > Virtual users: 4 x vCPUs = 64 > Warehouses: 24 > Rampup: 5 min > Test runtime: 20 min x 10 times, each on 4 different SUT/Loadgen pairs > Value recorded: geometric_mean(NOPM) Looks like you have 4 times less 'connection' tasks on your 16 VCPUs. So much less concurrency/preemption ... 