From: Cristian Prundeanu
To: Peter Zijlstra
CC: Cristian Prundeanu, K Prateek Nayak, Hazem Mohamed Abuelfotoh,
 Ali Saidi, Benjamin Herrenschmidt, Geoff Blake, Csaba Csoma,
 Bjoern Doebel, Gautham Shenoy, Joseph Salisbury, Dietmar Eggemann,
 Ingo Molnar, Linus Torvalds, Borislav Petkov
Subject: Re: [PATCH v2] [tip: sched/core] sched: Move PLACE_LAG and RUN_TO_PARITY to sysctl
Date: Wed, 12 Feb 2025 17:00:02 -0600
Message-ID: <20250212230002.95945-1-cpru@amazon.com>
In-Reply-To: <20250212093721.GA24784@noisy.programming.kicks-ass.net>
References: <20250212094307.GB19118@noisy.programming.kicks-ass.net>

>>> Moving PLACE_LAG and RUN_TO_PARITY to sysctl will allow users to override
>>> their default values and persist them with established mechanisms.
>>
>> Nope -- you have knobs in debugfs, and that's where they'll stay. Esp.
>> PLACE_LAG is super dodgy and should not get elevated to anything
>> remotely official.
>
> Just to clarify, the problem with NO_PLACE_LAG is that by discarding
> lag, a task can game the system to 'gain' time. It fundamentally breaks
> fairness, and the only reason I implemented it at all was because it is
> one of the 'official' placement strategies in the original paper.

Wouldn't this be an argument in favor of more official positioning of
this knob? It may be dodgy, but it's currently the best mitigation
option, until something better comes along.

> If the tasks are unconstrained / aperiodic, this goes out the window and
> the placement strategy becomes unsound. And given we must assume
> userspace to be malicious / hostile / unbehaved, the whole thing is just
> not good.

Userspace in general, absolutely. User intent should be king though, and
impairing the ability to do precisely what you want with your machine
feels like it stands against what Linux is best known (and often feared)
for: configurability.

There is _another_ OS which has made a habit of dictating how users
should want to do something.
We're not there of course, but it's a strong cautionary tale.

To ask more specifically, isn't a strong point of EEVDF the fact that it
considers _more_ user needs and use cases than CFS (for instance, task
lag/latency)?

>> Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot
>> time, and does not require further user effort.
>
> For your workload. It will wreck other workloads.

I'd like to invite you to name one real-life workload that would be
wrecked by allowing PL and RTP override in sysctl. I can name three that
are currently impacted (mysql, postgres, and wordpress), with only poor
means (increased effort, non-standard persistence leading to higher
maintenance cost, requirement for debugfs) to mitigate the regression.

> Yes, SCHED_BATCH might be more fiddly, but it allows for composition.
> You can run multiple workloads together and they all behave.

Shouldn't we leave that to the user to decide, though? Forcing a new
default configuration that only works well with multiple workloads
cannot be the right thing for everyone - especially for large scale
providers, where servers and corresponding images are intended to run
one main workload. Importantly, these are things that used to run well
and now don't.

> Maybe the right thing here is to get mysql patched; so that it will
> request BATCH itself for the threads that need it.

For mysql in particular, it's a possible avenue (though I still object
to the idea that individual users and vendors now need to put in
additional effort to maintain the same performance as before).

But in the larger picture, this reproducer is only meant as a simplified
illustration of the performance issues. It is not a single occurrence.
There are far more complex workloads where tuning at thread level is at
best impractical, or even downright impossible.
Think of managed clusters where the load distribution and corresponding
task density are not user controlled, or JVM workloads where individual
threads are not even designed to be managed externally, or containers
built from external dependencies where tuning a service is anything but
trivial.

Are we really saying that everyone just needs to swallow the cost of
this change, or put up with the lower performance level? Even if the
Linux kernel doesn't concern itself with business cost, surely at least
the time burned on this by both commercial and non-commercial projects
cannot be lost on you.

> Also, FYI, by keeping these emails threaded in the old thread I nearly
> missed them again. I'm not sure where this nonsense of keeping
> everything in one thread came from, but it is bloody stupid.

Thank you. This is a great opportunity for both of us to relate to the
opposing stance on this patch, and I hope you too will see the parallel:

My reason for threading was well intended. I value your time and wanted
to avoid you wasting it by having to search for the previous patch or
older threads on the same topic. However, I ended up inadvertently
creating an issue for your use case. It, arguably, doesn't have a
noticeable impact on my side, and it could be avoided by you, the user,
by configuring your email client to always highlight messages directly
addressed to you; assuming that your email client supports it, and you
are able and willing to invest the effort to do it.

Nevertheless, this doesn't make it right. I do apologize for the
annoyance; it was not my intent to put additional burden on you, only
for you to have the same experience or efficiency that you are used to
having.

I did consolidate the two recent threads into this one though, because I
believe that it's easier for everyone else to follow.

It may be a silly parallel, but please consider that similar frustration
is happening to many users who are now asked to put effort towards
bringing performance back to previous levels - if at all possible and
feasible - and at the same time are denied the right tools to do so.

Please consider that it took years for EEVDF commit messages to go from
"horribly messes up things" to "isn't perfect yet, but much closer", and
it may take years still until it's as stable, performant and vetted
across varied scenarios as CFS was in kernel 6.5.

Please consider that along this journey are countless users and groups
who would rather not wait for perfection, but have easy means to at
least get the same performance they were getting before.

-Cristian