From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C2C91531C5; Tue, 17 Sep 2024 21:37:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726609080; cv=none; b=mJOM1f0W7LKcCRzp6LAQUBz8QjvM2bccbNcmUMcKYbWAy5EJFONNenX8H757ZgWFiaDn23++IT5ror9reM8jOv/9euo705nyYquDQlPMujfjkvIFF+Ls35xH4vMmwNCwS/h2kXfMsZo0yHPOdiJTR5y/iJSN7BC2sLDnKrJ7cG8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726609080; c=relaxed/simple; bh=r4yu7OaAkw2nHJ5m2iWzbbbfoy+QdCsQhP0wHKG9pyw=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=NID6YwyY8d2Y1OdK+ekWNQLoy8dij6OA75GmO69yPdGl/3Mz55+qXgFZ8l/SX+yJQDMalWQX4ha3Tu8RhABVfcyDFRV3/pvPcfi/HQsedLCFnEBiBeMgH/zLWOchoXWkZIGj4ZvNLtqiATiEXM9yvMcdrTKeVxFW5rPjbJMkYdM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=IWphPGPh; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="IWphPGPh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1726609078; x=1758145078; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=r4yu7OaAkw2nHJ5m2iWzbbbfoy+QdCsQhP0wHKG9pyw=; b=IWphPGPhURbjathUSkcE5b4OwyGgxz+f6qzJ4J++CgwxHYMNjET3LkvT YzzcgmnMaAoprQ37Ho2gUtL9GJh4PGt0GRp4GnT698EkiCkb//xgHMA16 TTL80GK/zvTtntbo5qn2CBeej8K0WIrTSfoxhR4/MZXNjIp7A+IzmO9Fu LPjZ3kJ2xg9Dl8OnoKVNjlhu2uGViwNMvcuOarigqzW6UEirp/OXmqk3c qHwTOIe09NQK11aZkvlwBhB/48zt5xemGor/68z9uVZ81PiJLF/eX3a4I POxuPvMvfEHjj8N2v2YMccqAuWVqoFOZCyn/Es7BOKQeHCdMPsJmumd8C g==; X-CSE-ConnectionGUID: gM9CNxzQTuaX8+qnJLgaOQ== X-CSE-MsgGUID: sxPmm42lQiqC8pQWeWEXfQ== X-IronPort-AV: E=McAfee;i="6700,10204,11198"; a="25379339" X-IronPort-AV: E=Sophos;i="6.10,235,1719903600"; d="scan'208";a="25379339" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2024 14:37:58 -0700 X-CSE-ConnectionGUID: he5jR1jZScmWRbBMkc2/og== X-CSE-MsgGUID: lIFX4D4WSTSNW9zy31G0Ow== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,235,1719903600"; d="scan'208";a="73424010" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmviesa003.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2024 14:37:57 -0700 Date: Tue, 17 Sep 2024 14:43:37 -0700 From: Ricardo Neri To: Qais Yousef Cc: Ingo Molnar , Peter Zijlstra , Vincent Guittot , "Rafael J. Wysocki" , Viresh Kumar , Juri Lelli , Steven Rostedt , Dietmar Eggemann , John Stultz , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 11/16] sched/qos: Add rampup multiplier QoS Message-ID: <20240917214337.GA10143@ranerica-svr.sc.intel.com> References: <20240820163512.1096301-1-qyousef@layalina.io> <20240820163512.1096301-12-qyousef@layalina.io> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240820163512.1096301-12-qyousef@layalina.io> User-Agent: Mutt/1.9.4 (2018-02-28) On Tue, Aug 20, 2024 at 05:35:07PM +0100, Qais Yousef wrote: > Bursty tasks are hard to predict. To use resources efficiently, the > system would like to be exact as much as possible. But this poses > a challenge for these bursty tasks that need to get access to more > resources quickly. > > The new SCHED_QOS_RAMPUP_MULTIPLIER allows userspace to do that. As the > name implies, it only helps them to transition to a higher performance > state when they get _busier_. That is perfectly periodic tasks by > definition are not going through a transition and will run at a constant > performance level. It is the tasks that need to transition from one > periodic state to another periodic state that is at a higher level that > this rampup_multiplier will help with. It also slows down the ewma decay > of util_est which should help those bursty tasks to keep their faster > rampup. > > This should work complimentary with uclamp. uclamp tells the system > about min and max perf requirements which can be applied immediately. > > rampup_multiplier is about reactiveness of the task to change. > Specifically to a change for a higher performance level. The task might > necessary need to have a min perf requirements, but it can have sudden > burst of changes that require higher perf level and it needs the system > to provide this faster. > > TODO: update the sched_qos docs > > Signed-off-by: Qais Yousef > --- > include/linux/sched.h | 7 ++++ > include/uapi/linux/sched.h | 2 ++ > kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++++++ > kernel/sched/fair.c | 6 ++-- > kernel/sched/syscalls.c | 38 ++++++++++++++++++++-- > 5 files changed, 115 insertions(+), 4 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 2e8c5a9ffa76..a30ee43a25fb 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -404,6 +404,11 @@ struct sched_info { > #endif /* CONFIG_SCHED_INFO */ > }; > > +struct sched_qos { > + DECLARE_BITMAP(user_defined, SCHED_QOS_MAX); > + unsigned int rampup_multiplier; > +}; > + > /* > * Integer metrics need fixed point arithmetic, e.g., sched/fair > * has a few: load, load_avg, util_avg, freq, and capacity. > @@ -882,6 +887,8 @@ struct task_struct { > > struct sched_info sched_info; > > + struct sched_qos sched_qos; > + > struct list_head tasks; > #ifdef CONFIG_SMP > struct plist_node pushable_tasks; > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h > index 67ef99f64ddc..0baba91ba5b8 100644 > --- a/include/uapi/linux/sched.h > +++ b/include/uapi/linux/sched.h > @@ -104,6 +104,8 @@ struct clone_args { > }; > > enum sched_qos_type { > + SCHED_QOS_RAMPUP_MULTIPLIER, > + SCHED_QOS_MAX, > }; > #endif > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index c91e6a62c7ab..54faa845cb29 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -152,6 +152,8 @@ __read_mostly int sysctl_resched_latency_warn_once = 1; > */ > const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK; > > +unsigned int sysctl_sched_qos_default_rampup_multiplier = 1; > + > __read_mostly int scheduler_running; > > #ifdef CONFIG_SCHED_CORE > @@ -4488,6 +4490,47 @@ static int sysctl_schedstats(struct ctl_table *table, int write, void *buffer, > #endif /* CONFIG_SCHEDSTATS */ > > #ifdef CONFIG_SYSCTL > +static void sched_qos_sync_sysctl(void) > +{ > + struct task_struct *g, *p; > + > + guard(rcu)(); > + for_each_process_thread(g, p) { > + struct rq_flags rf; > + struct rq *rq; > + > + rq = task_rq_lock(p, &rf); > + if (!test_bit(SCHED_QOS_RAMPUP_MULTIPLIER, p->sched_qos.user_defined)) > + p->sched_qos.rampup_multiplier = sysctl_sched_qos_default_rampup_multiplier; > + task_rq_unlock(rq, p, &rf); > + } > +} > + > +static int sysctl_sched_qos_handler(struct ctl_table *table, int write, > + void *buffer, size_t *lenp, loff_t *ppos) > +{ > + unsigned int old_rampup_mult; > + int result; > + > + old_rampup_mult = sysctl_sched_qos_default_rampup_multiplier; > + > + result = proc_dointvec(table, write, buffer, lenp, ppos); > + if (result) > + goto undo; > + if (!write) > + return 0; > + > + if (old_rampup_mult != sysctl_sched_qos_default_rampup_multiplier) { > + sched_qos_sync_sysctl(); > + } > + > + return 0; > + > +undo: > + sysctl_sched_qos_default_rampup_multiplier = old_rampup_mult; > + return result; > +} > + > static struct ctl_table sched_core_sysctls[] = { > #ifdef CONFIG_SCHEDSTATS > { > @@ -4534,6 +4577,13 @@ static struct ctl_table sched_core_sysctls[] = { > .extra2 = SYSCTL_FOUR, > }, > #endif /* CONFIG_NUMA_BALANCING */ > + { > + .procname = "sched_qos_default_rampup_multiplier", > + .data = &sysctl_sched_qos_default_rampup_multiplier, > + .maxlen = sizeof(unsigned int), IIUC, user space needs to select a value between 0 and (2^32 - 1). Does this mean that it will need fine-tuning for each product and application? Could there be some translation to a fewer number of QoS levels that are qualitatively? Also, I think about Intel processors. They work with hardware-controlled performance scaling. The proposed interface would help us to communicate per-task multipliers to hardware, but they would be used as hints to hardware and not acted upon by the kernel to scale frequency.