From: Thomas Gleixner
To: John Stultz, LKML
Cc: John Stultz, Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Stephen Boyd, Yury Norov, Bitao Hu, Andrew Morton, kernel-team@android.com
Subject: Re: [RFC][PATCH 0/3] DynamicHZ: Configuring the timer tick rate at boot time
In-Reply-To: <20250128063301.3879317-1-jstultz@google.com>
Date: Tue, 28 Jan 2025 17:46:10 +0100
Message-ID: <87cyg67up9.ffs@tglx>

John!

On Mon, Jan 27 2025 at 22:32, John Stultz wrote:
> The HZ value has long been a compile time constant.
> This is really useful as there is a lot of hotpath code that uses HZ
> when setting timers (needing to convert nanoseconds to ticks),
> thus the division is much faster with a compile time constant
> divisor.

To some extent, yes. Though the meaning of the 'HZ tick' has become
pretty blurry over time.

If you actually look at the timer wheel timer usage, then the vast
majority is converting SI unit based timeouts/delays to ticks and they
do not care about the actual tick frequency at all.

Just grep for '_to_jiffies()' in addition to the tons of 'HZ * $N'
places which are sprinkled across the code base.

Code which relies on accurate wakeups is mostly using hrtimers anyway.

The only two places which are truly tick bound are the scheduler and the
timer wheel itself, and the latter does not really depend on the tick
frequency either.

> One area that needed adjustments was the cputime accounting, as
> it assumes we only account one tick per interrupt, so I've
> reworked some of that logic to pipe through the actual tick
> count.

And you got that patently wrong...

> However, having to select the system HZ value at build time is
> somewhat limiting. Distros have to make choices for their users
> as to what the best HZ value would be balancing latency and
> power usage.
>
> With Android, this is a major issue, as we have one GKI binary
> that runs across a wide array of devices from top of the line
> flagship phones to watches. Balancing the choice for HZ is
> difficult, we currently have HZ=250, but some devices would love
> to have HZ=1000, while other devices aren't willing to pay the
> power cost of 4x the timer slots, resulting in shorter idle
> times.

So the shorter idle times are caused by timer wheel timers waking up
more accurately with HZ=1000, and not by the scheduler being more
aggressive?

> Also, I've not yet gotten this to work for the fixed
> periodic-tick paths (you need a oneshot capable clockevent).
Which is not a given on the museum pieces we insist on supporting just
because we can. But with periodic timers it should be easy enough to make
clockevents::set_state_periodic() take a tick frequency argument and
convert the ~70 callbacks to handle it.

> Mostly because in that case we always just increment by a single
> tick. While for dyn_hz=250 or dyn_hz=1000 calculating the
> periodic tick count is pretty simple (4 ticks, 10 ticks). But
> for dyn_hz=300, or other possible values, it doesn't evenly
> divide, so we would have to do a 3,3,4,3,3,4 style interval to
> stay on time and I've not yet thought through how to do
> remainder handling efficiently yet.

I doubt you need that. Programming it to the next closest value is good
enough and there is no reason to overengineer it for a marginal benefit
of "accuracy". But that's obviously not really working with your chosen
minimalistic approach.

Apart from that, using random HZ values is a pretty academic exercise,
and HZ=300 was introduced for multimedia to cater for 30FPS. But that
was long ago, when high resolution timers, NOHZ and modern graphics
devices did not exist. I seriously doubt that HZ=300 has any actual
advantage on modern systems.

Sure, I know that SteamOS uses HZ=300, but AFAICT from public
discussions this just caters to the HZ=300 myth and is not backed by any
factual evidence that HZ=300 is so superior. Quite the contrary: there
are enough people who actually want HZ=1000 for better responsiveness.

But let me come back to your proposed hack, which is admittedly cute.
Though I'm not really convinced that it is more than a band-aid, which
papers over the most obvious places to make it "work".

Let's take a step back and look at the usage of 'HZ':

  1) Jiffies and related timer wheel interfaces

     jiffies should just go away completely and be replaced by a simple
     millisecond counter, which is accessible in the same way as jiffies
     today.
     That removes the bulk of HZ usage all over the place and makes the
     usage sites simpler, as the interfaces just use SI units and the
     gazillions of back and forth conversions (~4500 to jiffies and
     ~1000 from jiffies) just go away.

     We obviously need to keep the time_before/after/*() interfaces for
     32-bit, unless we decide to limit the uptime for 32-bit machines to
     ~49 days and force reboot them before the counter can overflow :)

     On the timer wheel side that means that the base granularity is
     always 1ms, which only affects the maximum timeout. The timer expiry
     is just batched on the actual tick frequency and should not have any
     other side effects except for slightly moving the granularity
     boundaries depending on the tick frequency. But that's not any
     different from the hard coded HZ values.

     The other minor change is to make the next timer interrupt retrieval
     for NOHZ round up the next event to the tick boundary, but that's
     trivial enough.

  2) Clock events

     Periodic mode is trivial to fix with a tick frequency argument to the
     set_state_periodic() callback.

     Oneshot mode just works as it programs the hardware to the next
     closest event. Not much different from the current situation with a
     hard-coded HZ value.

  3) Accounting

     The accounting has to be separated from the jiffies advancement and
     it has to feed the delta since the last tick in nanoseconds into the
     accounting path, which internally operates in nanoseconds already
     today.

  4) Scheduler

     I leave that part to Peter, as he definitely has a better overview
     of what needs to be done than I do.

Thanks,

        tglx
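A minimal userspace sketch of how items 1 to 3 above could fit together, under stated assumptions: a 32-bit millisecond counter replaces jiffies, the tick period for an arbitrary boot-time frequency is rounded to the closest nanosecond (the "next closest value" approach rather than a 3,3,4 interval pattern), and accounting consumes the nanosecond delta since the last tick instead of a tick count. All names (struct ms_clock, ms_clock_tick(), tick_period_ns(), ms_after(), ms_before(), ms_counter_wrap_days()) are hypothetical and are not kernel APIs; ms_after() only mirrors the signed-difference comparison the kernel's time_after() uses.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical tick state; illustrative only, not an existing kernel structure. */
struct ms_clock {
	uint32_t msecs;        /* jiffies-like millisecond counter, 32-bit as on 32-bit machines */
	uint64_t last_tick_ns; /* timestamp of the previous tick */
	uint64_t rem_ns;       /* sub-millisecond remainder carried to the next tick */
	uint64_t accounted_ns; /* stand-in for the nanosecond based accounting path */
};

/* Tick period for a boot-time tick frequency, rounded to the closest nanosecond. */
static uint64_t tick_period_ns(unsigned int hz)
{
	assert(hz != 0);
	return (NSEC_PER_SEC + hz / 2) / hz;
}

/*
 * Wraparound-safe "a is after b", using the same signed-difference trick
 * as time_after(). The unsigned-to-signed conversion is modular on GCC
 * and Clang.
 */
static int ms_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static int ms_before(uint32_t a, uint32_t b)
{
	return ms_after(b, a);
}

/* Days until a 32-bit millisecond counter wraps (2^32 ms). */
static double ms_counter_wrap_days(void)
{
	return 4294967296.0 / (24.0 * 60 * 60 * 1000);
}

/* Hypothetical tick handler; works for tick frequencies that do not divide 1000. */
static void ms_clock_tick(struct ms_clock *c, uint64_t now_ns)
{
	uint64_t delta;

	assert(now_ns >= c->last_tick_ns);
	delta = now_ns - c->last_tick_ns;
	c->last_tick_ns = now_ns;

	/* Accounting gets the real elapsed time, not a number of ticks. */
	c->accounted_ns += delta;

	/* Advance the millisecond counter and carry the sub-ms rest forward. */
	c->rem_ns += delta;
	c->msecs += (uint32_t)(c->rem_ns / NSEC_PER_MSEC); /* wraps modulo 2^32 */
	c->rem_ns %= NSEC_PER_MSEC;
}
```

With a 300 Hz tick the period rounds to 3333333 ns, so 300 ticks advance the counter by 999 ms and carry 999900 ns into the next tick. Because the remainder is kept, the rounding error does not accumulate, and the 32-bit counter wraps after roughly 49.7 days, which is why the wraparound-safe comparisons remain necessary on 32-bit.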