Date: Wed, 21 Jan 2026 16:42:41 -0600
X-Mailing-List: linux-pm@vger.kernel.org
Subject: Re: [PATCH 0/1] cpuidle: menu: Fix high wakeup latency on modern Intel server platforms
To: "Ionut Nechita (Sunlight Linux)", rafael@kernel.org
Cc: ionut_n2001@yahoo.com, daniel.lezcano@linaro.org, christian.loehle@arm.com, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20260120211725.124349-1-sunlightlinux@gmail.com>
From: Russell Haley
In-Reply-To: <20260120211725.124349-1-sunlightlinux@gmail.com>

On 1/20/26 3:17 PM, Ionut Nechita (Sunlight Linux) wrote:
> From: Ionut Nechita
>
> Hi,
>
> This patch addresses a performance regression in the menu cpuidle governor
> affecting modern Intel server platforms (Sapphire Rapids, Granite Rapids,
> and newer).
>
> == Problem Description ==
>
> On Intel server platforms from 2022 onwards, we observe excessive wakeup
> latencies (~150us) in network-sensitive workloads when using the menu
> governor with NOHZ_FULL enabled.
>
> Measurement with qperf tcp_lat shows:
> - Sapphire Rapids (SPR): 151us latency
> - Ice Lake (ICL): 12us latency
> - Skylake (SKL): 21us latency
>
> The 12x latency regression on SPR compared to Ice Lake is unacceptable for
> latency-sensitive applications (HPC, real-time, financial trading, etc.).
>
> == Root Cause ==
>
> The issue stems from menu.c:294-295:
>
>     if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
>         predicted_ns = data->next_timer_ns;
>
> When the tick is already stopped and the predicted idle duration is short
> (<2ms), the governor switches to using next_timer_ns directly (often
> 10ms+). This causes the selection of very deep package C-states (PC6).
>
> Modern server platforms have significantly longer C-state exit latencies
> due to architectural changes:
> - Tile-based architecture with per-tile power gating
> - DDR5 power management overhead
> - CXL link restoration
> - Complex mesh interconnect resynchronization
>
> When a network packet arrives after 500us but the governor selected PC6
> based on a 10ms timer, the 150us exit latency dominates the response time.
>
> On older platforms (Ice Lake, Skylake) with faster C-state transitions
> (12-21us), this issue was less noticeable, but SPR's tile architecture
> makes it critical.
>
> == Solution ==
>
> Instead of using next_timer_ns directly (100% timer-based), add a 25%
> safety margin to the prediction and clamp to next_timer_ns:
>
>     predicted_ns = min(predicted_ns + (predicted_ns >> 2), data->next_timer_ns);
>
> This provides:
> - Conservative prediction (avoids too-shallow states)
> - Protection against excessively deep states (clamped to timer)
> - Platform-agnostic solution (no hardcoded thresholds)
> - Minimal overhead (one shift, one add, one min)
>
> The 25% margin (>> 2 = divide by 4) was chosen as a balance between:
> - Too small (10%): Insufficient protection on high-latency platforms
> - Too large (50%): Overly conservative, may hurt power efficiency
>
> == Results ==
>
> Testing on Sapphire Rapids with qperf tcp_lat:
> - Before: 151us average latency
> - After: ~30us average latency
> - Improvement: 5x latency reduction
>
> Testing on Ice Lake and Skylake shows minimal impact:
> - Ice Lake: 12us → 12us (no regression)
> - Skylake: 21us → 21us (no regression)
>
> Power efficiency testing shows <1% difference in package power consumption
> during mixed workloads, well within measurement noise.
>
> == Examples ==
>
> Short prediction (500us), timer at 10ms:
> - Before: predicted_ns = 10ms → selects PC6 → 151us wakeup
> - After: predicted_ns = min(625us, 10ms) = 625us → selects C1E → 15us wakeup
>
> Long prediction (1800us), timer at 2ms:
> - Before: predicted_ns = 2ms → selects C6
> - After: predicted_ns = min(2250us, 2ms) = 2ms → selects C6 (same state)
>
> The algorithm naturally adapts to workload characteristics without
> platform-specific tuning.
>
> Ionut Nechita (1):
>   cpuidle: menu: Add 25% safety margin to short predictions when tick is
>     stopped
>
>  drivers/cpuidle/governors/menu.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> --
> 2.52.0

Rafael's patch [1], posted a few hours before yours, looks like it might
address the same problem. Maybe try it and see?

[1] https://lore.kernel.org/all/5959091.DvuYhMxLoT@rafael.j.wysocki/
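
For anyone who wants to re-check the arithmetic in the quoted "Examples"
section, here is a minimal userspace sketch. It is not kernel code: the
function names are made up for illustration, the tick-stopped condition is
assumed to be true, and the tick period is passed in as a parameter because
TICK_NSEC depends on CONFIG_HZ (4ms below is only an assumption, matching
HZ=250). It models just the two expressions quoted above, nothing else from
menu.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted existing check, assuming the tick is
 * already stopped: a prediction shorter than one tick period is replaced
 * by the distance to the next timer.
 */
static uint64_t predict_current(uint64_t predicted_ns, uint64_t next_timer_ns,
                                uint64_t tick_ns)
{
    if (predicted_ns < tick_ns)
        return next_timer_ns;
    return predicted_ns;
}

/*
 * Hypothetical model of the proposed expression: pad the prediction by 25%
 * (predicted_ns >> 2 is predicted_ns / 4) and clamp it to the next timer.
 */
static uint64_t predict_proposed(uint64_t predicted_ns, uint64_t next_timer_ns,
                                 uint64_t tick_ns)
{
    if (predicted_ns < tick_ns) {
        uint64_t padded = predicted_ns + (predicted_ns >> 2);

        return padded < next_timer_ns ? padded : next_timer_ns;
    }
    return predicted_ns;
}

/* Re-checks the two worked examples from the cover letter. */
static void check_cover_letter_examples(void)
{
    const uint64_t tick_ns = 4000000; /* assumption: HZ=250 */

    /* Short prediction (500us), timer at 10ms. */
    assert(predict_current(500000, 10000000, tick_ns) == 10000000);
    assert(predict_proposed(500000, 10000000, tick_ns) == 625000);

    /* Long prediction (1800us), timer at 2ms: the clamp takes over. */
    assert(predict_current(1800000, 2000000, tick_ns) == 2000000);
    assert(predict_proposed(1800000, 2000000, tick_ns) == 2000000);
}
```

Calling check_cover_letter_examples() from a main() passes silently, so the
625us and 2ms figures in the cover letter are consistent with the quoted
expression. Which C-state each value actually selects depends on the
platform's idle state table and is not modelled here.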