From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 481D4243956 for ; Sun, 7 Dec 2025 20:38:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765139904; cv=none; b=Z9RK51y2H/WuUSdWjLBdTxG6I2GJz1x5vHw0ITiq1ItN+tBvulx2JHk6UuDxy7uGZDtqg0QDh4Z2t4XgehGFJnVfE0MWB7s72S152xahAm/HwGZqrJRJsLa6nb9CoCBQCVJEbs0qYhk7xGDHtJ9oE1A6AXwbnlmc86Zp1g9dV1A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765139904; c=relaxed/simple; bh=uRZ/rSW3zzaPlwoxUn8qD9kwcG0eeq30YUYxaP33WpE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=lPaBIzqFyxjTFqikPEgren6Ne/UHhhIYTrEBClXZ9x85kblsDy2Kawsi0JDrmV7NTk6hTvK7mOFx7J4W8EmZPWU89nAdrm+LsPK1weZn6qyhcEzk7DdSnuCYBzfhJ7lrpCT+u/ryYUjf5QyGXUKbGqEBG04safcPPeSjUBzK8fc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MQfYo8QB; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MQfYo8QB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1765139902; x=1796675902; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=uRZ/rSW3zzaPlwoxUn8qD9kwcG0eeq30YUYxaP33WpE=; 
b=MQfYo8QBMeW0T4bq5PAOHu7RkdW9AeUtYL0FH6097k2WMS96c9wkOlvN lyflsFAJgBPLVAWuCgJrYheC/vCSjA6bEvOsZtXuuXgCePxgyNsZO8/AT LkFSVtfK4Ko+9GoeIpqp4lXKXiqsVgbl7b5yhkxBSK0EvlvQwBzCjyf0Y xfZCOn6xFFGoBIDpwhsZ4OL6HOLVsbHMg26ktgiMSx4wyMgVS0MNAMBI4 4wrvav6itsnYgheEHT1Xd20DN6SwNB+kQzT5kVJ6lQKchlGoep3JRC5YO 9VOGRSpFng/i0CoPEhhMyRHv/tS0Uvuw/csAxxtespes5dD6yTtbja3WY A==; X-CSE-ConnectionGUID: +7Mqu6+QTGGaHFeuBgSLqQ== X-CSE-MsgGUID: HwO3dTASSq+E2ctsYYD9Xw== X-IronPort-AV: E=McAfee;i="6800,10657,11635"; a="67055321" X-IronPort-AV: E=Sophos;i="6.20,257,1758610800"; d="scan'208";a="67055321" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2025 12:38:21 -0800 X-CSE-ConnectionGUID: qIrn1PyEQ9yhvh8adsYY5Q== X-CSE-MsgGUID: hULZ7FLVTu2uVXLUY7jD5A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.20,257,1758610800"; d="scan'208";a="200897709" Received: from tassilo.jf.intel.com (HELO tassilo) ([10.54.38.190]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2025 12:38:21 -0800 Date: Sun, 7 Dec 2025 12:38:20 -0800 From: Andi Kleen To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, x86@kernel.org, ggherdovich@suse.cz, rafael.j.wysocki@intel.com Subject: Re: [PATCH] x86/aperfmperf: Don't disable scheduler APERF/MPERF on bad samples Message-ID: References: <20251204180914.1855553-1-ak@linux.intel.com> <20251205161052.GH2528459@noisy.programming.kicks-ass.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20251205161052.GH2528459@noisy.programming.kicks-ass.net> On Fri, Dec 05, 2025 at 05:10:52PM +0100, Peter Zijlstra wrote: > On Thu, Dec 04, 2025 at 10:09:14AM -0800, Andi Kleen wrote: > > The APERF and MPERF MSRs get read together and the ratio > > between the two is used to scale the scheduler capacity with frequency. 
> >
> > Since e2b0d619b400 when there is ever an over/underflow of
> > the APERF/MPERF computation the sampling gets completely
> > disabled, under the assumption that there is a problem with
> > the hardware.
> >
> > However this can happen without any malfunction when there is
> > a long enough interruption between the two MSR reads, for
> > example due to an unlucky NMI or SMI or other system event
> > causing delays. We saw it when a delay resulted in
> > Acnt_Delta << Mcnt_Delta (about ~4k for acnt_delta and
> > 2M for MCnt_Delta)
> >
> > In this case the ratio computation underflows, which is detected,
> > but then APERF/MPERF usage gets incorrectly disabled forever.
> >
> > Remove the code to completely disable APERF/MPERF on
> > a bad sample. Instead when any over/underflow happens
> > return the fallback full capacity.
>
> So what systems are actually showing this bad behaviour and what are we
> doing to cure the problem rather than fight the symptom?

We saw it with an artificial stress test on an Intel internal system,
but as I (and Andrew) explained, it is unavoidable and general: delays
can always happen for many reasons on any system: NMIs, SMIs,
virtualization, or other random system events.

> Also, a system where this is systematically buggered would really be
> better off disabling it, no?

The particular failure case here, if it were common (lots of very long
execution delays), would make the system fairly unusable anyway. The
scheduler doing a slightly worse job is the least of your troubles in
such a case.

For other failures I'm not aware of a system (perhaps short of a
hypervisor that doesn't save/restore when switching underlying cpus)
that actually has broken APERF/MPERF. So breaking good systems just for
a hypothetical bad case doesn't seem like a good trade-off.
The main difference from the old strategy is that bad samples which
don't under/overflow would still be used, while the old code stopped on
any bad sample. But any attempt to handle those without impacting good
cases would need either extra complexity or magic threshold numbers, so
it seems better not to even try.

-Andi