Date: Tue, 25 Nov 2025 12:23:56 +0200
From: "andriy.shevchenko@linux.intel.com"
To: "Stamatis, Ilias"
Cc: "nadav.amit@gmail.com", "david@kernel.org", "linux-mm@kvack.org",
 "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org",
 "bhe@redhat.com", "huang.ying.caritas@gmail.com",
 "nh-open-source@amazon.com"
Subject: Re: [PATCH] Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
References: <20251124165349.3377826-1-ilstam@amazon.com> <20251124085816.07dbf5a4ec6235b2943840a0@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Intel Finland Oy - BIC 0357606-4 - c/o Alberga Business Park,
6 krs, Bertel Jungin Aukio 5, 02600 Espoo

On Tue, Nov 25, 2025 at 09:56:36AM +0000, Stamatis, Ilias wrote:
> On Tue, 2025-11-25 at 08:50 +0200, andriy.shevchenko@linux.intel.com wrote:
> > On Mon, Nov 24, 2025 at 11:30:46PM +0000, Stamatis, Ilias wrote:
> > > On Mon, 2025-11-24 at 21:52 +0200, andriy.shevchenko@linux.intel.com wrote:
> > > > On Mon, Nov 24, 2025 at 07:35:31PM +0000, Stamatis, Ilias wrote:
> > > > > On Mon, 2025-11-24 at 20:55 +0200, andriy.shevchenko@linux.intel.com wrote:
> > > > > > On Mon, Nov 24, 2025 at 06:01:35PM +0000, Stamatis, Ilias wrote:
> > > > > > > On Mon, 2025-11-24 at 08:58 -0800, Andrew Morton wrote:
> > > > > > > > On Mon, 24 Nov 2025 16:53:49 +0000 Ilias Stamatis wrote:

...

> > > > > > > > > Commit 97523a4edb7b ("kernel/resource: remove first_lvl / siblings_only
> > > > > > > > > logic") removed an optimization introduced by commit 756398750e11
> > > > > > > > > ("resource: avoid unnecessary lookups in find_next_iomem_res()"). That
> > > > > > > > > was not called out in the message of the first commit explicitly so it's
> > > > > > > > > not entirely clear whether removing the optimization happened
> > > > > > > > > inadvertently or not.
> > > > > > > > >
> > > > > > > > > As the original commit message of the optimization explains there is no
> > > > > > > > > point considering the children of a subtree in find_next_iomem_res() if
> > > > > > > > > the top level range does not match. Reinstating the optimization results
> > > > > > > > > in significant performance improvements in systems with very large iomem
> > > > > > > > > maps when mmaping /dev/mem.
> > > > > > > >
> > > > > > > > It would be great if we could quantify "significant performance
> > > > > > > > improvements"?
> > > > > > >
> > > > > > > I've done my testing with older kernel versions in systems where `wc -l
> > > > > > > /proc/iomem` can return ~5k.
> > > > > > > In that environment I see mmaping parts of
> > > > > > > /dev/mem taking 700-1500μs without the optimisation and 10-50μs with the
> > > > > > > optimisation.
> > > > > > >
> > > > > > > The real-world use case we care about is hypervisor live update where having to
> > > > > > > do lots of these mmaps() serially can significantly affect the guest downtime
> > > > > > > if the cost is 20-30x.
> > > > > >
> > > > > > Thanks for providing this information.
> > > > > >
> > > > > > > > It also would be good to know which exact function(s) is a bottleneck.
> > > > > > >
> > > > > > > Perf tracing shows that ~95% of CPU time is spent in find_next_iomem_res(),
> > > > > >
> > > > > > Have you investigated possibility to return that check directly into
> > > > > > the culprit?
> > > > >
> > > > > I'm sorry, I don't understand this. Could you please clarify what you mean?
> > > > > What do you consider to be the culprit and which check do you refer to?
> > > >
> > > > The mentioned patch removed the check for siblings from next_resource().
> > > > The function that your test case complains about is find_next_iomem_res().
> > > > Hence, have you tried to reinstantiate the (removed) check from next_resource()
> > > > in find_next_iomem_res() and see if it helps?
> > >
> > > next_resource() does accept a 'skip_children' parameter in the latest kernel
> > > today which is equivalent to the 'sibling_only' parameter in the older
> > > kernels.
> >
> > It used to be
> >
> > 	if (sibling_only)
> > 		return p->sibling;
> >
> > 	if (p->child)
> > 		return p->child;
> > 	...
>
> This returns p->sibling if sibling_only == true.
> The return value might also be NULL.
>
> > and become (in the latest kernels)
> >
> > 	if (!skip_children && p->child)
> > 		return p->child;
> > 	...
>
> 	if (!skip_children && p->child)
> 		return p->child;
> 	while (!p->sibling && p->parent) {
> 		p = p->parent;
> 		if (p == subtree_root)
> 			return NULL;
> 	}
> 	return p->sibling;
>
> This is the full function on the latest kernel. If skip_children == true and
> there is a sibling, it also returns p->sibling.
>
> If p->sibling is NULL, it'll try to get the parent. In the case of
> find_next_iomem_res() the parent will be iomem_resource, in which case the if
> (p == subtree_root) path is taken and we return NULL (same as the case of
> p->sibling being NULL above).

Thanks for the elaboration. Please summarise this, add the performance test
results and send a v2. Seems okay to me.

> > Can you elaborate how are they interoperable?
> >
> > TL;DR: I don't think it's an equivalent.

So, it's not a literal equivalent, but it behaves in a very similar way.

-- 
With Best Regards,
Andy Shevchenko
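
For readers following the thread, the comparison discussed above can be
sketched as a small standalone C fragment. This is an illustration only, not
kernel code. Assumptions: struct resource is reduced to its link pointers, the
next_resource() signature is inferred from the names used in the quoted body,
old_sibling_only_step() is a hypothetical name that covers just the
sibling_only == true branch of the older code (the rest of that function is
elided in the thread and is not reproduced here), and the tree built by
build_example_tree() is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the kernel's struct resource: links only. */
struct resource {
	const char *name;
	struct resource *parent, *sibling, *child;
};

/* Newer behaviour: body as quoted in the thread, signature assumed. */
static struct resource *next_resource(struct resource *p, bool skip_children,
				      struct resource *subtree_root)
{
	if (!skip_children && p->child)
		return p->child;
	while (!p->sibling && p->parent) {
		p = p->parent;
		if (p == subtree_root)
			return NULL;
	}
	return p->sibling;
}

/* Hypothetical helper: only the sibling_only == true branch of the old code. */
static struct resource *old_sibling_only_step(struct resource *p)
{
	return p->sibling;
}

/* Made-up tree: root -> { a -> { a1 }, b } */
static struct resource root, a, a1, b;

static void build_example_tree(void)
{
	root = (struct resource){ .name = "root", .child = &a };
	a = (struct resource){ .name = "a", .parent = &root,
			       .sibling = &b, .child = &a1 };
	a1 = (struct resource){ .name = "a1", .parent = &a };
	b = (struct resource){ .name = "b", .parent = &root };
}
```

With this tree, stepping from a or b with skip_children == true and
subtree_root == &root gives the same result under both variants (b, then
NULL), which corresponds to the top-level walk discussed for
find_next_iomem_res(). Starting from the nested a1 the two differ: the newer
function climbs to a and returns b, while the old sibling_only branch returns
NULL. That is consistent with "not a literal equivalent, but it behaves in a
very similar way".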