From: "andriy.shevchenko@linux.intel.com" <andriy.shevchenko@linux.intel.com>
To: "Stamatis, Ilias" <ilstam@amazon.co.uk>
Cc: "nadav.amit@gmail.com" <nadav.amit@gmail.com>,
"david@kernel.org" <david@kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"bhe@redhat.com" <bhe@redhat.com>,
"huang.ying.caritas@gmail.com" <huang.ying.caritas@gmail.com>,
"nh-open-source@amazon.com" <nh-open-source@amazon.com>
Subject: Re: [PATCH] Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
Date: Tue, 25 Nov 2025 12:23:56 +0200
Message-ID: <aSWDvPVZRTCCfRpV@smile.fi.intel.com>
In-Reply-To: <b9f8623fb383ef21bb47e6c9716c2c389581d12d.camel@amazon.co.uk>
On Tue, Nov 25, 2025 at 09:56:36AM +0000, Stamatis, Ilias wrote:
> On Tue, 2025-11-25 at 08:50 +0200, andriy.shevchenko@linux.intel.com wrote:
> > On Mon, Nov 24, 2025 at 11:30:46PM +0000, Stamatis, Ilias wrote:
> > > On Mon, 2025-11-24 at 21:52 +0200, andriy.shevchenko@linux.intel.com wrote:
> > > > On Mon, Nov 24, 2025 at 07:35:31PM +0000, Stamatis, Ilias wrote:
> > > > > On Mon, 2025-11-24 at 20:55 +0200, andriy.shevchenko@linux.intel.com wrote:
> > > > > > On Mon, Nov 24, 2025 at 06:01:35PM +0000, Stamatis, Ilias wrote:
> > > > > > > On Mon, 2025-11-24 at 08:58 -0800, Andrew Morton wrote:
> > > > > > > > On Mon, 24 Nov 2025 16:53:49 +0000 Ilias Stamatis <ilstam@amazon.com> wrote:
...
> > > > > > > > > Commit 97523a4edb7b ("kernel/resource: remove first_lvl / siblings_only
> > > > > > > > > logic") removed an optimization introduced by commit 756398750e11
> > > > > > > > > ("resource: avoid unnecessary lookups in find_next_iomem_res()"). That
> > > > > > > > > was not called out in the message of the first commit explicitly so it's
> > > > > > > > > not entirely clear whether removing the optimization happened
> > > > > > > > > inadvertently or not.
> > > > > > > > >
> > > > > > > > > As the original commit message of the optimization explains there is no
> > > > > > > > > point considering the children of a subtree in find_next_iomem_res() if
> > > > > > > > > the top level range does not match. Reinstating the optimization results
> > > > > > > > > in significant performance improvements in systems with very large iomem
> > > > > > > > > maps when mmaping /dev/mem.
> > > > > > > >
> > > > > > > > It would be great if we could quantify "significant performance
> > > > > > > > improvements"?
> > > > > > >
> > > > > > > I've done my testing with older kernel versions in systems where `wc -l
> > > > > > > /proc/iomem` can return ~5k. In that environment I see mmaping parts of
> > > > > > > /dev/mem taking 700-1500μs without the optimisation and 10-50μs with the
> > > > > > > optimisation.
> > > > > > >
> > > > > > > The real-world use case we care about is hypervisor live update where having to
> > > > > > > do lots of these mmaps() serially can significantly affect the guest downtime
> > > > > > > if the cost is 20-30x.
> > > > > >
> > > > > > Thanks for providing this information.
> > > > > >
> > > > > > > > It also would be good to know which exact function(s) is a bottleneck.
> > > > > > >
> > > > > > > Perf tracing shows that ~95% of CPU time is spent in find_next_iomem_res(),
> > > > > >
> > > > > > Have you investigated the possibility of putting that check back
> > > > > > directly into the culprit?
> > > > >
> > > > > I'm sorry, I don't understand this. Could you please clarify what you mean?
> > > > > What do you consider to be the culprit and which check do you refer to?
> > > >
> > > > The mentioned patch removed the check for siblings from next_resource().
> > > > The function that your test case complains about is find_next_iomem_res().
> > > > Hence, have you tried to reinstate the (removed) check from next_resource()
> > > > in find_next_iomem_res() to see if it helps?
> > >
> > > next_resource() does accept a 'skip_children' parameter in the latest kernel
> > > today which is equivalent to the 'sibling_only' parameter in the older
> > > kernels.
> >
> > It used to be
> >
> > 	if (sibling_only)
> > 		return p->sibling;
> >
> > 	if (p->child)
> > 		return p->child;
> > 	...
>
> This returns p->sibling if sibling_only == true.
> The return value might also be NULL.
>
> > and became (in the latest kernels)
> >
> > 	if (!skip_children && p->child)
> > 		return p->child;
> > 	...
>
> 	if (!skip_children && p->child)
> 		return p->child;
> 	while (!p->sibling && p->parent) {
> 		p = p->parent;
> 		if (p == subtree_root)
> 			return NULL;
> 	}
> 	return p->sibling;
>
> This is the full function on the latest kernel. If skip_children == true and
> there is a sibling, it also returns p->sibling.
>
> If p->sibling is NULL, it'll try to get the parent. In the case of
> find_next_iomem_res() the parent will be iomem_resource, in which case the if
> (p == subtree_root) path is taken and we return NULL (same as the case of
> p->sibling being NULL above).

Thanks for the elaboration.
Please summarise this, add the performance test results and send a v2.
Seems okay to me.

> > Can you elaborate on how they are interoperable?
> >
> > TL;DR: I don't think it's an equivalent.
So, it's not a literal equivalent, but it behaves in a very similar way.
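
To make that comparison concrete, here is a minimal userspace sketch, not
kernel code. The names struct res, next_sibling_only(), next_skip_children(),
build_tree() and same_successor() are hypothetical and exist only for this
illustration; the two function bodies follow the fragments quoted above, and
the old variant models only the sibling_only == true path because the rest of
that function is elided in the quote. The sketch assumes the walk covers the
direct children of the subtree root, as described above for
find_next_iomem_res() and iomem_resource.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct resource. */
struct res {
	struct res *parent, *sibling, *child;
};

/* Old behaviour with sibling_only == true, per the quoted fragment. */
struct res *next_sibling_only(struct res *p)
{
	return p->sibling;
}

/* Body as quoted above for the latest kernel. */
struct res *next_skip_children(struct res *p, int skip_children,
			       struct res *subtree_root)
{
	if (!skip_children && p->child)
		return p->child;
	while (!p->sibling && p->parent) {
		p = p->parent;
		if (p == subtree_root)
			return NULL;
	}
	return p->sibling;
}

/* Small fixed tree: root -> a, b;  a -> a1, a2;  b -> b1. */
struct res root, a, b, a1, a2, b1;

void build_tree(void)
{
	root = (struct res){ .child = &a };
	a    = (struct res){ .parent = &root, .sibling = &b, .child = &a1 };
	b    = (struct res){ .parent = &root, .child = &b1 };
	a1   = (struct res){ .parent = &a, .sibling = &a2 };
	a2   = (struct res){ .parent = &a };
	b1   = (struct res){ .parent = &b };
}

/* Returns 1 if both variants yield the same successor for node p. */
int same_successor(struct res *p)
{
	return next_sibling_only(p) == next_skip_children(p, 1, &root);
}
```

After build_tree(), same_successor() holds for the top-level nodes a and b,
which is the only case find_next_iomem_res() exercises under the assumption
above. It does not hold for a2: the old variant returns NULL there, while the
new one climbs to a and returns b. That difference is why the two are similar
in behaviour rather than literally equivalent.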
--
With Best Regards,
Andy Shevchenko