* [PATCH] kernel/resource: optimize find_next_iomem_res
@ 2024-05-31 5:36 Chia-I Wu
2024-05-31 8:57 ` Andy Shevchenko
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Chia-I Wu @ 2024-05-31 5:36 UTC (permalink / raw)
To: amd-gfx, dri-devel, christian.koenig, alexander.deucher,
Greg Kroah-Hartman, Andy Shevchenko, Alison Schofield, Dave Jiang,
Baoquan He, linux-kernel
We can skip children resources when the parent resource does not cover
the range.
This should help vmf_insert_* users on x86, such as several DRM drivers.
On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
34.69%--__do_fault
34.60%--amdgpu_gem_fault
34.00%--ttm_bo_vm_fault_reserved
32.95%--vmf_insert_pfn_prot
25.89%--track_pfn_insert
24.35%--lookup_memtype
21.77%--pat_pagerange_is_ram
20.80%--walk_system_ram_range
17.42%--find_next_iomem_res
before this change, and
26.67%--__do_fault
26.57%--amdgpu_gem_fault
25.83%--ttm_bo_vm_fault_reserved
24.40%--vmf_insert_pfn_prot
14.30%--track_pfn_insert
12.20%--lookup_memtype
9.34%--pat_pagerange_is_ram
8.22%--walk_system_ram_range
5.09%--find_next_iomem_res
after.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
kernel/resource.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index fcbca39dbc450..19b84b4f9a577 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -326,6 +326,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
unsigned long flags, unsigned long desc,
struct resource *res)
{
+ bool skip_children = false;
struct resource *p;
if (!res)
@@ -336,7 +337,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
read_lock(&resource_lock);
- for_each_resource(&iomem_resource, p, false) {
+ for_each_resource(&iomem_resource, p, skip_children) {
/* If we passed the resource we are looking for, stop */
if (p->start > end) {
p = NULL;
@@ -344,8 +345,11 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
}
/* Skip until we find a range that matches what we look for */
- if (p->end < start)
+ if (p->end < start) {
+ skip_children = true;
continue;
+ }
+ skip_children = false;
if ((p->flags & flags) != flags)
continue;
--
2.45.1.288.g0e0cd299f1-goog
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-05-31 5:36 [PATCH] kernel/resource: optimize find_next_iomem_res Chia-I Wu
@ 2024-05-31 8:57 ` Andy Shevchenko
[not found] ` <CAPaKu7SsD+X7KAO=3vEYU_7YGM_f+7k1fdC9nEK=-NaJw8oYaA@mail.gmail.com>
2024-05-31 15:32 ` Andy Shevchenko
2024-06-04 15:31 ` Greg Kroah-Hartman
2 siblings, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2024-05-31 8:57 UTC (permalink / raw)
To: Chia-I Wu, Ilpo Järvinen
Cc: amd-gfx, dri-devel, christian.koenig, alexander.deucher,
Greg Kroah-Hartman, Alison Schofield, Dave Jiang, Baoquan He,
linux-kernel
On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> We can skip children resources when the parent resource does not cover
> the range.
>
> This should help vmf_insert_* users on x86, such as several DRM drivers.
> On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
>
> 34.69%--__do_fault
> 34.60%--amdgpu_gem_fault
> 34.00%--ttm_bo_vm_fault_reserved
> 32.95%--vmf_insert_pfn_prot
> 25.89%--track_pfn_insert
> 24.35%--lookup_memtype
> 21.77%--pat_pagerange_is_ram
> 20.80%--walk_system_ram_range
> 17.42%--find_next_iomem_res
>
> before this change, and
>
> 26.67%--__do_fault
> 26.57%--amdgpu_gem_fault
> 25.83%--ttm_bo_vm_fault_reserved
> 24.40%--vmf_insert_pfn_prot
> 14.30%--track_pfn_insert
> 12.20%--lookup_memtype
> 9.34%--pat_pagerange_is_ram
> 8.22%--walk_system_ram_range
> 5.09%--find_next_iomem_res
>
> after.
Is there any documentation that explicitly says that the children resources
must not overlap the parent's? Do we have some test cases? (Either way they
need to be added / expanded.)
P.S. I'm not so sure about this change. It needs thorough testing, esp.
in the PCI case. Cc'ing Ilpo.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-05-31 5:36 [PATCH] kernel/resource: optimize find_next_iomem_res Chia-I Wu
2024-05-31 8:57 ` Andy Shevchenko
@ 2024-05-31 15:32 ` Andy Shevchenko
2024-06-04 15:31 ` Greg Kroah-Hartman
2 siblings, 0 replies; 9+ messages in thread
From: Andy Shevchenko @ 2024-05-31 15:32 UTC (permalink / raw)
To: Chia-I Wu
Cc: amd-gfx, dri-devel, christian.koenig, alexander.deucher,
Greg Kroah-Hartman, Alison Schofield, Dave Jiang, Baoquan He,
linux-kernel
On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> We can skip children resources when the parent resource does not cover
> the range.
> This should help vmf_insert_* users on x86, such as several DRM drivers.
vmf_insert_*()
> On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
Also in the $Subj (and pay attention to the prefix)
"resource: ... find_next_iomem_res()"
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
[not found] ` <CAPaKu7SsD+X7KAO=3vEYU_7YGM_f+7k1fdC9nEK=-NaJw8oYaA@mail.gmail.com>
@ 2024-06-02 9:06 ` Andy Shevchenko
2024-06-03 7:24 ` Ilpo Järvinen
2024-06-03 7:28 ` Ilpo Järvinen
1 sibling, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2024-06-02 9:06 UTC (permalink / raw)
To: Chia-I Wu
Cc: Ilpo Järvinen, amd-gfx, dri-devel, christian.koenig,
alexander.deucher, Greg Kroah-Hartman, Alison Schofield,
Dave Jiang, Baoquan He, linux-kernel
On Fri, May 31, 2024 at 02:31:45PM -0700, Chia-I Wu wrote:
> On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <
> andriy.shevchenko@linux.intel.com> wrote:
> > On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
...
> > P.S> I'm not so sure about this change. It needs a thoroughly testing, esp.
> > in PCI case. Cc'ing to Ilpo.
> What's special about PCI?
PCI, due to its nature, may rebuild resources by either shrinking or expanding
the entire subtree behind the PCI bridge in question. And this may happen at
run-time due to hotplug support. But I'm not a deep expert in this area; Ilpo
knows much more than me.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-06-02 9:06 ` Andy Shevchenko
@ 2024-06-03 7:24 ` Ilpo Järvinen
2024-06-04 5:04 ` Chia-I Wu
0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2024-06-03 7:24 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Chia-I Wu, amd-gfx, dri-devel, christian.koenig,
alexander.deucher, Greg Kroah-Hartman, Alison Schofield,
Dave Jiang, Baoquan He, LKML
On Sun, 2 Jun 2024, Andy Shevchenko wrote:
> On Fri, May 31, 2024 at 02:31:45PM -0700, Chia-I Wu wrote:
> > On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <
> > andriy.shevchenko@linux.intel.com> wrote:
> > > On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
>
> ...
>
> > > P.S> I'm not so sure about this change. It needs a thoroughly testing, esp.
> > > in PCI case. Cc'ing to Ilpo.
>
> > What's special about PCI?
>
> PCI, due to its nature, may rebuild resources either by shrinking or expanding
> of the entire subtree after the PCI bridge in question. And this may happen at
> run-time due to hotplug support. But I'm not a deep expert in this area, Ilpo
> knows much more than me.
There is code which clearly tries to expand resources, but that usually
fails to work as intended because the parent resource's size is fixed once
it has already been assigned.
Some other code might block shrinking too under certain conditions.
This area would need to be reworked in the PCI core, but that is a massive
and scary-looking change.
--
i.
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
[not found] ` <CAPaKu7SsD+X7KAO=3vEYU_7YGM_f+7k1fdC9nEK=-NaJw8oYaA@mail.gmail.com>
2024-06-02 9:06 ` Andy Shevchenko
@ 2024-06-03 7:28 ` Ilpo Järvinen
1 sibling, 0 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2024-06-03 7:28 UTC (permalink / raw)
To: Chia-I Wu
Cc: Andy Shevchenko, amd-gfx, dri-devel, christian.koenig,
alexander.deucher, Greg Kroah-Hartman, Alison Schofield,
Dave Jiang, Baoquan He, LKML
On Fri, 31 May 2024, Chia-I Wu wrote:
> On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> wrote:
> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> > bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
> >
> > 34.69%--__do_fault
> > 34.60%--amdgpu_gem_fault
> > 34.00%--ttm_bo_vm_fault_reserved
> > 32.95%--vmf_insert_pfn_prot
> > 25.89%--track_pfn_insert
> > 24.35%--lookup_memtype
> > 21.77%--pat_pagerange_is_ram
> > 20.80%--walk_system_ram_range
> > 17.42%--find_next_iomem_res
> >
> > before this change, and
> >
> > 26.67%--__do_fault
> > 26.57%--amdgpu_gem_fault
> > 25.83%--ttm_bo_vm_fault_reserved
> > 24.40%--vmf_insert_pfn_prot
> > 14.30%--track_pfn_insert
> > 12.20%--lookup_memtype
> > 9.34%--pat_pagerange_is_ram
> > 8.22%--walk_system_ram_range
> > 5.09%--find_next_iomem_res
> >
> > after.
>
> Is there any documentation that explicitly says that the children resources
> must not overlap parent's one? Do we have some test cases? (Either way they
> needs to be added / expanded).
>
> I think it's the opposite. The assumption here is that a child is always a subset of
> its parent. Thus, if the range to be checked is not covered by a parent, we can skip
> the children.
>
> That's guaranteed by __request_resource. I am less sure about __insert_resource but
> it appears to be the case too. FWIW, resource_is_exclusive has the same assumption
> already.
Yes, the children resources are contained within the parent resource (at
least in PCI but given the code, I'd expect that to be general state of
affairs).
> It looks like I need to do some refactoring to add tests.
>
>
> P.S> I'm not so sure about this change. It needs a thoroughly testing, esp.
> in PCI case. Cc'ing to Ilpo.
>
> What's special about PCI?
--
i.
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-06-03 7:24 ` Ilpo Järvinen
@ 2024-06-04 5:04 ` Chia-I Wu
0 siblings, 0 replies; 9+ messages in thread
From: Chia-I Wu @ 2024-06-04 5:04 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Andy Shevchenko, amd-gfx, dri-devel, christian.koenig,
alexander.deucher, Greg Kroah-Hartman, Alison Schofield,
Dave Jiang, Baoquan He, LKML
On Mon, Jun 3, 2024 at 12:24 AM Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com> wrote:
>
> On Sun, 2 Jun 2024, Andy Shevchenko wrote:
>
> > On Fri, May 31, 2024 at 02:31:45PM -0700, Chia-I Wu wrote:
> > > On Fri, May 31, 2024 at 1:57 AM Andy Shevchenko <
> > > andriy.shevchenko@linux.intel.com> wrote:
> > > > On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> >
> > ...
> >
> > > > P.S> I'm not so sure about this change. It needs a thoroughly testing, esp.
> > > > in PCI case. Cc'ing to Ilpo.
> >
> > > What's special about PCI?
> >
> > PCI, due to its nature, may rebuild resources either by shrinking or expanding
> > of the entire subtree after the PCI bridge in question. And this may happen at
> > run-time due to hotplug support. But I'm not a deep expert in this area, Ilpo
> > knows much more than me.
>
> There is code which clearly tries to do expanding resource but that
> usually fails to work as intended because of a parent resource whose size
> is fixed because it's already assigned.
>
> Some other code might block shrinking too under certain conditions.
>
> This area would need to be reworked in PCI core but it's massive and
> scary looking change.
Given the nature of this change (skipping the check against children when
the parent does not match), I don't think it affects correctness unless a
child resource can exceed its parent resource.
The walk does not hold the resource lock outside of
find_next_iomem_res(). Updating the tree while the walk is in
progress has always been a bit ill-defined. The patch does not change
that (but it might change the timing a bit).
I can export __walk_iomem_res_desc() and write some unit tests against
it. Would that be enough to justify this change?
>
> --
> i.
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-05-31 5:36 [PATCH] kernel/resource: optimize find_next_iomem_res Chia-I Wu
2024-05-31 8:57 ` Andy Shevchenko
2024-05-31 15:32 ` Andy Shevchenko
@ 2024-06-04 15:31 ` Greg Kroah-Hartman
2024-06-04 21:37 ` Chia-I Wu
2 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2024-06-04 15:31 UTC (permalink / raw)
To: Chia-I Wu
Cc: amd-gfx, dri-devel, christian.koenig, alexander.deucher,
Andy Shevchenko, Alison Schofield, Dave Jiang, Baoquan He,
linux-kernel
On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> We can skip children resources when the parent resource does not cover
> the range.
>
> This should help vmf_insert_* users on x86, such as several DRM drivers.
> On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
>
> 34.69%--__do_fault
> 34.60%--amdgpu_gem_fault
> 34.00%--ttm_bo_vm_fault_reserved
> 32.95%--vmf_insert_pfn_prot
> 25.89%--track_pfn_insert
> 24.35%--lookup_memtype
> 21.77%--pat_pagerange_is_ram
> 20.80%--walk_system_ram_range
> 17.42%--find_next_iomem_res
>
> before this change, and
>
> 26.67%--__do_fault
> 26.57%--amdgpu_gem_fault
> 25.83%--ttm_bo_vm_fault_reserved
> 24.40%--vmf_insert_pfn_prot
> 14.30%--track_pfn_insert
> 12.20%--lookup_memtype
> 9.34%--pat_pagerange_is_ram
> 8.22%--walk_system_ram_range
> 5.09%--find_next_iomem_res
>
> after.
That's great, but why is walk_system_ram_range() being called so often?
Shouldn't that be a "set up the device" only type of thing? Why hammer
on "lookup_memtype" when you already know the memtype? You just did the
same thing for the previous frame.
This feels like it could be optimized to just "don't call these things"
which would make it go faster, right?
What am I missing here, why does this always have to be calculated all
the time? Resource mapping changes are rare, if at all, over the
lifetime of a system uptime. Constantly calculating something that
never changes feels odd to me.
thanks,
greg k-h
* Re: [PATCH] kernel/resource: optimize find_next_iomem_res
2024-06-04 15:31 ` Greg Kroah-Hartman
@ 2024-06-04 21:37 ` Chia-I Wu
0 siblings, 0 replies; 9+ messages in thread
From: Chia-I Wu @ 2024-06-04 21:37 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: amd-gfx, dri-devel, christian.koenig, alexander.deucher,
Andy Shevchenko, Alison Schofield, Dave Jiang, Baoquan He,
linux-kernel
On Tue, Jun 4, 2024 at 8:41 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, May 30, 2024 at 10:36:57PM -0700, Chia-I Wu wrote:
> > We can skip children resources when the parent resource does not cover
> > the range.
> >
> > This should help vmf_insert_* users on x86, such as several DRM drivers.
> > On my AMD Ryzen 5 7520C, when streaming data from cpu memory into amdgpu
> > bo, the throughput goes from 5.1GB/s to 6.6GB/s. perf report says
> >
> > 34.69%--__do_fault
> > 34.60%--amdgpu_gem_fault
> > 34.00%--ttm_bo_vm_fault_reserved
> > 32.95%--vmf_insert_pfn_prot
> > 25.89%--track_pfn_insert
> > 24.35%--lookup_memtype
> > 21.77%--pat_pagerange_is_ram
> > 20.80%--walk_system_ram_range
> > 17.42%--find_next_iomem_res
> >
> > before this change, and
> >
> > 26.67%--__do_fault
> > 26.57%--amdgpu_gem_fault
> > 25.83%--ttm_bo_vm_fault_reserved
> > 24.40%--vmf_insert_pfn_prot
> > 14.30%--track_pfn_insert
> > 12.20%--lookup_memtype
> > 9.34%--pat_pagerange_is_ram
> > 8.22%--walk_system_ram_range
> > 5.09%--find_next_iomem_res
> >
> > after.
>
> That's great, but why is walk_system_ram_range() being called so often?
>
> Shouldn't that be a "set up the device" only type of thing? Why hammer
> on "lookup_memtype" when you know the memtype, you just did the same
> thing for the previous frame.
>
> This feels like it could be optimized to just "don't call these things"
> which would make it go faster, right?
>
> What am I missing here, why does this always have to be calculated all
> the time? Resource mapping changes are rare, if at all, over the
> lifetime of a system uptime. Constantly calculating something that
> never changes feels odd to me.
Yeah, that would be even better.
I am not familiar with x86 pat code. I will have to defer that to
those more familiar with the matter.
>
> thanks,
>
> greg k-h