Re: [PATCH] mm/kmemleak: Avoid scanning potential huge holes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Lang Yu <Lang.Yu@amd.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/kmemleak: Avoid scanning potential huge holes
Date: Mon, 8 Nov 2021 17:44:25 +0800	[thread overview]
Message-ID: <YYjxedn98jVhofba@lang-desktop> (raw)
In-Reply-To: <a82d57cc-c99a-406c-15b7-3e92975dc452@redhat.com>

On Mon, Nov 08, 2021 at 10:24:43AM +0100, David Hildenbrand wrote:
> On 08.11.21 10:06, Lang Yu wrote:
> > On Mon, Nov 08, 2021 at 09:23:16AM +0100, David Hildenbrand wrote:
> >> On 08.11.21 08:27, Lang Yu wrote:
> >>> On Fri, Nov 05, 2021 at 02:14:50PM +0100, David Hildenbrand wrote:
> >>>> On 05.11.21 04:52, Lang Yu wrote:
> >>>>> When using devm_request_free_mem_region() and
> >>>>> devm_memremap_pages() to add ZONE_DEVICE memory, if requested
> >>>>> free mem region pfn were huge(e.g., 0x400000000 ,we found
> >>>>> on some amd apus, amdkfd svm will request a such free mem region),
> >>>>> the node_end_pfn() will be also huge(see move_pfn_range_to_zone()).
> >>>>> It creates a huge hole between node_start_pfn() and node_end_pfn().
> >>>>>
> >>>>> In such a case, following code snippet acctually was
> >>>>> just doing busy test_bit() looping on the huge hole.
> >>>>>
> >>>>> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >>>>> 	struct page *page = pfn_to_online_page(pfn);
> >>>>> 		if (!page)
> >>>>> 			continue;
> >>>>> 	...
> >>>>> }
> >>>>>
> >>>>> So we got a soft lockup:
> >>>>>
> >>>>>  watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
> >>>>>  CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
> >>>>>  RIP: 0010:pfn_to_online_page+0x5/0xd0
> >>>>>  Call Trace:
> >>>>>   ? kmemleak_scan+0x16a/0x440
> >>>>>   kmemleak_write+0x306/0x3a0
> >>>>>   ? common_file_perm+0x72/0x170
> >>>>>   full_proxy_write+0x5c/0x90
> >>>>>   vfs_write+0xb9/0x260
> >>>>>   ksys_write+0x67/0xe0
> >>>>>   __x64_sys_write+0x1a/0x20
> >>>>>   do_syscall_64+0x3b/0xc0
> >>>>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>
> >>>>> I did some tests with the patch.
> >>>>>
> >>>>> (1) amdgpu module unloaded
> >>>>>
> >>>>> before the patch:
> >>>>>
> >>>>> real    0m0.976s
> >>>>> user    0m0.000s
> >>>>> sys     0m0.968s
> >>>>>
> >>>>> after the patch:
> >>>>>
> >>>>> real    0m0.981s
> >>>>> user    0m0.000s
> >>>>> sys     0m0.973s
> >>>>>
> >>>>> (2) amdgpu module loaded
> >>>>>
> >>>>> before the patch:
> >>>>>
> >>>>> real    0m35.365s
> >>>>> user    0m0.000s
> >>>>> sys     0m35.354s
> >>>>>
> >>>>> after the patch:
> >>>>>
> >>>>> real    0m1.049s
> >>>>> user    0m0.000s
> >>>>> sys     0m1.042s
> >>>>>
> >>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
> >>>>> ---
> >>>>>  mm/kmemleak.c | 9 +++++----
> >>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> >>>>> index b57383c17cf6..d07444613a84 100644
> >>>>> --- a/mm/kmemleak.c
> >>>>> +++ b/mm/kmemleak.c
> >>>>> @@ -1403,6 +1403,7 @@ static void kmemleak_scan(void)
> >>>>>  {
> >>>>>  	unsigned long flags;
> >>>>>  	struct kmemleak_object *object;
> >>>>> +	struct zone *zone;
> >>>>>  	int i;
> >>>>>  	int new_leaks = 0;
> >>>>>  
> >>>>> @@ -1443,9 +1444,9 @@ static void kmemleak_scan(void)
> >>>>>  	 * Struct page scanning for each node.
> >>>>>  	 */
> >>>>>  	get_online_mems();
> >>>>> -	for_each_online_node(i) {
> >>>>> -		unsigned long start_pfn = node_start_pfn(i);
> >>>>> -		unsigned long end_pfn = node_end_pfn(i);
> >>>>> +	for_each_populated_zone(zone) {
> >>>>> +		unsigned long start_pfn = zone->zone_start_pfn;
> >>>>> +		unsigned long end_pfn = zone_end_pfn(zone);
> >>>>>  		unsigned long pfn;
> >>>>>  
> >>>>>  		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >>>>> @@ -1455,7 +1456,7 @@ static void kmemleak_scan(void)
> >>>>>  				continue;
> >>>>>  
> >>>>>  			/* only scan pages belonging to this node */
> >>>>> -			if (page_to_nid(page) != i)
> >>>>> +			if (page_to_nid(page) != zone_to_nid(zone))
> >>>>
> >>>> With overlapping zones you might rescan ranges ... instead we should do:
> >>>>
> >>>> /* only scan pages belonging to this zone */
> >>>> if (zone != page_zone(page))
> >>>> 	...
> >>>>
> >>>> Or alternatively:
> >>>>
> >>>> /* only scan pages belonging to this node */
> >>>> if (page_to_nid(page) != zone_to_nid(zone))
> >>>> 	continue;
> >>>> /* only scan pages belonging to this zone */
> >>>> if (page_zonenum(page) != zone_idx(zone))
> >>>> 	continue;
> >>>
> >>> The original code has covered that, i.e., 
> >>> only scan pages belonging to this node.
> >>> I didn't change that behavior.
> >>
> >> Again, you can easily have overlapping zones -- ZONE_NORMAL and
> >> ZONE_MOVABLE -- in which case, a PFN is spanned by multiple zones, but
> >> only belongs to a single zone.
> >>
> >> The original code would scan each PFN exactly once, as it was iterating
> >> the node PFNs. Your changed code might scan a single PFN multiple times,
> >> if it's spanned by multiple zones.
> >>
> > 
> > Did you mean a single PFN is shared by multiple zones belonging to the 
> > same node here? Thanks!
> 
> Not shared, spanned. A PFN always belongs to exactly one ZONE+NODE, but
> might be "spanned" by multiple nodes or multiple zones, because nodes
> and zones can overlap We can get the actual zone of a PFN via
> page_zone(page) in my example above. Note that checking for the zone
> structure (not the zone number/idx) implicitly checks for the node.
> 
> 
> Let's take a look at an example:
> 
> ...
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory32/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory33/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory34/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory35/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory36/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory37/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory38/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory39/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory40/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory41/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory42/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory43/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory44/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory45/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory46/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory47/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory48/valid_zones
> 
> 
> # cat /proc/zoneinfo
> Node 0, zone      DMA
> 	...
>         spanned  4095
>         present  3998
>         managed  3977
> 	...
>   start_pfn:           1
> Node 0, zone    DMA32
> 	...
>         spanned  1044480
>         present  782304
>         managed  765920
> 	...
>   start_pfn:           4096
> Node 0, zone   Normal
> 	...
>         spanned  524288
>         present  393216
>         managed  365736
> 	...
>   start_pfn:           1048576
> Node 0, zone  Movable
> 	...
>         spanned  229376
>         present  131072
>         managed  131072
>   start_pfn:           1310720
> 
> 
> So Normal spans:
> 
> 1048576 -> 1572863
> 
> And Movable spans:
> 
> 1310720 -> 1540095
> 
> Both zones overlap.

Thanks for your clarification! I got it.

Regards,
Lang

> -- 
> Thanks,
> 
> David / dhildenb
>

next prev parent reply	other threads:[~2021-11-08  9:44 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-05  3:52 [PATCH] mm/kmemleak: Avoid scanning potential huge holes Lang Yu
2021-11-05 13:14 ` David Hildenbrand
2021-11-08  7:27   ` Lang Yu
2021-11-08  8:23     ` David Hildenbrand
2021-11-08  9:06       ` Lang Yu
2021-11-08  9:24         ` David Hildenbrand
2021-11-08  9:44           ` Lang Yu [this message]
2021-11-06 19:20 ` kernel test robot
2021-11-06 19:20   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YYjxedn98jVhofba@lang-desktop \
    --to=lang.yu@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.