From: Lang Yu <Lang.Yu@amd.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Catalin Marinas <catalin.marinas@arm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/kmemleak: Avoid scanning potential huge holes
Date: Mon, 8 Nov 2021 17:44:25 +0800 [thread overview]
Message-ID: <YYjxedn98jVhofba@lang-desktop> (raw)
In-Reply-To: <a82d57cc-c99a-406c-15b7-3e92975dc452@redhat.com>
On Mon, Nov 08, 2021 at 10:24:43AM +0100, David Hildenbrand wrote:
> On 08.11.21 10:06, Lang Yu wrote:
> > On Mon, Nov 08, 2021 at 09:23:16AM +0100, David Hildenbrand wrote:
> >> On 08.11.21 08:27, Lang Yu wrote:
> >>> On Fri, Nov 05, 2021 at 02:14:50PM +0100, David Hildenbrand wrote:
> >>>> On 05.11.21 04:52, Lang Yu wrote:
> >>>>> When using devm_request_free_mem_region() and
> >>>>> devm_memremap_pages() to add ZONE_DEVICE memory, if requested
> >>>>> free mem region pfn were huge(e.g., 0x400000000 ,we found
> >>>>> on some amd apus, amdkfd svm will request a such free mem region),
> >>>>> the node_end_pfn() will be also huge(see move_pfn_range_to_zone()).
> >>>>> It creates a huge hole between node_start_pfn() and node_end_pfn().
> >>>>>
> >>>>> In such a case, following code snippet acctually was
> >>>>> just doing busy test_bit() looping on the huge hole.
> >>>>>
> >>>>> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >>>>> struct page *page = pfn_to_online_page(pfn);
> >>>>> if (!page)
> >>>>> continue;
> >>>>> ...
> >>>>> }
> >>>>>
> >>>>> So we got a soft lockup:
> >>>>>
> >>>>> watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
> >>>>> CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
> >>>>> RIP: 0010:pfn_to_online_page+0x5/0xd0
> >>>>> Call Trace:
> >>>>> ? kmemleak_scan+0x16a/0x440
> >>>>> kmemleak_write+0x306/0x3a0
> >>>>> ? common_file_perm+0x72/0x170
> >>>>> full_proxy_write+0x5c/0x90
> >>>>> vfs_write+0xb9/0x260
> >>>>> ksys_write+0x67/0xe0
> >>>>> __x64_sys_write+0x1a/0x20
> >>>>> do_syscall_64+0x3b/0xc0
> >>>>> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>
> >>>>> I did some tests with the patch.
> >>>>>
> >>>>> (1) amdgpu module unloaded
> >>>>>
> >>>>> before the patch:
> >>>>>
> >>>>> real 0m0.976s
> >>>>> user 0m0.000s
> >>>>> sys 0m0.968s
> >>>>>
> >>>>> after the patch:
> >>>>>
> >>>>> real 0m0.981s
> >>>>> user 0m0.000s
> >>>>> sys 0m0.973s
> >>>>>
> >>>>> (2) amdgpu module loaded
> >>>>>
> >>>>> before the patch:
> >>>>>
> >>>>> real 0m35.365s
> >>>>> user 0m0.000s
> >>>>> sys 0m35.354s
> >>>>>
> >>>>> after the patch:
> >>>>>
> >>>>> real 0m1.049s
> >>>>> user 0m0.000s
> >>>>> sys 0m1.042s
> >>>>>
> >>>>> Signed-off-by: Lang Yu <lang.yu@amd.com>
> >>>>> ---
> >>>>> mm/kmemleak.c | 9 +++++----
> >>>>> 1 file changed, 5 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> >>>>> index b57383c17cf6..d07444613a84 100644
> >>>>> --- a/mm/kmemleak.c
> >>>>> +++ b/mm/kmemleak.c
> >>>>> @@ -1403,6 +1403,7 @@ static void kmemleak_scan(void)
> >>>>> {
> >>>>> unsigned long flags;
> >>>>> struct kmemleak_object *object;
> >>>>> + struct zone *zone;
> >>>>> int i;
> >>>>> int new_leaks = 0;
> >>>>>
> >>>>> @@ -1443,9 +1444,9 @@ static void kmemleak_scan(void)
> >>>>> * Struct page scanning for each node.
> >>>>> */
> >>>>> get_online_mems();
> >>>>> - for_each_online_node(i) {
> >>>>> - unsigned long start_pfn = node_start_pfn(i);
> >>>>> - unsigned long end_pfn = node_end_pfn(i);
> >>>>> + for_each_populated_zone(zone) {
> >>>>> + unsigned long start_pfn = zone->zone_start_pfn;
> >>>>> + unsigned long end_pfn = zone_end_pfn(zone);
> >>>>> unsigned long pfn;
> >>>>>
> >>>>> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >>>>> @@ -1455,7 +1456,7 @@ static void kmemleak_scan(void)
> >>>>> continue;
> >>>>>
> >>>>> /* only scan pages belonging to this node */
> >>>>> - if (page_to_nid(page) != i)
> >>>>> + if (page_to_nid(page) != zone_to_nid(zone))
> >>>>
> >>>> With overlapping zones you might rescan ranges ... instead we should do:
> >>>>
> >>>> /* only scan pages belonging to this zone */
> >>>> if (zone != page_zone(page))
> >>>> ...
> >>>>
> >>>> Or alternatively:
> >>>>
> >>>> /* only scan pages belonging to this node */
> >>>> if (page_to_nid(page) != zone_to_nid(zone))
> >>>> continue;
> >>>> /* only scan pages belonging to this zone */
> >>>> if (page_zonenum(page) != zone_idx(zone))
> >>>> continue;
> >>>
> >>> The original code has covered that, i.e.,
> >>> only scan pages belonging to this node.
> >>> I didn't change that behavior.
> >>
> >> Again, you can easily have overlapping zones -- ZONE_NORMAL and
> >> ZONE_MOVABLE -- in which case, a PFN is spanned by multiple zones, but
> >> only belongs to a single zone.
> >>
> >> The original code would scan each PFN exactly once, as it was iterating
> >> the node PFNs. Your changed code might scan a single PFN multiple times,
> >> if it's spanned by multiple zones.
> >>
> >
> > Did you mean a single PFN is shared by multiple zones belonging to the
> > same node here? Thanks!
>
> Not shared, spanned. A PFN always belongs to exactly one ZONE+NODE, but
> might be "spanned" by multiple nodes or multiple zones, because nodes
> and zones can overlap We can get the actual zone of a PFN via
> page_zone(page) in my example above. Note that checking for the zone
> structure (not the zone number/idx) implicitly checks for the node.
>
>
> Let's take a look at an example:
>
> ...
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory32/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory33/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory34/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory35/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory36/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory37/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory38/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory39/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory40/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory41/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory42/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory43/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory44/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory45/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory46/valid_zones
> Movable
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory47/valid_zones
> Normal
> [root@vm-0 ~]# cat /sys/devices/system/memory/memory48/valid_zones
>
>
> # cat /proc/zoneinfo
> Node 0, zone DMA
> ...
> spanned 4095
> present 3998
> managed 3977
> ...
> start_pfn: 1
> Node 0, zone DMA32
> ...
> spanned 1044480
> present 782304
> managed 765920
> ...
> start_pfn: 4096
> Node 0, zone Normal
> ...
> spanned 524288
> present 393216
> managed 365736
> ...
> start_pfn: 1048576
> Node 0, zone Movable
> ...
> spanned 229376
> present 131072
> managed 131072
> start_pfn: 1310720
>
>
> So Normal spans:
>
> 1048576 -> 1572863
>
> And Movable spans:
>
> 1310720 -> 1540095
>
> Both zones overlap.
Thanks for your clarification! I got it.
Regards,
Lang
> --
> Thanks,
>
> David / dhildenb
>
next prev parent reply other threads:[~2021-11-08 9:44 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-05 3:52 [PATCH] mm/kmemleak: Avoid scanning potential huge holes Lang Yu
2021-11-05 13:14 ` David Hildenbrand
2021-11-08 7:27 ` Lang Yu
2021-11-08 8:23 ` David Hildenbrand
2021-11-08 9:06 ` Lang Yu
2021-11-08 9:24 ` David Hildenbrand
2021-11-08 9:44 ` Lang Yu [this message]
2021-11-06 19:20 ` kernel test robot
2021-11-06 19:20 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YYjxedn98jVhofba@lang-desktop \
--to=lang.yu@amd.com \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.