From: Qian Cai <cai@lca.pw>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: memory offline infinite loop after soft offline
Date: Fri, 11 Oct 2019 17:32:44 -0400
Message-ID: <1570829564.5937.36.camel@lca.pw>

# /opt/ltp/runtest/bin/move_pages12
move_pages12.c:263: INFO: Free RAM 258988928 kB
move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 4
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
move_pages12.c:197: PASS: Bug not reproduced
move_pages12.c:197: PASS: Bug not reproduced

for mem in $(ls -d /sys/devices/system/memory/memory*); do
	echo offline > $mem/state
	echo online > $mem/state
done

The LTP move_pages12 test first calls madvise(MADV_SOFT_OFFLINE) on a range.
After that, one of the "echo offline" writes triggers an infinite loop in
__offline_pages() here:

		/* check again */
		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
					    NULL, check_pages_isolated_cb);
	} while (ret);

because check_pages_isolated_cb() always returns -EBUSY, which comes from
test_pages_isolated():

	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
						skip_hwpoisoned_pages);
	...
	return pfn < end_pfn ? -EBUSY : 0;
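
To make the failure mode concrete, here is a small user-space sketch of the
retry loop. It is not kernel code: retry_until_isolated(), always_busy(),
busy_n_times() and the max_passes cap are hypothetical names added so the
model terminates. The kernel loop quoted above has no such cap, which is why
a check that keeps returning -EBUSY never lets it exit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: 0 means "range isolated", -EBUSY means "not yet". */
typedef int (*isolated_check_fn)(unsigned long start_pfn,
				 unsigned long nr_pages, void *arg);

/*
 * Bounded stand-in for the "check again" loop. Returns the number of passes
 * it took for the check to succeed, or -1 if it still failed after
 * max_passes attempts.
 */
static int retry_until_isolated(unsigned long start_pfn, unsigned long end_pfn,
				isolated_check_fn check, void *arg,
				int max_passes)
{
	int passes = 0;
	int ret;

	assert(max_passes > 0);
	do {
		if (passes == max_passes)
			return -1;	/* the real loop would keep spinning here */
		passes++;
		ret = check(start_pfn, end_pfn - start_pfn, arg);
	} while (ret);

	return passes;
}

/* A check that never succeeds, like the one described in this report. */
static int always_busy(unsigned long start_pfn, unsigned long nr_pages,
		       void *arg)
{
	(void)start_pfn;
	(void)nr_pages;
	(void)arg;
	return -EBUSY;
}

/* A check that fails a fixed number of times (counter in *arg), then succeeds. */
static int busy_n_times(unsigned long start_pfn, unsigned long nr_pages,
			void *arg)
{
	int *remaining = arg;

	(void)start_pfn;
	(void)nr_pages;
	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

With always_busy() the model hits the cap and returns -1. With busy_n_times()
and a counter of 3 it returns after the fourth pass, which is the behaviour
the loop relies on when the pages do eventually become isolated.
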

The root cause is in __test_page_isolated_in_pageblock(), where the returned
"pfn" is always less than "end_pfn" because the page it stops at is not
PageBuddy, so the scan takes the "break" path before reaching the end:

	while (pfn < end_pfn) {
	...
		else
			break;
	}

	return pfn;
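
The scan can be sketched in user space as below. This is a simplified model
under stated assumptions, not the kernel function: struct model_page,
scan_isolated() and range_isolated() are hypothetical, and the two "skip"
branches are my reading of the part the snippet above elides (buddy pages are
stepped over by their order, poisoned pages are stepped over only when
skipping is requested). A page that is free but carries neither state stops
the scan, which matches the empty flags field in the dump.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; the kernel keeps this in struct page. */
struct model_page {
	bool buddy;		/* page heads a free block on a buddy list */
	bool hwpoison;		/* page is marked poisoned */
	unsigned int order;	/* block order, meaningful only if buddy */
};

/*
 * Walk [pfn, end_pfn) and return the first pfn that is neither a buddy
 * block nor a skippable poisoned page, or a value >= end_pfn if the whole
 * range is acceptable.
 */
static unsigned long scan_isolated(const struct model_page *pages,
				   unsigned long pfn, unsigned long end_pfn,
				   bool skip_hwpoisoned)
{
	while (pfn < end_pfn) {
		const struct model_page *page = &pages[pfn];

		if (page->buddy)
			pfn += 1UL << page->order;	/* skip the free block */
		else if (skip_hwpoisoned && page->hwpoison)
			pfn++;				/* skip one poisoned page */
		else
			break;				/* neither: scan stops here */
	}

	return pfn;
}

/* Same result mapping as the caller quoted in the report. */
static int range_isolated(const struct model_page *pages,
			  unsigned long start_pfn, unsigned long end_pfn,
			  bool skip_hwpoisoned)
{
	unsigned long pfn = scan_isolated(pages, start_pfn, end_pfn,
					  skip_hwpoisoned);

	return pfn < end_pfn ? -EBUSY : 0;
}
```

In this model, a range whose only odd page has both flags clear keeps
returning -EBUSY no matter how often it is rescanned, because nothing in the
scan itself changes the page state.
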
Adding a dump_page() for that pfn shows,
[ 101.665160][ T8885] pfn = 77501, end_pfn = 78000
[ 101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 101.665329][ T8885] flags: 0x3fffc000000000()
[ 101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500 0000000000000000
[ 101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 101.665588][ T8885] page dumped because: soft_offline
[ 101.665639][ T8885] page_owner tracks the page as freed
[ 101.665697][ T8885] page last allocated via order 5, migratetype Movable, gfp_mask 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_THISNODE)
[ 101.665924][ T8885] prep_new_page+0x3c0/0x440
[ 101.665962][ T8885] get_page_from_freelist+0x2568/0x2bb0
[ 101.666059][ T8885] __alloc_pages_nodemask+0x1b4/0x670
[ 101.666115][ T8885] alloc_fresh_huge_page+0x244/0x6e0
[ 101.666183][ T8885] alloc_migrate_huge_page+0x30/0x70
[ 101.666254][ T8885] alloc_new_node_page+0xc4/0x380
[ 101.666325][ T8885] migrate_pages+0x3b4/0x19e0
[ 101.666375][ T8885] do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[ 101.666464][ T8885] kernel_move_pages+0x498/0xfc0
[ 101.666520][ T8885] sys_move_pages+0x28/0x40
[ 101.666643][ T8885] system_call+0x5c/0x68
[ 101.666665][ T8885] page last free stack trace:
[ 101.666704][ T8885] __free_pages_ok+0xa4c/0xd40
[ 101.666773][ T8885] update_and_free_page+0x2dc/0x5b0
[ 101.666821][ T8885] free_huge_page+0x2dc/0x740
[ 101.666875][ T8885] __put_compound_page+0x64/0xc0
[ 101.666926][ T8885] putback_active_hugepage+0x228/0x390
[ 101.666990][ T8885] migrate_pages+0xa78/0x19e0
[ 101.667048][ T8885] soft_offline_page+0x314/0x1050
[ 101.667117][ T8885] sys_madvise+0x1068/0x1080
[ 101.667185][ T8885] system_call+0x5c/0x68
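
As a cross-check that the dumped page and the printed pfn refer to the same
frame, the arithmetic can be sketched as follows. Everything here is an
assumption rather than something stated in the report: that the printed pfn
values are hexadecimal, that the page address lies in a virtually contiguous
struct page array starting at 0xc00c000000000000, and that each struct page
is 64 bytes. page_addr_to_pfn() is a hypothetical helper, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index of a struct page within a flat array that starts at vmemmap_base.
 * Both vmemmap_base and struct_page_size are supplied by the caller because
 * they depend on the architecture and kernel configuration.
 */
static uint64_t page_addr_to_pfn(uint64_t page_addr, uint64_t vmemmap_base,
				 uint64_t struct_page_size)
{
	assert(struct_page_size != 0);
	assert(page_addr >= vmemmap_base);

	return (page_addr - vmemmap_base) / struct_page_size;
}
```

Under those assumptions, page_addr_to_pfn(0xc00c000001dd4040,
0xc00c000000000000, 64) evaluates to 0x77501, which agrees with the
"pfn = 77501" line at the top of the dump.
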