All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: memory offline infinite loop after soft offline
Date: Fri, 11 Oct 2019 17:32:44 -0400	[thread overview]
Message-ID: <1570829564.5937.36.camel@lca.pw> (raw)

# /opt/ltp/runtest/bin/move_pages12
move_pages12.c:263: INFO: Free RAM 258988928 kB
move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 4
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
move_pages12.c:197: PASS: Bug not reproduced
move_pages12.c:197: PASS: Bug not reproduced

for mem in $(ls -d /sys/devices/system/memory/memory*); do
        echo offline > $mem/state
        echo online > $mem/state
done

That LTP move_pages12 test will first madvise(MADV_SOFT_OFFLINE) for a range.
Then, one of "echo offline" will trigger an infinite loop in __offline_pages()
here,

		/* check again */
		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
					    NULL, check_pages_isolated_cb);
	} while (ret);

because check_pages_isolated_cb() always return -EBUSY from
test_pages_isolated(),


	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
						skip_hwpoisoned_pages);
        ...
        return pfn < end_pfn ? -EBUSY : 0;

The root cause is in __test_page_isolated_in_pageblock() where "pfn" is always
less than "end_pfn" because the associated page is not a PageBuddy.

	while (pfn < end_pfn) {
	...
		else
			break;

	return pfn;

Adding a dump_page() for that pfn shows,

[  101.665160][ T8885] pfn = 77501, end_pfn = 78000
[  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
mapping:0000000000000000 index:0x0
[  101.665329][ T8885] flags: 0x3fffc000000000()
[  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
0000000000000000
[  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[  101.665588][ T8885] page dumped because: soft_offline
[  101.665639][ T8885] page_owner tracks the page as freed
[  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
gfp_mask
0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
THISNODE)
[  101.665924][ T8885]  prep_new_page+0x3c0/0x440
[  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
[  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
[  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
[  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
[  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
[  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
[  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
[  101.666520][ T8885]  sys_move_pages+0x28/0x40
[  101.666643][ T8885]  system_call+0x5c/0x68
[  101.666665][ T8885] page last free stack trace:
[  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
[  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
[  101.666821][ T8885]  free_huge_page+0x2dc/0x740
[  101.666875][ T8885]  __put_compound_page+0x64/0xc0
[  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
[  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
[  101.667048][ T8885]  soft_offline_page+0x314/0x1050
[  101.667117][ T8885]  sys_madvise+0x1068/0x1080
[  101.667185][ T8885]  system_call+0x5c/0x68


             reply	other threads:[~2019-10-11 21:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-11 21:32 Qian Cai [this message]
2019-10-12 10:30 ` memory offline infinite loop after soft offline osalvador
2019-10-14  8:39 ` Michal Hocko
2019-10-17  9:34   ` Naoya Horiguchi
2019-10-17 10:01     ` Michal Hocko
2019-10-17 10:03       ` David Hildenbrand
2019-10-17 18:07       ` Qian Cai
2019-10-17 18:27         ` Michal Hocko
2019-10-18  2:19           ` Naoya Horiguchi
2019-10-18  6:06             ` Michal Hocko
2019-10-18  6:32               ` Naoya Horiguchi
2019-10-18  7:33                 ` Michal Hocko
2019-10-18  8:46                   ` Naoya Horiguchi
2019-10-18 11:56                 ` Qian Cai
2019-10-21  3:16                   ` Naoya Horiguchi
2020-05-15  2:46                     ` Qian Cai
2020-05-15  3:48                       ` HORIGUCHI NAOYA(堀口 直也)
2020-05-19  4:17                         ` Qian Cai
2019-10-18  8:13             ` David Hildenbrand
2019-10-18  8:24               ` Michal Hocko
2019-10-18  8:38                 ` David Hildenbrand
2019-10-18  8:55                   ` Michal Hocko
2019-10-18 11:00                     ` David Hildenbrand
2019-10-18 11:05                       ` David Hildenbrand
2019-10-18 11:34                       ` Michal Hocko
2019-10-18 11:51                         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1570829564.5937.36.camel@lca.pw \
    --to=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.