From: Oscar Salvador <osalvador@techadventures.net>
To: linux-mm@kvack.org
Cc: mhocko@suse.com, vbabka@suse.cz, pasha.tatashin@oracle.com,
akpm@linux-foundation.org
Subject: [RFC] Checking for error code in __offline_pages
Date: Wed, 23 May 2018 09:35:47 +0200
Message-ID: <20180523073547.GA29266@techadventures.net>
Hi,
This is something I spotted while testing offlining memory.
__offline_pages() calls do_migrate_range() to try to migrate a range,
but we never check its return value.
Besides ignoring underlying failures, this can lead to situations
where we never break out of the loop because we are totally unaware of
what is going on.
The way I spotted this was by trying to offline all memblocks belonging
to a node.
Due to an unfortunate setting with movablecore, memblocks containing bootmem
memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
So while trying to remove that memory, the system failed in:
do_migrate_range()
{
	...
	if (PageLRU(page))
		ret = isolate_lru_page(page);
	else
		ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
	if (!ret)
		// success: do something
	else
		if (page_count(page))
			ret = -EBUSY;
	...
}
Since the pages from bootmem are not LRU, we call isolate_movable_page()
but we fail when checking for __PageMovable().
Since the page_count is more than 0 we return -EBUSY, but we do not check this
in our caller, so we keep trying to migrate this memory over and over:
repeat:
	...
	pfn = scan_movable_pages(start_pfn, end_pfn);
	if (pfn) { /* We have movable pages */
		ret = do_migrate_range(pfn, end_pfn);
		goto repeat;
	}
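To make the failure mode above concrete, here is a minimal user-space sketch in C (not kernel code). fake_scan_movable_pages(), fake_do_migrate_range(), offline_loop() and the max_passes cap are all hypothetical names invented for illustration. The stubs simply assume that a movable-looking page is always found and that isolating it always fails with -EBUSY, as in the bootmem case.

```c
#include <errno.h>

/* Hypothetical stand-in: pretend the scan always finds a candidate pfn. */
static unsigned long fake_scan_movable_pages(void)
{
	return 1;
}

/* Hypothetical stand-in: isolation always fails, as for bootmem pages. */
static int fake_do_migrate_range(void)
{
	return -EBUSY;
}

/*
 * Model of the "repeat:" loop. Returns the number of passes made and
 * stores the last return value in *out_ret.
 * max_passes is an artificial cap so that the demo terminates; the
 * loop described above has no such cap.
 */
static int offline_loop(int check_ret, int max_passes, int *out_ret)
{
	int passes = 0;
	int ret = 0;

	while (passes < max_passes) {
		unsigned long pfn = fake_scan_movable_pages();

		if (!pfn)
			break;		/* nothing left to migrate */
		passes++;
		ret = fake_do_migrate_range();
		if (check_ret && ret)
			break;		/* stands in for "goto failed_removal" */
	}
	*out_ret = ret;
	return passes;
}
```

Under those assumptions, offline_loop(0, 1000, &ret) keeps spinning until it hits the artificial cap, while offline_loop(1, 1000, &ret) gives up after the first pass with ret set to -EBUSY.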
But this is not the only situation where we can get stuck.
For example, if we fail with -ENOMEM in
migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well.
I think we should really detect these cases and fail with "goto failed_removal".
Something like
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	pfn = scan_movable_pages(start_pfn, end_pfn);
 	if (pfn) { /* We have movable pages */
 		ret = do_migrate_range(pfn, end_pfn);
+		if (ret) {
+			if (ret != -ENOMEM)
+				ret = -EBUSY;
+			goto failed_removal;
+		}
 		goto repeat;
 	}
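The error handling in that hunk can be looked at in isolation. Below is a small user-space sketch of the same mapping; offline_migrate_errno() is a hypothetical helper name that does not exist in the kernel, and the function only mirrors the two "if" statements of the proposed change.

```c
#include <errno.h>

/*
 * Hypothetical helper mirroring the proposed hunk:
 *   - 0 means "no error", so the caller would "goto repeat";
 *   - -ENOMEM is kept as is;
 *   - any other non-zero value is folded into -EBUSY.
 */
static int offline_migrate_errno(int ret)
{
	if (!ret)
		return 0;
	if (ret != -ENOMEM)
		ret = -EBUSY;
	return ret;
}
```

Note that with this mapping any non-zero value, including a positive one, ends up as -EBUSY, so the caller only ever sees 0, -ENOMEM or -EBUSY.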
Now, unless I overlooked something,
migrate_pages()->unmap_and_move()/unmap_and_move_huge_page() can return:
  -ENOMEM
  -EAGAIN
  -EBUSY
  -ENOSYS
I am not sure whether we should differentiate between those errors.
For example, it is possible that in migrate_pages() we only ever get -EAGAIN,
and then we return the count of pages that still needed a retry, without
having really failed.
Although, since we do 10 passes, that might well be considered a failure.
I am also not sure whether we want to propagate the error codes, or whether,
if we fail in migrate_pages(), whatever the error was (-ENOMEM, -EBUSY, etc.),
we should just return -EBUSY.
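The "propagate" alternative can be sketched in user-space C as well. This assumes the convention discussed here, where a negative value is an errno and a positive value is a count of pages that were not migrated. The enum sketch_outcome, classify_migrate_ret() and migrate_ret_to_errno() are hypothetical names for illustration only, and the policy shown is just one possible choice, not a proposal.

```c
#include <errno.h>

/* Hypothetical classification of a migrate_pages()-style return value. */
enum sketch_outcome {
	SKETCH_DONE,		/* 0: everything was migrated */
	SKETCH_PAGES_LEFT,	/* > 0: that many pages were not migrated */
	SKETCH_HARD_ERROR,	/* < 0: negative errno such as -ENOMEM */
};

static enum sketch_outcome classify_migrate_ret(int ret)
{
	if (ret < 0)
		return SKETCH_HARD_ERROR;
	if (ret > 0)
		return SKETCH_PAGES_LEFT;
	return SKETCH_DONE;
}

/*
 * One possible policy: propagate hard errors unchanged and report
 * leftover pages (retries exhausted) as -EBUSY.
 */
static int migrate_ret_to_errno(int ret)
{
	switch (classify_migrate_ret(ret)) {
	case SKETCH_DONE:
		return 0;
	case SKETCH_PAGES_LEFT:
		return -EBUSY;
	default:
		return ret;
	}
}
```

Compared with folding everything into -EBUSY, this keeps -ENOMEM, -EAGAIN and -ENOSYS visible to the caller, at the cost of the caller having to know what each of them means.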
What do you think?
Thanks
Oscar Salvador