* [RFC] Checking for error code in __offline_pages
@ 2018-05-23 7:35 Oscar Salvador
2018-05-23 7:52 ` Michal Hocko
2018-05-23 14:51 ` David Hildenbrand
0 siblings, 2 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 7:35 UTC (permalink / raw)
To: linux-mm; +Cc: mhocko, vbabka, pasha.tatashin, akpm
Hi,
This is something I spotted while testing offlining memory.
__offline_pages() calls do_migrate_range() to try to migrate a range,
but we do not actually check for the error code.
This, besides ignoring the underlying failures, can lead to situations
where we never break out of the loop because we are totally unaware of
what is going on.
The way I spotted this was when trying to offline all memblocks belonging
to a node.
Due to an unfortunate setting with movablecore, memblocks containing bootmem
memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
So while trying to remove that memory, the system failed in:
do_migrate_range()
{
	...
	if (PageLRU(page))
		ret = isolate_lru_page(page);
	else
		ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
	if (!ret)
		// success: do something
	else
		if (page_count(page))
			ret = -EBUSY;
	...
}
Since the pages from bootmem are not LRU, we call isolate_movable_page()
but we fail when checking for __PageMovable().
Since the page_count is more than 0 we return -EBUSY, but we do not check this
in our caller, so we keep trying to migrate this memory over and over:
repeat:
	...
	pfn = scan_movable_pages(start_pfn, end_pfn);
	if (pfn) { /* We have movable pages */
		ret = do_migrate_range(pfn, end_pfn);
		goto repeat;
	}
But this is not the only situation where we can get stuck.
For example, if we fail with -ENOMEM in
migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well.
I think we should really detect these cases and fail with "goto failed_removal".
Something like
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		pfn = scan_movable_pages(start_pfn, end_pfn);
 		if (pfn) { /* We have movable pages */
 			ret = do_migrate_range(pfn, end_pfn);
+			if (ret) {
+				if (ret != -ENOMEM)
+					ret = -EBUSY;
+				goto failed_removal;
+			}
 			goto repeat;
 		}
Now, unless I overlooked something,
migrate_pages()->unmap_and_move()/unmap_and_move_huge_page() can return:
-ENOMEM
-EAGAIN
-EBUSY
-ENOSYS.
I am not sure if we should differentiate between those errors.
For example, it is possible that in migrate_pages() we just get -EAGAIN,
and we return the number of pages we retried without having really failed.
Although, since we do 10 passes, it might be considered a failure.
And I am not sure either whether we want to propagate the error codes, or,
in case we fail in migrate_pages(), whatever the error was (-ENOMEM, -EBUSY,
etc.), just return -EBUSY.
What do you think?
Thanks
Oscar Salvador
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 7:35 [RFC] Checking for error code in __offline_pages Oscar Salvador
@ 2018-05-23 7:52 ` Michal Hocko
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:16 ` Oscar Salvador
2018-05-23 14:51 ` David Hildenbrand
1 sibling, 2 replies; 11+ messages in thread
From: Michal Hocko @ 2018-05-23 7:52 UTC (permalink / raw)
To: Oscar Salvador; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed 23-05-18 09:35:47, Oscar Salvador wrote:
> Hi,
>
> This is something I spotted while testing offlining memory.
>
> __offline_pages() calls do_migrate_range() to try to migrate a range,
> but we do not actually check for the error code.
Yes, this is intentional. do_migrate_range doesn't distinguish between
temporary and permanent migration failure. Getting EBUSY would be just
too easy and that is why we retry. We rely on start_isolate_page_range
to tell us about any non-migrateable pages and we consider all other
failures as temporary.
> This, besides ignoring the underlying failures, can lead to situations
> where we never break out of the loop because we are totally unaware of
> what is going on.
This shouldn't happen. If it does then start_isolate_page_range should
handle those non-migrateable pages.
> The way I spotted this was when trying to offline all memblocks belonging
> to a node.
> Due to an unfortunate setting with movablecore, memblocks containing bootmem
> memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
This is a bug as well. Zone movable shouldn't contain any
non-migrateable pages.
[...]
> Since the pages from bootmem are not LRU, we call isolate_movable_page()
> but we fail when checking for __PageMovable().
> Since the page_count is more than 0 we return -EBUSY, but we do not check this
> in our caller, so we keep trying to migrate this memory over and over:
>
> repeat:
> ...
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
>
> But this is not the only situation where we can get stuck.
> For example, if we fail with -ENOMEM in
> migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well.
ENOMEM is highly unlikely because we should be allocating only small
order pages and those do not fail unless the originator is killed by the
oom killer, and we would break out of the loop in such a case because of
pending signals.
> I think we should really detect these cases and fail with "goto failed_removal".
> Something like
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> + if (ret) {
> + if (ret != -ENOMEM)
> + ret = -EBUSY;
> + goto failed_removal;
> + }
> goto repeat;
> }
no, not really. As explained above, this would allow the offlining to
fail way too easily. Yeah, the current code is far from optimal. We
used to have a retry count but that one was removed exactly because of
premature failures. There are three things here:
1) zone_movable should not contain any bootmem or otherwise non-migrateable
pages
2) start_isolate_page_range should fail when seeing such pages - maybe
has_unmovable_pages is overly optimistic and it should check all
pages even in movable zones.
3) migrate_pages should really tell us whether the failure is temporary
or permanent. I am not sure we can do that easily though.
--
Michal Hocko
SUSE Labs
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 7:52 ` Michal Hocko
@ 2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
` (2 more replies)
2018-05-23 8:16 ` Oscar Salvador
1 sibling, 3 replies; 11+ messages in thread
From: Michal Hocko @ 2018-05-23 8:16 UTC (permalink / raw)
To: Oscar Salvador; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed 23-05-18 09:52:39, Michal Hocko wrote:
[...]
> Yeah, the current code is far from optimal. We
> used to have a retry count but that one was removed exactly because of
> premature failures. There are three things here
> 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> pages
> 2) start_isolate_page_range should fail when seeing such pages - maybe
> has_unmovable_pages is overly optimistic and it should check all
> pages even in movable zones.
> 3) migrate_pages should really tell us whether the failure is temporary
> or permanent. I am not sure we can do that easily though.
2) should be the simplest one for now. Could you give it a try? Btw.
the exact configuration that led to bootmem pages in zone_movable would
be really appreciated:
---
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 7:52 ` Michal Hocko
2018-05-23 8:16 ` Michal Hocko
@ 2018-05-23 8:16 ` Oscar Salvador
2018-05-23 8:32 ` Michal Hocko
1 sibling, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 8:16 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 09:52:39AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:35:47, Oscar Salvador wrote:
> > Hi,
> >
> > This is something I spotted while testing offlining memory.
> >
> > __offline_pages() calls do_migrate_range() to try to migrate a range,
> > but we do not actually check for the error code.
>
> Yes, this is intentional. do_migrate_range doesn't distinguish between
> temporary and permanent migration failure. Getting EBUSY would be just
> too easy and that is why we retry. We rely on start_isolate_page_range
> to tell us about any non-migrateable pages and we consider all other
> failures as temporary.
>
> > This, besides ignoring the underlying failures, can lead to situations
> > where we never break out of the loop because we are totally unaware of
> > what is going on.
>
> This shouldn't happen. If it does then start_isolate_page_range should
> handle those non-migrateable pages.
>
> > The way I spotted this was when trying to offline all memblocks belonging
> > to a node.
> > Due to an unfortunate setting with movablecore, memblocks containing bootmem
> > memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
>
> This is a bug as well. Zone movable shouldn't contain any
> non-migrateable pages.
>
> [...]
>
> > Since the pages from bootmem are not LRU, we call isolate_movable_page()
> > but we fail when checking for __PageMovable().
> > Since the page_count is more than 0 we return -EBUSY, but we do not check this
> > in our caller, so we keep trying to migrate this memory over and over:
> >
> > repeat:
> > ...
> > pfn = scan_movable_pages(start_pfn, end_pfn);
> > if (pfn) { /* We have movable pages */
> > ret = do_migrate_range(pfn, end_pfn);
> > goto repeat;
> > }
> >
> > But this is not the only situation where we can get stuck.
> > For example, if we fail with -ENOMEM in
> > migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well.
>
> ENOMEM is highly unlikely because we should be allocating only small
> order pages and those do not fail unless the originator is killed by the
> oom killer, and we would break out of the loop in such a case because of
> pending signals.
>
> > I think we should really detect these cases and fail with "goto failed_removal".
> > Something like
> >
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
> > pfn = scan_movable_pages(start_pfn, end_pfn);
> > if (pfn) { /* We have movable pages */
> > ret = do_migrate_range(pfn, end_pfn);
> > + if (ret) {
> > + if (ret != -ENOMEM)
> > + ret = -EBUSY;
> > + goto failed_removal;
> > + }
> > goto repeat;
> > }
>
> no, not really. As explained above, this would allow the offlining to
> fail way too easily. Yeah, the current code is far from optimal. We
> used to have a retry count but that one was removed exactly because of
> premature failures. There are three things here
> 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> pages
> 2) start_isolate_page_range should fail when seeing such pages - maybe
> has_unmovable_pages is overly optimistic and it should check all
> pages even in movable zones.
I will see if I can work this out.
> 3) migrate_pages should really tell us whether the failure is temporary
> or permanent. I am not sure we can do that easily though.
AFAIU, permanent errors are things like -EBUSY, -ENOSYS, -ENOMEM,
and a temporary one would be -EAGAIN?
Maybe it is overcomplicated, but what about adding another parameter to
migrate_pages() where we set the real error?
Something like:
int migrate_pages(struct list_head *from, new_page_t get_new_page,
		free_page_t put_new_page, unsigned long private,
		enum migrate_mode mode, int reason, int *error)
Right now it is not possible to find out why we failed there.
We just get the number of pages that were not migrated (unless the error is
-ENOMEM, in which case we completely bail out and return that).
For -EBUSY, -ENOSYS and -EAGAIN we just increment some counter and return it.
Although, as I said, this might be overcomplicating things.
Oscar Salvador
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
@ 2018-05-23 8:19 ` Oscar Salvador
2018-05-23 9:28 ` Oscar Salvador
2018-05-23 10:26 ` Oscar Salvador
2 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 8:19 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> > pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
I will try it out and I will paste the config.
> ---
> From 6aa144a9b1c01255c89a4592221d706ccc4b4eea Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 23 May 2018 10:04:20 +0200
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
>
> Oscar has reported:
> : Due to an unfortunate setting with movablecore, memblocks containing bootmem
> : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> : So while trying to remove that memory, the system failed in do_migrate_range
> : and __offline_pages never returned.
>
> This is because we rely on start_isolate_page_range resp. has_unmovable_pages
> to do their job. The first one isolates the whole range to be offlined
> so that we do not allocate from it anymore and the latter makes sure we
> are not stumbling over non-migrateable pages.
>
> has_unmovable_pages is overly optimistic, however. It doesn't check all
> the pages if we are within zone_movable because we rely on those
> pages always being migrateable. As it turns out we are still not
> perfect there. While bootmem pages in zone_movable sound like a clear bug
> which should be fixed, let's remove the optimization for now and warn if
> we encounter unmovable pages in zone_movable in the meantime. That
> should help for now at least.
>
> Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
> do not fail offlining too early") because we used to have a small number
> of retries and then failed. This turned out to be too fragile though.
>
> Reported-by: Oscar Salvador <osalvador@techadventures.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/page_alloc.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c6f4008ea55..b9a45753244d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> unsigned long pfn, iter, found;
>
> /*
> - * For avoiding noise data, lru_add_drain_all() should be called
> - * If ZONE_MOVABLE, the zone never contains unmovable pages
> + * TODO we could make this much more efficient by not checking every
> + * page in the range if we know all of them are in MOVABLE_ZONE and
> + * that the movable zone guarantees that pages are migratable but
> + * the latter is not the case right now, unfortunately. E.g. movablecore
> + * can still lead to having bootmem allocations in zone_movable.
> */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> - return false;
>
> /*
> * CMA allocations (alloc_contig_range) really need to mark isolate
> @@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> page = pfn_to_page(check);
>
> if (PageReserved(page))
> - return true;
> + goto unmovable;
>
> /*
> * Hugepages are not in LRU lists, but they're movable.
> @@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * page at boot.
> */
> if (found > count)
> - return true;
> + goto unmovable;
> }
> return false;
> +unmovable:
> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> + return true;
> }
>
> #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> --
> 2.17.0
Thanks
Oscar Salvador
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Oscar Salvador
@ 2018-05-23 8:32 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2018-05-23 8:32 UTC (permalink / raw)
To: Oscar Salvador; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed 23-05-18 10:16:49, Oscar Salvador wrote:
[...]
> AFAIU, permanent errors are things like -EBUSY, -ENOSYS, -ENOMEM,
> and a temporary one would be -EAGAIN?
It would be really great to have EBUSY as permanent and ENOMEM and
EAGAIN as temporary failures. But this is not so easy. The migration
code usually fails on an elevated ref count, and we simply do not know
whether this is a short-term pin or somebody holding the reference
basically forever (from the migration POV). There was some discussion
about long-term pins on pages at LSFMM this year, but it will take quite
some time before we get a working solution.
> Maybe it is overcomplicated, but what about adding another parameter to
> migrate_pages() where we set the real error.
> something like:
>
> int migrate_pages(struct list_head *from, new_page_t get_new_page,
> free_page_t put_new_page, unsigned long private,
> enum migrate_mode mode, int reason, int *error)
I am not sure we really need a new parameter. migrate_pages will tell us
the failure. We just do not know _which_ error to return currently.
--
Michal Hocko
SUSE Labs
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
@ 2018-05-23 9:28 ` Oscar Salvador
2018-05-23 10:26 ` Oscar Salvador
2 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 9:28 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> > pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
> ---
> From 6aa144a9b1c01255c89a4592221d706ccc4b4eea Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 23 May 2018 10:04:20 +0200
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
>
> Oscar has reported:
> : Due to an unfortunate setting with movablecore, memblocks containing bootmem
> : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> : So while trying to remove that memory, the system failed in do_migrate_range
> : and __offline_pages never returned.
>
> This is because we rely on start_isolate_page_range resp. has_unmovable_pages
> to do their job. The first one isolates the whole range to be offlined
> so that we do not allocate from it anymore and the latter makes sure we
> are not stumbling over non-migrateable pages.
>
> has_unmovable_pages is overly optimistic, however. It doesn't check all
> the pages if we are within zone_movable because we rely on those
> pages always being migrateable. As it turns out we are still not
> perfect there. While bootmem pages in zone_movable sound like a clear bug
> which should be fixed, let's remove the optimization for now and warn if
> we encounter unmovable pages in zone_movable in the meantime. That
> should help for now at least.
>
> Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
> do not fail offlining too early") because we used to have a small number
> of retries and then failed. This turned out to be too fragile though.
>
> Reported-by: Oscar Salvador <osalvador@techadventures.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/page_alloc.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c6f4008ea55..b9a45753244d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> unsigned long pfn, iter, found;
>
> /*
> - * For avoiding noise data, lru_add_drain_all() should be called
> - * If ZONE_MOVABLE, the zone never contains unmovable pages
> + * TODO we could make this much more efficient by not checking every
> + * page in the range if we know all of them are in MOVABLE_ZONE and
> + * that the movable zone guarantees that pages are migratable but
> + * the latter is not the case right now, unfortunately. E.g. movablecore
> + * can still lead to having bootmem allocations in zone_movable.
> */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> - return false;
>
> /*
> * CMA allocations (alloc_contig_range) really need to mark isolate
> @@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> page = pfn_to_page(check);
>
> if (PageReserved(page))
> - return true;
> + goto unmovable;
>
> /*
> * Hugepages are not in LRU lists, but they're movable.
> @@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * page at boot.
> */
> if (found > count)
> - return true;
> + goto unmovable;
> }
> return false;
> +unmovable:
> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> + return true;
> }
>
> #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> --
> 2.17.0
Tested-by: Oscar Salvador <osalvador@techadventures.net>
thanks!
Oscar Salvador
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
2018-05-23 9:28 ` Oscar Salvador
@ 2018-05-23 10:26 ` Oscar Salvador
2018-05-23 11:38 ` Michal Hocko
2 siblings, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 10:26 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> > pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
Here is some information:
** Qemu cmdline:
# qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
# Option movablecore=4G (cmdline)
** e820 map and some numa information:
linux kernel: BIOS-provided physical RAM map:
linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
linux kernel: NX (Execute Disable) protection: active
linux kernel: SMBIOS 2.8 present.
linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
linux kernel: Hypervisor detected: KVM
linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
** /proc/zoneinfo
Node 0, zone DMA
per-node stats
nr_inactive_anon 2107
nr_active_anon 49560
nr_inactive_file 25375
nr_active_file 19038
nr_unevictable 12
nr_slab_reclaimable 5996
nr_slab_unreclaimable 7236
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 48910
nr_mapped 13780
nr_file_pages 46676
nr_dirty 13
nr_writeback 0
nr_writeback_temp 0
nr_shmem 2263
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 50
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 17749
nr_written 17462
83328
pages free 3961
min 29
low 36
high 43
spanned 4095
present 3998
managed 3977
protection: (0, 2939, 2939, 3898, 3898)
nr_free_pages 3961
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 2
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 1
numa_other 1
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
node_unreclaimable: 0
start_pfn: 1
Node 0, zone DMA32
pages free 724414
min 5583
low 6978
high 8373
spanned 1044480
present 782304
managed 758516
protection: (0, 0, 0, 959, 959)
nr_free_pages 724414
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 1697
nr_zone_active_file 8915
nr_zone_unevictable 0
nr_zone_write_pending 12
nr_mlock 0
nr_page_table_pages 2976
nr_kernel_stack 4000
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 281025
numa_miss 0
numa_foreign 0
numa_interleave 8583
numa_local 135392
numa_other 145633
pagesets
cpu: 0
count: 164
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 32
high: 186
batch: 31
vm stats threshold: 24
node_unreclaimable: 0
start_pfn: 4096
Node 0, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 7677, 7677)
Node 0, zone Movable
pages free 160140
min 1823
low 2278
high 2733
spanned 262144
present 262144
managed 245670
protection: (0, 0, 0, 0, 0)
nr_free_pages 160140
nr_zone_inactive_anon 2107
nr_zone_active_anon 49560
nr_zone_inactive_file 23678
nr_zone_active_file 10123
nr_zone_unevictable 12
nr_zone_write_pending 1
nr_mlock 12
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 214370
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 214344
numa_other 26
pagesets
cpu: 0
count: 32
high: 42
batch: 7
vm stats threshold: 16
cpu: 1
count: 26
high: 42
batch: 7
vm stats threshold: 16
node_unreclaimable: 0
start_pfn: 1048576
Node 0, zone Device
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 0, 0)
Node 1, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 2014, 2014)
Node 1, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 2014, 2014)
Node 1, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 16117, 16117)
Node 1, zone Movable
per-node stats
nr_inactive_anon 524
nr_active_anon 25734
nr_inactive_file 28733
nr_active_file 12316
nr_unevictable 8
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 24656
nr_mapped 16871
nr_file_pages 41647
nr_dirty 1
nr_writeback 0
nr_writeback_temp 0
nr_shmem 598
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 8
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 125
nr_written 98
0
pages free 448427
min 3827
low 4783
high 5739
spanned 524288
present 524288
managed 515766
protection: (0, 0, 0, 0, 0)
nr_free_pages 448427
nr_zone_inactive_anon 524
nr_zone_active_anon 25734
nr_zone_inactive_file 28733
nr_zone_active_file 12316
nr_zone_unevictable 8
nr_zone_write_pending 1
nr_mlock 8
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 199599
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 199599
numa_other 0
pagesets
cpu: 0
count: 9
high: 42
batch: 7
vm stats threshold: 20
cpu: 1
count: 2
high: 42
batch: 7
vm stats threshold: 20
node_unreclaimable: 0
start_pfn: 1310720
Node 1, zone Device
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 0, 0)
I hope this is enough.
Thanks
Oscar Salvador
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 10:26 ` Oscar Salvador
@ 2018-05-23 11:38 ` Michal Hocko
2018-05-23 11:53 ` Oscar Salvador
0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2018-05-23 11:38 UTC (permalink / raw)
To: Oscar Salvador; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> > On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> > [...]
> > > Yeah, the current code is far from optimal. We
> > > used to have a retry count but that one was removed exactly because of
> > > premature failures. There are three things here
> > > 1) zone_movable should not contain any bootmem or otherwise non-migrateable
> > > pages
> > > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > > has_unmovable_pages is overly optimistic and it should check all
> > > pages even in movable zones.
> > > 3) migrate_pages should really tell us whether the failure is temporary
> > > or permanent. I am not sure we can do that easily though.
> >
> > 2) should be the simplest one for now. Could you give it a try? Btw.
> > the exact configuration that led to bootmem pages in zone_movable would
> > be really appreciated:
>
> Here is some information:
>
> ** Qemu cmdline:
>
> # qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
> # Option movablecore=4G (cmdline)
>
> ** e820 map and some numa information:
>
> linux kernel: BIOS-provided physical RAM map:
> linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
> linux kernel: NX (Execute Disable) protection: active
> linux kernel: SMBIOS 2.8 present.
> linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
> linux kernel: Hypervisor detected: KVM
> linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
> linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
>
> linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
> linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
> linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
> linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
> linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
> linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
> linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
Could you also paste
"Zone ranges:"
and the follow-up messages?
From the zoneinfo it seems the movable zone got placed on both nodes.
And only Node0 is marked as hotpluggable, so early allocations can be
placed on Node1.
> ** /proc/zoneinfo
[...]
> Node 0, zone Movable
> pages free 160140
> min 1823
> low 2278
> high 2733
> spanned 262144
> present 262144
> managed 245670
it seems that 1G went to Node0
> Node 1, zone Movable
[...]
> pages free 448427
> min 3827
> low 4783
> high 5739
> spanned 524288
> present 524288
> managed 515766
and the rest to Node1. Guessing from spanned - managed, it seems the
consumed memory is used for memmaps (struct page arrays).
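Michal's guess can be roughly checked. The following is only a back-of-the-envelope sketch: it assumes sizeof(struct page) == 64 and 4 KiB pages, which is typical for x86_64 but configuration-dependent. The spanned - managed gap on Node 1 comes out close to the memmap size for the zone:

```python
# Rough check that spanned - managed on Node 1 is mostly memmap pages.
# Assumes sizeof(struct page) == 64 and PAGE_SIZE == 4096 (typical x86_64,
# but both are configuration-dependent).
STRUCT_PAGE_SIZE = 64
PAGE_SIZE = 4096

spanned = 524288   # pages, from the /proc/zoneinfo paste above
managed = 515766

memmap_pages = spanned * STRUCT_PAGE_SIZE // PAGE_SIZE  # 8192
gap = spanned - managed                                 # 8522

# The memmap accounts for most, but not all, of the gap; the remainder
# is other early reservations in the zone.
print(memmap_pages, gap)
```

The 8192-page memmap figure also matches the "Movable zone: 8192 pages used for memmap" dmesg line quoted later in the thread.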
--
Michal Hocko
SUSE Labs
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 11:38 ` Michal Hocko
@ 2018-05-23 11:53 ` Oscar Salvador
0 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 11:53 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 01:38:57PM +0200, Michal Hocko wrote:
> On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> > On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> > > On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> > > [...]
> > > > Yeah, the current code is far from optimal. We
> > > > used to have a retry count but that one was removed exactly because of
> > > > premature failures. There are three things here
> > > > 1) zone_movable should not contain any bootmem or otherwise non-migratable
> > > > pages
> > > > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > > > has_unmovable_pages is overly optimistic and it should check all
> > > > pages even in movable zones.
> > > > 3) migrate_pages should really tell us whether the failure is temporary
> > > > or permanent. I am not sure we can do that easily though.
> > >
> > > 2) should be the simplest one for now. Could you give it a try? Btw.
> > > the exact configuration that led to bootmem pages in zone_movable would
> > > be really appreciated:
> >
> > Here is some information:
> >
> > ** Qemu cmdline:
> >
> > # qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
> > # Option movablecore=4G (cmdline)
> >
> > ** e820 map and some numa information:
> >
> > linux kernel: BIOS-provided physical RAM map:
> > linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> > linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
> > linux kernel: NX (Execute Disable) protection: active
> > linux kernel: SMBIOS 2.8 present.
> > linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
> > linux kernel: Hypervisor detected: KVM
> > linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
> > linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
> >
> > linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
> > linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
> > linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
> > linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
> > linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
> > linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
> > linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
>
> Could you also paste
> "Zone ranges:"
> and the follow-up messages?
Michal, here is the output about "Zone ranges:"
linux kernel: Zone ranges:
linux kernel: DMA [mem 0x0000000000001000-0x0000000000ffffff]
linux kernel: DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
linux kernel: Normal [mem 0x0000000100000000-0x00000001bfffffff]
linux kernel: Device empty
linux kernel: Movable zone start for each node
linux kernel: Node 0: 0x0000000100000000
linux kernel: Node 1: 0x0000000140000000
linux kernel: Early memory node ranges
linux kernel: node 0: [mem 0x0000000000001000-0x000000000009efff]
linux kernel: node 0: [mem 0x0000000000100000-0x00000000bffdffff]
linux kernel: node 0: [mem 0x0000000100000000-0x000000013fffffff]
linux kernel: node 1: [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
linux kernel: On node 0 totalpages: 1048446
linux kernel: DMA zone: 64 pages used for memmap
linux kernel: DMA zone: 21 pages reserved
linux kernel: DMA zone: 3998 pages, LIFO batch:0
linux kernel: DMA32 zone: 12224 pages used for memmap
linux kernel: DMA32 zone: 782304 pages, LIFO batch:31
linux kernel: Movable zone: 4096 pages used for memmap
linux kernel: Movable zone: 262144 pages, LIFO batch:31
linux kernel: Initmem setup node 1 [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: On node 1 totalpages: 524288
linux kernel: Movable zone: 8192 pages used for memmap
linux kernel: Movable zone: 524288 pages, LIFO batch:31
linux kernel: Reserved but unavailable: 98 pages
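As a quick sanity check on the dmesg output above (a sketch assuming 4 KiB pages and inclusive end addresses, as the kernel prints them), the "On node 0 totalpages: 1048446" figure can be recomputed from the early memory node ranges:

```python
# Recompute node 0 totalpages from the "Early memory node ranges" lines.
# End addresses in the dmesg output are inclusive, hence the +1.
PAGE_SIZE = 4096
node0_ranges = [
    (0x0000000000001000, 0x000000000009efff),
    (0x0000000000100000, 0x00000000bffdffff),
    (0x0000000100000000, 0x000000013fffffff),
]
total = sum((end + 1 - start) // PAGE_SIZE for start, end in node0_ranges)
print(total)  # matches "On node 0 totalpages: 1048446"
```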
Oscar Salvador
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 7:35 [RFC] Checking for error code in __offline_pages Oscar Salvador
2018-05-23 7:52 ` Michal Hocko
@ 2018-05-23 14:51 ` David Hildenbrand
1 sibling, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2018-05-23 14:51 UTC (permalink / raw)
To: Oscar Salvador, linux-mm; +Cc: mhocko, vbabka, pasha.tatashin, akpm
On 23.05.2018 09:35, Oscar Salvador wrote:
> Hi,
>
> This is something I spotted while testing offlining memory.
>
> __offline_pages() calls do_migrate_range() to try to migrate a range,
> but we do not actually check for the error code.
> This, besides ignoring underlying failures, can lead to a situation
> where we never break out of the loop because we are totally unaware of
> what is going on.
>
> The way I spotted this was when trying to offline all memblocks belonging
> to a node.
> Due to an unfortunate setting with movablecore, memblocks containing bootmem
> memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> So while trying to remove that memory, the system failed in:
>
> do_migrate_range()
> {
> ...
> if (PageLRU(page))
> ret = isolate_lru_page(page);
> else
> ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
>
> if (!ret)
> // success: do something
> else
> if (page_count(page))
> ret = -EBUSY;
> ...
> }
>
> Since the pages from bootmem are not LRU, we call isolate_movable_page()
> but we fail when checking for __PageMovable().
> Since the page_count is more than 0 we return -EBUSY, but we do not check this
> in our caller, so we keep trying to migrate this memory over and over:
>
> repeat:
> ...
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
>
> But this is not the only situation where we can get stuck.
> For example, if we fail with -ENOMEM in
> migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well.
> I think we should really detect these cases and fail with "goto failed_removal".
> Something like
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> + if (ret) {
> + if (ret != -ENOMEM)
> + ret = -EBUSY;
> + goto failed_removal;
> + }
> goto repeat;
> }
>
> Now, unless I overlooked something
> migrate_pages()->unmap_and_move()/unmap_and_move_huge_page() can return:
> -ENOMEM
> -EAGAIN
> -EBUSY
> -ENOSYS.
>
> I am not sure if we should differentiate between those errors.
> For example, it is possible that in migrate_pages() we just get -EAGAIN,
> and we return the number of retries without having really failed.
> Although, since we do 10 passes, it might be considered a failure.
>
> And I am not sure either whether we want to propagate the error codes,
> or, whenever we fail in migrate_pages(), just return -EBUSY regardless
> of the actual error (-ENOMEM, -EBUSY, etc.).
>
> What do you think?
Hi,
While working on onlining/offlining of 4MB subsections I also stumbled
over the return value of offline_pages(). It would be nice if the
interface could actually indicate if an error is permanent or only
temporary.
For now I have to live with the assumption that, whenever this function
does not return -EAGAIN or 0, I simply have to retry later.
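David's point can be made concrete with a small sketch (Python purely for illustration; should_retry() is a hypothetical caller-side helper, not kernel code): because offline_pages() cannot signal whether a failure is permanent, a caller that wants to make progress has little choice but to retry every error later.

```python
import errno

def should_retry(ret):
    """Sketch of the caller-side policy described in the thread:
    without a permanent/temporary distinction in the return value,
    every nonzero error has to be treated as worth retrying later."""
    if ret == 0:
        return False              # success, nothing left to do
    # -EAGAIN and every other error look the same to the caller.
    return True

print(should_retry(0))              # False
print(should_retry(-errno.EAGAIN))  # True
print(should_retry(-errno.EBUSY))   # True, even if the failure is permanent
```

If the interface distinguished the two cases, only the temporary errors (such as -EAGAIN) would warrant a retry, and permanent ones could fail fast.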
David
>
> Thanks
> Oscar Salvador
>
--
Thanks,
David / dhildenb
Thread overview: 11+ messages
2018-05-23 7:35 [RFC] Checking for error code in __offline_pages Oscar Salvador
2018-05-23 7:52 ` Michal Hocko
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
2018-05-23 9:28 ` Oscar Salvador
2018-05-23 10:26 ` Oscar Salvador
2018-05-23 11:38 ` Michal Hocko
2018-05-23 11:53 ` Oscar Salvador
2018-05-23 8:16 ` Oscar Salvador
2018-05-23 8:32 ` Michal Hocko
2018-05-23 14:51 ` David Hildenbrand