* [PATCH] memcg, vmscan: Do not wait for writeback if killed
From: Michal Hocko @ 2015-12-02 14:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Vladimir Davydov, Hugh Dickins, linux-mm, LKML,
Michal Hocko
From: Michal Hocko <mhocko@suse.com>
Legacy memcg reclaim waits for pages under writeback to prevent a
premature OOM killer invocation, because there was no memcg dirty limit
throttling implemented back then.

This heuristic can complicate the situation when the writeback cannot
make forward progress because of a global OOM situation. E.g. a
filesystem backed by a loop device relies on the underlying filesystem
hosting the image to make forward progress, which cannot be guaranteed,
so we might end up invoking the OOM killer to resolve the situation. If
the OOM victim happens to be the task stuck in wait_on_page_writeback
in the memcg reclaim, then we are basically deadlocked.

Introduce wait_on_page_writeback_killable and use it in this path to
prevent the issue. shrink_page_list will back off if the wait was
interrupted.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Hi,
this came out of a discussion with Vladimir [1]. I haven't actually
observed the deadlock described above, but I fail to see why it would
be impossible either. wait_on_page_writeback_killable is an awfully
long name, but wait_on_page_bit_killable is quite long already and I
felt it was better to use a helper.

Thoughts?

[1] http://lkml.kernel.org/r/1448465801-3280-1-git-send-email-vdavydov@virtuozzo.com
 include/linux/pagemap.h |  8 ++++++++
 mm/vmscan.c             | 11 ++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4d08b6c33557..d3bb8963f8f0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -532,6 +532,14 @@ static inline void wait_on_page_writeback(struct page *page)
 		wait_on_page_bit(page, PG_writeback);
 }
 
+static inline int wait_on_page_writeback_killable(struct page *page)
+{
+	if (PageWriteback(page))
+		return wait_on_page_bit_killable(page, PG_writeback);
+
+	return 0;
+}
+
 extern void end_page_writeback(struct page *page);
 
 void wait_for_stable_page(struct page *page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4589cfdbe405..98a1934493af 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,10 +1021,19 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/* Case 3 above */
 			} else {
+				int ret;
+
 				unlock_page(page);
-				wait_on_page_writeback(page);
+				ret = wait_on_page_writeback_killable(page);
 				/* then go back and try same page again */
 				list_add_tail(&page->lru, page_list);
+
+				/*
+				 * We've got killed while waiting here so
+				 * expedite our way out from the reclaim
+				 */
+				if (ret)
+					break;
 				continue;
 			}
 		}
--
2.6.2
* Re: [PATCH] memcg, vmscan: Do not wait for writeback if killed
From: Andrew Morton @ 2015-12-02 22:25 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Hugh Dickins, linux-mm, LKML,
Michal Hocko
On Wed, 2 Dec 2015 15:26:18 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Legacy memcg reclaim waits for pages under writeback to prevent a
> premature OOM killer invocation, because there was no memcg dirty limit
> throttling implemented back then.
>
> This heuristic can complicate the situation when the writeback cannot
> make forward progress because of a global OOM situation. E.g. a
> filesystem backed by a loop device relies on the underlying filesystem
> hosting the image to make forward progress, which cannot be guaranteed,
> so we might end up invoking the OOM killer to resolve the situation. If
> the OOM victim happens to be the task stuck in wait_on_page_writeback
> in the memcg reclaim, then we are basically deadlocked.
>
> Introduce wait_on_page_writeback_killable and use it in this path to
> prevent the issue. shrink_page_list will back off if the wait was
> interrupted.
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1021,10 +1021,19 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>
>  			/* Case 3 above */
>  			} else {
> +				int ret;
> +
>  				unlock_page(page);
> -				wait_on_page_writeback(page);
> +				ret = wait_on_page_writeback_killable(page);
>  				/* then go back and try same page again */
>  				list_add_tail(&page->lru, page_list);
> +
> +				/*
> +				 * We've got killed while waiting here so
> +				 * expedite our way out from the reclaim
> +				 */
> +				if (ret)
> +					break;
>  				continue;
>  			}
>  		}
This function is 350 lines long and it takes a bit of effort to work
out what that `break' is breaking from and where it goes next. I think
you want a "goto keep_killed" here for consistency and sanity.

Also, there's a high risk here of a pending signal causing the code to
fall into some busy loop where it repeatedly tries to do something but
then bails out without doing it. It's not obvious how this change avoids
such things. (Maybe it *does* avoid such things, but it should be
obvious!).
* Re: [PATCH] memcg, vmscan: Do not wait for writeback if killed
From: Michal Hocko @ 2015-12-03 9:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Vladimir Davydov, Hugh Dickins, linux-mm, LKML
On Wed 02-12-15 14:25:03, Andrew Morton wrote:
> On Wed, 2 Dec 2015 15:26:18 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Legacy memcg reclaim waits for pages under writeback to prevent a
> > premature OOM killer invocation, because there was no memcg dirty limit
> > throttling implemented back then.
> >
> > This heuristic can complicate the situation when the writeback cannot
> > make forward progress because of a global OOM situation. E.g. a
> > filesystem backed by a loop device relies on the underlying filesystem
> > hosting the image to make forward progress, which cannot be guaranteed,
> > so we might end up invoking the OOM killer to resolve the situation. If
> > the OOM victim happens to be the task stuck in wait_on_page_writeback
> > in the memcg reclaim, then we are basically deadlocked.
> >
> > Introduce wait_on_page_writeback_killable and use it in this path to
> > prevent the issue. shrink_page_list will back off if the wait was
> > interrupted.
> >
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1021,10 +1021,19 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >
> >  			/* Case 3 above */
> >  			} else {
> > +				int ret;
> > +
> >  				unlock_page(page);
> > -				wait_on_page_writeback(page);
> > +				ret = wait_on_page_writeback_killable(page);
> >  				/* then go back and try same page again */
> >  				list_add_tail(&page->lru, page_list);
> > +
> > +				/*
> > +				 * We've got killed while waiting here so
> > +				 * expedite our way out from the reclaim
> > +				 */
> > +				if (ret)
> > +					break;
> >  				continue;
> >  			}
> >  		}
>
> This function is 350 lines long and it takes a bit of effort to work
> out what that `break' is breaking from and where it goes next. I think
> you want a "goto keep_killed" here for consistency and sanity.
Yeah, sounds better. See an update below:
> Also, there's a high risk here of a pending signal causing the code to
> fall into some busy loop where it repeatedly tries to do something but
> then bails out without doing it. It's not obvious how this change avoids
> such things. (Maybe it *does* avoid such things, but it should be
> obvious!).
shrink_page_list is called from __alloc_contig_migrate_range and
shrink_inactive_list. Both of them handle fatal_signal_pending and bail
out. I was relying on this behavior. I realize this is far from optimal
wrt. readability but I do not have a great idea how to improve it
without sticking more fatal_signal_pending checks into the reclaim path.
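
To illustrate, the caller-side pattern this relies on looks roughly like
the sketch below (simplified; the real shrink_inactive_list and
__alloc_contig_migrate_range have different signatures and do much more,
and example_reclaim_caller is a made-up name):

#include <linux/list.h>
#include <linux/sched.h>

/*
 * Illustrative caller loop: once a fatal signal is pending, stop calling
 * into shrink_page_list() altogether instead of retrying, so the early
 * return from the killable wait cannot turn into a busy loop.
 */
static void example_reclaim_caller(struct list_head *page_list)
{
	while (!list_empty(page_list)) {
		if (fatal_signal_pending(current))
			break;

		/* ... isolate pages and call shrink_page_list() ... */
	}
}
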
So you think a comment would be sufficient?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 98a1934493af..2e8ee9e5fcb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1031,9 +1031,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				/*
 				 * We've got killed while waiting here so
 				 * expedite our way out from the reclaim
+				 *
+				 * Our callers should make sure we do not
+				 * get here with fatal signals again.
 				 */
 				if (ret)
-					break;
+					goto keep_killed;
 				continue;
 			}
 		}
@@ -1227,6 +1230,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
 
+keep_killed:
 	mem_cgroup_uncharge_list(&free_pages);
 	try_to_unmap_flush();
 	free_hot_cold_page_list(&free_pages, true);
--
Michal Hocko
SUSE Labs
* Re: [PATCH] memcg, vmscan: Do not wait for writeback if killed
From: Andrew Morton @ 2015-12-05 1:03 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Hugh Dickins, linux-mm, LKML
On Thu, 3 Dec 2015 10:08:26 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> So you think a comment would be sufficient?
> ---
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 98a1934493af..2e8ee9e5fcb5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1031,9 +1031,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  				/*
>  				 * We've got killed while waiting here so
>  				 * expedite our way out from the reclaim
> +				 *
> +				 * Our callers should make sure we do not
> +				 * get here with fatal signals again.
>  				 */
Seems OK. s/should/must/
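
To spell out the substitution, the comment block would then read:

				/*
				 * We've got killed while waiting here so
				 * expedite our way out from the reclaim
				 *
				 * Our callers must make sure we do not
				 * get here with fatal signals again.
				 */
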
Please resend it all after the usual exhaustive testing ;)