All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Stable tree <stable@vger.kernel.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Antonio SJ Musumeci <trapexit@spawn.link>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [Stable 4.4 - NEEDS REVIEW - 1/3] mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
Date: Mon, 24 Oct 2016 14:52:23 -0400	[thread overview]
Message-ID: <20161024185223.GA28326@cmpxchg.org> (raw)
In-Reply-To: <20161024152605.11707-2-mhocko@kernel.org>

On Mon, Oct 24, 2016 at 05:26:03PM +0200, Michal Hocko wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> Commit 22f2ac51b6d643666f4db093f13144f773ff3f3a upstream.
> 
> Antonio reports the following crash when using fuse under memory pressure:
> 
>   kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in: all of them
>   CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
>   Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>   task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
>   RIP: shadow_lru_isolate+0x181/0x190
>   Call Trace:
>     __list_lru_walk_one.isra.3+0x8f/0x130
>     list_lru_walk_one+0x23/0x30
>     scan_shadow_nodes+0x34/0x50
>     shrink_slab.part.40+0x1ed/0x3d0
>     shrink_zone+0x2ca/0x2e0
>     kswapd+0x51e/0x990
>     kthread+0xd8/0xf0
>     ret_from_fork+0x3f/0x70
> 
> which corresponds to the following sanity check in the shadow node
> tracking:
> 
>   BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
> 
> The workingset code tracks radix tree nodes that exclusively contain
> shadow entries of evicted pages in them, and this (somewhat obscure)
> line checks whether there are real pages left that would interfere with
> reclaim of the radix tree node under memory pressure.
> 
> While discussing ways how fuse might sneak pages into the radix tree
> past the workingset code, Miklos pointed to replace_page_cache_page(),
> and indeed there is a problem there: it properly accounts for the old
> page being removed - __delete_from_page_cache() does that - but then
> does a raw raw radix_tree_insert(), not accounting for the replacement
> page.  Eventually the page count bits in node->count underflow while
> leaving the node incorrectly linked to the shadow node LRU.
> 
> To address this, make sure replace_page_cache_page() uses the tracked
> page insertion code, page_cache_tree_insert().  This fixes the page
> accounting and makes sure page-containing nodes are properly unlinked
> from the shadow node LRU again.
> 
> Also, make the sanity checks a bit less obscure by using the helpers for
> checking the number of pages and shadows in a radix tree node.
> 
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Cc: <stable@vger.kernel.org>	[3.15+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/swap.h |  2 ++
>  mm/filemap.c         | 86 ++++++++++++++++++++++++++--------------------------
>  mm/workingset.c      | 10 +++---
>  3 files changed, 49 insertions(+), 49 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7ba7dccaf0e7..b28de19aadbf 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -266,6 +266,7 @@ static inline void workingset_node_pages_inc(struct radix_tree_node *node)
>  
>  static inline void workingset_node_pages_dec(struct radix_tree_node *node)
>  {
> +	VM_BUG_ON(!workingset_node_pages(node));
>  	node->count--;
>  }

We should also pull 21f54ddae449 ("Using BUG_ON() as an assert() is
_never_ acceptable") into stable on top of this patch to replace the
BUG_ONs with warnings.

Otherwise this looks good to me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Stable tree <stable@vger.kernel.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Antonio SJ Musumeci <trapexit@spawn.link>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [Stable 4.4 - NEEDS REVIEW - 1/3] mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
Date: Mon, 24 Oct 2016 14:52:23 -0400	[thread overview]
Message-ID: <20161024185223.GA28326@cmpxchg.org> (raw)
In-Reply-To: <20161024152605.11707-2-mhocko@kernel.org>

On Mon, Oct 24, 2016 at 05:26:03PM +0200, Michal Hocko wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> Commit 22f2ac51b6d643666f4db093f13144f773ff3f3a upstream.
> 
> Antonio reports the following crash when using fuse under memory pressure:
> 
>   kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in: all of them
>   CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
>   Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>   task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
>   RIP: shadow_lru_isolate+0x181/0x190
>   Call Trace:
>     __list_lru_walk_one.isra.3+0x8f/0x130
>     list_lru_walk_one+0x23/0x30
>     scan_shadow_nodes+0x34/0x50
>     shrink_slab.part.40+0x1ed/0x3d0
>     shrink_zone+0x2ca/0x2e0
>     kswapd+0x51e/0x990
>     kthread+0xd8/0xf0
>     ret_from_fork+0x3f/0x70
> 
> which corresponds to the following sanity check in the shadow node
> tracking:
> 
>   BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
> 
> The workingset code tracks radix tree nodes that exclusively contain
> shadow entries of evicted pages in them, and this (somewhat obscure)
> line checks whether there are real pages left that would interfere with
> reclaim of the radix tree node under memory pressure.
> 
> While discussing ways how fuse might sneak pages into the radix tree
> past the workingset code, Miklos pointed to replace_page_cache_page(),
> and indeed there is a problem there: it properly accounts for the old
> page being removed - __delete_from_page_cache() does that - but then
> does a raw raw radix_tree_insert(), not accounting for the replacement
> page.  Eventually the page count bits in node->count underflow while
> leaving the node incorrectly linked to the shadow node LRU.
> 
> To address this, make sure replace_page_cache_page() uses the tracked
> page insertion code, page_cache_tree_insert().  This fixes the page
> accounting and makes sure page-containing nodes are properly unlinked
> from the shadow node LRU again.
> 
> Also, make the sanity checks a bit less obscure by using the helpers for
> checking the number of pages and shadows in a radix tree node.
> 
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Cc: <stable@vger.kernel.org>	[3.15+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/swap.h |  2 ++
>  mm/filemap.c         | 86 ++++++++++++++++++++++++++--------------------------
>  mm/workingset.c      | 10 +++---
>  3 files changed, 49 insertions(+), 49 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7ba7dccaf0e7..b28de19aadbf 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -266,6 +266,7 @@ static inline void workingset_node_pages_inc(struct radix_tree_node *node)
>  
>  static inline void workingset_node_pages_dec(struct radix_tree_node *node)
>  {
> +	VM_BUG_ON(!workingset_node_pages(node));
>  	node->count--;
>  }

We should also pull 21f54ddae449 ("Using BUG_ON() as an assert() is
_never_ acceptable") into stable on top of this patch to replace the
BUG_ONs with warnings.

Otherwise this looks good to me.

  reply	other threads:[~2016-10-24 18:52 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-24 15:26 [Stable 4.4 - NEEDS REVIEW - 0/3] mm: working set fixes Michal Hocko
2016-10-24 15:26 ` Michal Hocko
2016-10-24 15:26 ` [Stable 4.4 - NEEDS REVIEW - 1/3] mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page() Michal Hocko
2016-10-24 15:26   ` Michal Hocko
2016-10-24 18:52   ` Johannes Weiner [this message]
2016-10-24 18:52     ` Johannes Weiner
2016-10-24 18:58     ` Michal Hocko
2016-10-24 18:58       ` Michal Hocko
2016-10-24 15:26 ` [Stable 4.4 - NEEDS REVIEW - 2/3] mm: filemap: don't plant shadow entries without radix tree node Michal Hocko
2016-10-24 15:26   ` Michal Hocko
2016-10-24 18:56   ` Johannes Weiner
2016-10-24 18:56     ` Johannes Weiner
2016-10-24 19:08     ` Michal Hocko
2016-10-24 19:08       ` Michal Hocko
2016-10-24 15:26 ` [Stable 4.4 - NEEDS REVIEW - 3/3] mm: filemap: fix mapping->nrpages double accounting in fuse Michal Hocko
2016-10-24 15:26   ` Michal Hocko
2016-10-24 18:56   ` Johannes Weiner
2016-10-24 18:56     ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161024185223.GA28326@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=miklos@szeredi.hu \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=trapexit@spawn.link \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.