From: Greg KH <gregkh@linuxfoundation.org>
To: henryburns@google.com, akpm@linux-foundation.org,
henrywolfeburns@gmail.com, jwadams@google.com,
minchan@kernel.org, sergey.senozhatsky@gmail.com,
shakeelb@google.com, stable@vger.kernel.org,
torvalds@linux-foundation.org
Subject: Re: FAILED: patch "[PATCH] mm/zsmalloc.c: fix race condition in zs_destroy_pool" failed to apply to 4.9-stable tree
Date: Mon, 26 Aug 2019 21:02:47 +0200 [thread overview]
Message-ID: <20190826190247.GB24930@kroah.com> (raw)
In-Reply-To: <156684443123212@kroah.com>
On Mon, Aug 26, 2019 at 08:33:51PM +0200, gregkh@linuxfoundation.org wrote:
>
> The patch below does not apply to the 4.9-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
It actually applies, but it breaks the build :(
> ------------------ original commit in Linus's tree ------------------
>
> >From 701d678599d0c1623aaf4139c03eea260a75b027 Mon Sep 17 00:00:00 2001
> From: Henry Burns <henryburns@google.com>
> Date: Sat, 24 Aug 2019 17:55:06 -0700
> Subject: [PATCH] mm/zsmalloc.c: fix race condition in zs_destroy_pool
>
> In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
> have no guarantee that migration isn't happening in the background at
> that time.
>
> Since migration can't directly free pages, it relies on free_work being
> scheduled to free the pages. But there's nothing preventing an
> in-progress migrate from queuing the work *after*
> zs_unregister_migration() has called flush_work(). Which would mean
> pages still pointing at the inode when we free it.
>
> Since we know at destroy time all objects should be free, no new
> migrations can come in (since zs_page_isolate() fails for fully-free
> zspages). This means it is sufficient to track a "# isolated zspages"
> count by class, and have the destroy logic ensure all such pages have
> drained before proceeding. Keeping that state under the class spinlock
> keeps the logic straightforward.
>
> In this case a memory leak could lead to an eventual crash if compaction
> hits the leaked page. This crash would only occur if people are
> changing their zswap backend at runtime (which eventually starts
> destruction).
>
> Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
> Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
> Signed-off-by: Henry Burns <henryburns@google.com>
> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Henry Burns <henrywolfeburns@gmail.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Jonathan Adams <jwadams@google.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 5105b9b66653..08def3a0d200 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -54,6 +54,7 @@
> #include <linux/mount.h>
> #include <linux/pseudo_fs.h>
> #include <linux/migrate.h>
> +#include <linux/wait.h>
> #include <linux/pagemap.h>
> #include <linux/fs.h>
>
> @@ -268,6 +269,10 @@ struct zs_pool {
> #ifdef CONFIG_COMPACTION
> struct inode *inode;
> struct work_struct free_work;
> + /* A wait queue for when migration races with async_free_zspage() */
> + struct wait_queue_head migration_wait;
> + atomic_long_t isolated_pages;
> + bool destroying;
> #endif
> };
>
> @@ -1874,6 +1879,19 @@ static void putback_zspage_deferred(struct zs_pool *pool,
>
> }
>
> +static inline void zs_pool_dec_isolated(struct zs_pool *pool)
> +{
> + VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
> + atomic_long_dec(&pool->isolated_pages);
> + /*
> + * There's no possibility of racing, since wait_for_isolated_drain()
> + * checks the isolated count under &class->lock after enqueuing
> + * on migration_wait.
> + */
> + if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
> + wake_up_all(&pool->migration_wait);
> +}
> +
> static void replace_sub_page(struct size_class *class, struct zspage *zspage,
> struct page *newpage, struct page *oldpage)
> {
> @@ -1943,6 +1961,7 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
> */
> if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
> get_zspage_mapping(zspage, &class_idx, &fullness);
> + atomic_long_inc(&pool->isolated_pages);
> remove_zspage(class, zspage, fullness);
> }
>
> @@ -2042,8 +2061,16 @@ static int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> * Page migration is done so let's putback isolated zspage to
> * the list if @page is final isolated subpage in the zspage.
> */
> - if (!is_zspage_isolated(zspage))
> + if (!is_zspage_isolated(zspage)) {
> + /*
> + * We cannot race with zs_destroy_pool() here because we wait
> + * for isolation to hit zero before we start destroying.
> + * Also, we ensure that everyone can see pool->destroying before
> + * we start waiting.
> + */
> putback_zspage_deferred(pool, class, zspage);
> + zs_pool_dec_isolated(pool);
> + }
>
> reset_page(page);
> put_page(page);
> @@ -2094,8 +2121,8 @@ static void zs_page_putback(struct page *page)
> * so let's defer.
> */
> putback_zspage_deferred(pool, class, zspage);
> + zs_pool_dec_isolated(pool);
> }
> -
> spin_unlock(&class->lock);
> }
>
> @@ -2118,8 +2145,36 @@ static int zs_register_migration(struct zs_pool *pool)
> return 0;
> }
>
> +static bool pool_isolated_are_drained(struct zs_pool *pool)
> +{
> + return atomic_long_read(&pool->isolated_pages) == 0;
> +}
> +
> +/* Function for resolving migration */
> +static void wait_for_isolated_drain(struct zs_pool *pool)
> +{
> +
> + /*
> + * We're in the process of destroying the pool, so there are no
> + * active allocations. zs_page_isolate() fails for completely free
> + * zspages, so we need only wait for the zs_pool's isolated
> + * count to hit zero.
> + */
> + wait_event(pool->migration_wait,
> + pool_isolated_are_drained(pool));
> +}
> +
> static void zs_unregister_migration(struct zs_pool *pool)
> {
> + pool->destroying = true;
> + /*
> + * We need a memory barrier here to ensure global visibility of
> + * pool->destroying. Thus pool->isolated pages will either be 0 in which
> + * case we don't care, or it will be > 0 and pool->destroying will
> + * ensure that we wake up once isolation hits 0.
> + */
> + smp_mb();
> + wait_for_isolated_drain(pool); /* This can block */
> flush_work(&pool->free_work);
> iput(pool->inode);
> }
> @@ -2357,6 +2412,8 @@ struct zs_pool *zs_create_pool(const char *name)
> if (!pool->name)
> goto err;
>
> + init_waitqueue_head(&pool->migration_wait);
> +
> if (create_cache(pool))
> goto err;
>
>
next prev parent reply other threads:[~2019-08-26 19:02 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-26 18:33 FAILED: patch "[PATCH] mm/zsmalloc.c: fix race condition in zs_destroy_pool" failed to apply to 4.9-stable tree gregkh
2019-08-26 19:02 ` Greg KH [this message]
2019-09-03 3:14 ` Sasha Levin
2019-09-03 6:21 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190826190247.GB24930@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=henryburns@google.com \
--cc=henrywolfeburns@gmail.com \
--cc=jwadams@google.com \
--cc=minchan@kernel.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=shakeelb@google.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).