Date: Mon, 26 Aug 2019 21:02:47 +0200
From: Greg KH <gregkh@linuxfoundation.org>
To: henryburns@google.com, akpm@linux-foundation.org, henrywolfeburns@gmail.com, jwadams@google.com, minchan@kernel.org, sergey.senozhatsky@gmail.com, shakeelb@google.com, stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: Re: FAILED: patch "[PATCH] mm/zsmalloc.c: fix race condition in zs_destroy_pool" failed to apply to 4.9-stable tree
Message-ID: <20190826190247.GB24930@kroah.com>
References: <156684443123212@kroah.com>
In-Reply-To: <156684443123212@kroah.com>

On Mon, Aug 26, 2019 at 08:33:51PM +0200, gregkh@linuxfoundation.org wrote:
> 
> The patch below does not apply to the 4.9-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.

It actually applies, but it breaks the build :(

> ------------------ original commit in Linus's tree ------------------
> 
> >From 701d678599d0c1623aaf4139c03eea260a75b027 Mon Sep 17 00:00:00 2001
> From: Henry Burns <henryburns@google.com>
> Date: Sat, 24 Aug 2019 17:55:06 -0700
> Subject: [PATCH] mm/zsmalloc.c: fix race condition in zs_destroy_pool
> 
> In zs_destroy_pool() we call flush_work(&pool->free_work).
> However, we have no guarantee that migration isn't happening in the
> background at that time.
> 
> Since migration can't directly free pages, it relies on free_work being
> scheduled to free the pages. But there's nothing preventing an
> in-progress migrate from queuing the work *after*
> zs_unregister_migration() has called flush_work(). Which would mean
> pages still pointing at the inode when we free it.
> 
> Since we know at destroy time all objects should be free, no new
> migrations can come in (since zs_page_isolate() fails for fully-free
> zspages). This means it is sufficient to track a "# isolated zspages"
> count by class, and have the destroy logic ensure all such pages have
> drained before proceeding. Keeping that state under the class spinlock
> keeps the logic straightforward.
> 
> In this case a memory leak could lead to an eventual crash if compaction
> hits the leaked page. This crash would only occur if people are
> changing their zswap backend at runtime (which eventually starts
> destruction).
> 
> Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
> Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
> Signed-off-by: Henry Burns <henryburns@google.com>
> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Henry Burns <henrywolfeburns@gmail.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Jonathan Adams <jwadams@google.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 5105b9b66653..08def3a0d200 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -54,6 +54,7 @@
>  #include <linux/zpool.h>
>  #include <linux/mount.h>
>  #include <linux/migrate.h>
> +#include <linux/wait.h>
>  #include <linux/pagemap.h>
>  #include <linux/fs.h>
>  
> @@ -268,6 +269,10 @@ struct zs_pool {
>  #ifdef CONFIG_COMPACTION
>  	struct inode *inode;
>  	struct work_struct free_work;
> +	/* A wait queue for when migration races with async_free_zspage() */
> +	struct wait_queue_head migration_wait;
> +	atomic_long_t isolated_pages;
> +	bool destroying;
>  #endif
>  };
>  
> @@ -1874,6 +1879,19 @@ static void putback_zspage_deferred(struct zs_pool *pool,
>  
>  }
>  
> +static inline void zs_pool_dec_isolated(struct zs_pool *pool)
> +{
> +	VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
> +	atomic_long_dec(&pool->isolated_pages);
> +	/*
> +	 * There's no possibility of racing, since wait_for_isolated_drain()
> +	 * checks the isolated count under &class->lock after enqueuing
> +	 * on migration_wait.
> +	 */
> +	if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
> +		wake_up_all(&pool->migration_wait);
> +}
> +
>  static void replace_sub_page(struct size_class *class, struct zspage *zspage,
>  				struct page *newpage, struct page *oldpage)
>  {
> @@ -1943,6 +1961,7 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
>  	 */
>  	if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
>  		get_zspage_mapping(zspage, &class_idx, &fullness);
> +		atomic_long_inc(&pool->isolated_pages);
>  		remove_zspage(class, zspage, fullness);
>  	}
>  
> @@ -2042,8 +2061,16 @@ static int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>  	 * Page migration is done so let's putback isolated zspage to
>  	 * the list if @page is final isolated subpage in the zspage.
>  	 */
> -	if (!is_zspage_isolated(zspage))
> +	if (!is_zspage_isolated(zspage)) {
> +		/*
> +		 * We cannot race with zs_destroy_pool() here because we wait
> +		 * for isolation to hit zero before we start destroying.
> +		 * Also, we ensure that everyone can see pool->destroying before
> +		 * we start waiting.
> +		 */
>  		putback_zspage_deferred(pool, class, zspage);
> +		zs_pool_dec_isolated(pool);
> +	}
>  
>  	reset_page(page);
>  	put_page(page);
> @@ -2094,8 +2121,8 @@ static void zs_page_putback(struct page *page)
>  		 * so let's defer.
>  		 */
>  		putback_zspage_deferred(pool, class, zspage);
> +		zs_pool_dec_isolated(pool);
>  	}
> -
>  	spin_unlock(&class->lock);
>  }
>  
> @@ -2118,8 +2145,36 @@ static int zs_register_migration(struct zs_pool *pool)
>  	return 0;
>  }
>  
> +static bool pool_isolated_are_drained(struct zs_pool *pool)
> +{
> +	return atomic_long_read(&pool->isolated_pages) == 0;
> +}
> +
> +/* Function for resolving migration */
> +static void wait_for_isolated_drain(struct zs_pool *pool)
> +{
> +
> +	/*
> +	 * We're in the process of destroying the pool, so there are no
> +	 * active allocations. zs_page_isolate() fails for completely free
> +	 * zspages, so we need only wait for the zs_pool's isolated
> +	 * count to hit zero.
> +	 */
> +	wait_event(pool->migration_wait,
> +		   pool_isolated_are_drained(pool));
> +}
> +
>  static void zs_unregister_migration(struct zs_pool *pool)
>  {
> +	pool->destroying = true;
> +	/*
> +	 * We need a memory barrier here to ensure global visibility of
> +	 * pool->destroying. Thus pool->isolated pages will either be 0 in which
> +	 * case we don't care, or it will be > 0 and pool->destroying will
> +	 * ensure that we wake up once isolation hits 0.
> +	 */
> +	smp_mb();
> +	wait_for_isolated_drain(pool); /* This can block */
>  	flush_work(&pool->free_work);
>  	iput(pool->inode);
>  }
> @@ -2357,6 +2412,8 @@ struct zs_pool *zs_create_pool(const char *name)
>  	if (!pool->name)
>  		goto err;
>  
> +	init_waitqueue_head(&pool->migration_wait);
> +
>  	if (create_cache(pool))
>  		goto err;
>  
> 
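For reference, the synchronization that the patch introduces condenses to the wait/wake pattern sketched below. This is a minimal illustrative sketch using the same kernel primitives (an atomic counter, a wait queue, smp_mb()); it is not the actual mm/zsmalloc.c code, and the simplified names (isolated_pages, drain_wait, put_isolated_page, drain_isolated) are made up for the example.

/*
 * Sketch only: the migration side drops an "isolated" count and wakes the
 * waiter on the zero crossing once teardown has started; the teardown side
 * publishes the destroying flag with a full barrier before sampling the
 * count and sleeping.
 */
#include <linux/atomic.h>
#include <linux/wait.h>

static atomic_long_t isolated_pages = ATOMIC_LONG_INIT(0);
static bool destroying;
static DECLARE_WAIT_QUEUE_HEAD(drain_wait);

static void put_isolated_page(void)		/* migration side */
{
	atomic_long_dec(&isolated_pages);
	/* Only the teardown path cares about the zero crossing. */
	if (atomic_long_read(&isolated_pages) == 0 && destroying)
		wake_up_all(&drain_wait);
}

static void drain_isolated(void)		/* teardown side */
{
	destroying = true;
	smp_mb();	/* make destroying visible before we start waiting */
	wait_event(drain_wait,
		   atomic_long_read(&isolated_pages) == 0);
}

The real patch keeps the counter and the destroying flag in struct zs_pool, and relies on zs_page_isolate() refusing fully free zspages so that no new isolations can start once destruction has begun.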