From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Henry Burns <henryburns@google.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Henry Burns <henrywolfeburns@gmail.com>,
Minchan Kim <minchan@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
Jonathan Adams <jwadams@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 64/83] mm/zsmalloc.c: fix race condition in zs_destroy_pool
Date: Wed, 4 Sep 2019 19:53:56 +0200 [thread overview]
Message-ID: <20190904175309.258095111@linuxfoundation.org> (raw)
In-Reply-To: <20190904175303.488266791@linuxfoundation.org>
[ Upstream commit 701d678599d0c1623aaf4139c03eea260a75b027 ]
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at
that time.
Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages. But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work(). Which would mean
pages still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding. Keeping that state under the class spinlock
keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page. This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/zsmalloc.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 59 insertions(+), 2 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f624cc2d91d98..ad8a34bd15ca7 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -52,6 +52,7 @@
#include <linux/zpool.h>
#include <linux/mount.h>
#include <linux/migrate.h>
+#include <linux/wait.h>
#include <linux/pagemap.h>
#define ZSPAGE_MAGIC 0x58
@@ -265,6 +266,10 @@ struct zs_pool {
#ifdef CONFIG_COMPACTION
struct inode *inode;
struct work_struct free_work;
+ /* A wait queue for when migration races with async_free_zspage() */
+ wait_queue_head_t migration_wait;
+ atomic_long_t isolated_pages;
+ bool destroying;
#endif
};
@@ -1951,6 +1956,19 @@ static void putback_zspage_deferred(struct zs_pool *pool,
}
+static inline void zs_pool_dec_isolated(struct zs_pool *pool)
+{
+ VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
+ atomic_long_dec(&pool->isolated_pages);
+ /*
+ * There's no possibility of racing, since wait_for_isolated_drain()
+ * checks the isolated count under &class->lock after enqueuing
+ * on migration_wait.
+ */
+ if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
+ wake_up_all(&pool->migration_wait);
+}
+
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
@@ -2020,6 +2038,7 @@ bool zs_page_isolate(struct page *page, isolate_mode_t mode)
*/
if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
get_zspage_mapping(zspage, &class_idx, &fullness);
+ atomic_long_inc(&pool->isolated_pages);
remove_zspage(class, zspage, fullness);
}
@@ -2108,8 +2127,16 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
* Page migration is done so let's putback isolated zspage to
* the list if @page is final isolated subpage in the zspage.
*/
- if (!is_zspage_isolated(zspage))
+ if (!is_zspage_isolated(zspage)) {
+ /*
+ * We cannot race with zs_destroy_pool() here because we wait
+ * for isolation to hit zero before we start destroying.
+ * Also, we ensure that everyone can see pool->destroying before
+ * we start waiting.
+ */
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
+ }
reset_page(page);
put_page(page);
@@ -2161,8 +2188,8 @@ void zs_page_putback(struct page *page)
* so let's defer.
*/
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
}
-
spin_unlock(&class->lock);
}
@@ -2185,8 +2212,36 @@ static int zs_register_migration(struct zs_pool *pool)
return 0;
}
+static bool pool_isolated_are_drained(struct zs_pool *pool)
+{
+ return atomic_long_read(&pool->isolated_pages) == 0;
+}
+
+/* Function for resolving migration */
+static void wait_for_isolated_drain(struct zs_pool *pool)
+{
+
+ /*
+ * We're in the process of destroying the pool, so there are no
+ * active allocations. zs_page_isolate() fails for completely free
+ * zspages, so we need only wait for the zs_pool's isolated
+ * count to hit zero.
+ */
+ wait_event(pool->migration_wait,
+ pool_isolated_are_drained(pool));
+}
+
static void zs_unregister_migration(struct zs_pool *pool)
{
+ pool->destroying = true;
+ /*
+ * We need a memory barrier here to ensure global visibility of
+ * pool->destroying. Thus pool->isolated pages will either be 0 in which
+ * case we don't care, or it will be > 0 and pool->destroying will
+ * ensure that we wake up once isolation hits 0.
+ */
+ smp_mb();
+ wait_for_isolated_drain(pool); /* This can block */
flush_work(&pool->free_work);
iput(pool->inode);
}
@@ -2433,6 +2488,8 @@ struct zs_pool *zs_create_pool(const char *name)
if (!pool->name)
goto err;
+ init_waitqueue_head(&pool->migration_wait);
+
if (create_cache(pool))
goto err;
--
2.20.1
next prev parent reply other threads:[~2019-09-04 18:01 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-04 17:52 [PATCH 4.9 00/83] 4.9.191-stable review Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 01/83] HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 02/83] MIPS: kernel: only use i8253 clocksource with periodic clockevent Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 03/83] netfilter: ebtables: fix a memory leak bug in compat Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 04/83] ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 05/83] bonding: Force slave speed check after link state recovery for 802.3ad Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 06/83] can: dev: call netif_carrier_off() in register_candev() Greg Kroah-Hartman
2019-09-04 17:52 ` [PATCH 4.9 07/83] ASoC: Fail card instantiation if DAI format setup fails Greg Kroah-Hartman
2019-09-04 18:10 ` Mark Brown
2019-09-04 18:35 ` Greg Kroah-Hartman
2019-09-04 19:05 ` Mark Brown
2019-09-05 18:56 ` Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 08/83] st21nfca_connectivity_event_received: null check the allocation Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 09/83] st_nci_hci_connectivity_event_received: " Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 10/83] ASoC: ti: davinci-mcasp: Correct slot_width posed constraint Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 11/83] net: usb: qmi_wwan: Add the BroadMobi BM818 card Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 12/83] isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain() Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 13/83] isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 14/83] perf bench numa: Fix cpu0 binding Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 15/83] can: sja1000: force the string buffer NULL-terminated Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 16/83] can: peak_usb: " Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 17/83] NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim() Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 18/83] HID: input: fix a4tech horizontal wheel custom usage Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 19/83] net: cxgb3_main: Fix a resource leak in a error path in init_one() Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 20/83] net: hisilicon: make hip04_tx_reclaim non-reentrant Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 21/83] net: hisilicon: fix hip04-xmit never return TX_BUSY Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 22/83] net: hisilicon: Fix dma_map_single failed on arm64 Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 23/83] libata: add SG safety checks in SFF pio transfers Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 24/83] x86/lib/cpu: Address missing prototypes warning Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 25/83] drm/vmwgfx: fix memory leak when too many retries have occurred Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 26/83] perf pmu-events: Fix missing "cpu_clk_unhalted.core" event Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 27/83] selftests: kvm: Adding config fragments Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 28/83] HID: wacom: correct misreported EKR ring values Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 29/83] HID: wacom: Correct distance scale for 2nd-gen Intuos devices Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 30/83] Revert "dm bufio: fix deadlock with loop device" Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 31/83] gpiolib: never report open-drain/source lines as input to user-space Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 32/83] userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 33/83] x86/retpoline: Dont clobber RFLAGS during CALL_NOSPEC on i386 Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 34/83] x86/apic: Handle missing global clockevent gracefully Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 35/83] x86/boot: Save fields explicitly, zero out everything else Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 36/83] x86/boot: Fix boot regression caused by bootparam sanitizing Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 37/83] dm btree: fix order of block initialization in btree_split_beneath Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 38/83] dm space map metadata: fix missing store of apply_bops() return value Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 39/83] dm table: fix invalid memory accesses with too high sector number Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 40/83] genirq: Properly pair kobject_del() with kobject_add() Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 41/83] mm, page_owner: handle THP splits correctly Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 42/83] mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 43/83] xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 44/83] Revert "perf test 6: Fix missing kvm module load for s390" Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 45/83] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 46/83] dmaengine: ste_dma40: fix unneeded variable warning Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 47/83] iommu/dma: Handle SG length overflow better Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 48/83] usb: gadget: composite: Clear "suspended" on reset/disconnect Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 49/83] xen/blkback: fix memory leaks Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 50/83] i2c: emev2: avoid race when unregistering slave client Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 51/83] usb: host: fotg2: restart hcd after port reset Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 52/83] tools: hv: fix KVP and VSS daemons exit code Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 53/83] watchdog: bcm2835_wdt: Fix module autoload Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 54/83] scsi: ufs: Fix RX_TERMINATION_FORCE_ENABLE define value Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 55/83] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 56/83] ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 57/83] ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 58/83] tcp: make sure EPOLLOUT wont be missed Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 59/83] ALSA: line6: Fix memory leak at line6_init_pcm() error path Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 60/83] ALSA: seq: Fix potential concurrent access to the deleted pool Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 61/83] KVM: x86: Dont update RIP or do single-step on faulting emulation Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 62/83] x86/apic: Do not initialize LDR and DFR for bigsmp Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 63/83] x86/apic: Include the LDR when clearing out APIC registers Greg Kroah-Hartman
2019-09-04 17:53 ` Greg Kroah-Hartman [this message]
2019-09-04 17:53 ` [PATCH 4.9 65/83] usb-storage: Add new JMS567 revision to unusual_devs Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 66/83] USB: cdc-wdm: fix race between write and disconnect due to flag abuse Greg Kroah-Hartman
2019-09-04 17:53 ` [PATCH 4.9 67/83] usb: chipidea: udc: dont do hardware access if gadget has stopped Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 68/83] usb: host: ohci: fix a race condition between shutdown and irq Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 69/83] usb: host: xhci: rcar: Fix typo in compatible string matching Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 70/83] USB: storage: ums-realtek: Update module parameter description for auto_delink_en Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 71/83] USB: storage: ums-realtek: Whitelist auto-delink support Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 72/83] ptrace,x86: Make user_64bit_mode() available to 32-bit builds Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 73/83] uprobes/x86: Fix detection of 32-bit user mode Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 74/83] mmc: sdhci-of-at91: add quirk for broken HS200 Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 75/83] mmc: core: Fix init of SD cards reporting an invalid VDD range Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 76/83] stm class: Fix a double free of stm_source_device Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 77/83] VMCI: Release resource if the work is already queued Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 78/83] Revert "cfg80211: fix processing world regdomain when non modular" Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 79/83] mac80211: fix possible sta leak Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 80/83] KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 81/83] KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 82/83] i2c: piix4: Fix port selection for AMD Family 16h Model 30h Greg Kroah-Hartman
2019-09-04 17:54 ` [PATCH 4.9 83/83] x86/ptrace: fix up botched merge of spectrev1 fix Greg Kroah-Hartman
2019-09-05 3:38 ` [PATCH 4.9 00/83] 4.9.191-stable review kernelci.org bot
2019-09-05 14:33 ` shuah
2019-09-05 16:55 ` Guenter Roeck
2019-09-05 17:26 ` Daniel Díaz
2019-09-05 19:53 ` Kelsey Skunberg
2019-09-06 7:36 ` Jon Hunter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190904175309.258095111@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=henryburns@google.com \
--cc=henrywolfeburns@gmail.com \
--cc=jwadams@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=minchan@kernel.org \
--cc=sashal@kernel.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=shakeelb@google.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).