From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Michal Hocko <mhocko@suse.com>, Wei Wang <wei.w.wang@intel.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: [PATCH 4.14 43/45] virtio_balloon: fix deadlock on OOM
Date: Thu, 11 Oct 2018 17:40:10 +0200
Message-ID: <20181011152510.807351989@linuxfoundation.org>
In-Reply-To: <20181011152508.885515042@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael S. Tsirkin <mst@redhat.com>
commit c7cdff0e864713a089d7cb3a2b1136ba9a54881a upstream.
fill_balloon doing memory allocations under balloon_lock
can cause a deadlock when leak_balloon is called from
virtballoon_oom_notify and tries to take the same lock.
To fix, split page allocation and enqueue, and do allocations outside the lock.
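The shape of the fix can be illustrated outside the kernel. The sketch below is a userspace model, not the driver code: struct fake_page, struct fake_balloon, page_push(), page_pop() and fill_model() are hypothetical names, malloc() stands in for balloon_page_alloc(), and a pthread mutex stands in for vb->balloon_lock. Pages are first allocated into a private LIFO list with no lock held; only the enqueue step, which never allocates, runs under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: a link field plus an id. */
struct fake_page {
	struct fake_page *next;
	int id;
};

/* Hypothetical stand-in for the balloon: a list protected by a mutex. */
struct fake_balloon {
	pthread_mutex_t lock;      /* plays the role of vb->balloon_lock */
	struct fake_page *pages;   /* pages currently in the balloon */
	unsigned num_pages;
};

/* LIFO push onto a singly linked list (cf. balloon_page_push()). */
static void page_push(struct fake_page **head, struct fake_page *p)
{
	assert(p != NULL);
	p->next = *head;
	*head = p;
}

/* LIFO pop; returns NULL when the list is empty (cf. balloon_page_pop()). */
static struct fake_page *page_pop(struct fake_page **head)
{
	struct fake_page *p = *head;

	if (p)
		*head = p->next;
	return p;
}

/*
 * Phase 1 allocates with no lock held, so an allocation that blocks in
 * reclaim cannot stall anyone waiting for b->lock. Phase 2 only moves
 * already-allocated pages onto the balloon list and never allocates.
 */
static unsigned fill_model(struct fake_balloon *b, unsigned num)
{
	struct fake_page *local = NULL;  /* private list, no locking needed */
	struct fake_page *p;
	unsigned enqueued = 0;
	unsigned i;

	for (i = 0; i < num; i++) {
		p = malloc(sizeof(*p));  /* stands in for balloon_page_alloc() */
		if (!p)
			break;           /* give up and enqueue what we have */
		p->id = (int)i;
		page_push(&local, p);
	}

	pthread_mutex_lock(&b->lock);
	while ((p = page_pop(&local)) != NULL) {
		page_push(&b->pages, p);  /* stands in for balloon_page_enqueue() */
		b->num_pages++;
		enqueued++;
	}
	pthread_mutex_unlock(&b->lock);

	return enqueued;
}
```

Calling fill_model(&b, 4) on an empty balloon whose lock was initialized with PTHREAD_MUTEX_INITIALIZER should leave b.num_pages at 4, assuming malloc() succeeds. Like the patch, the model stops allocating on the first failure and still enqueues whatever it already obtained.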
Here's a detailed analysis of the deadlock by Tetsuo Handa:
In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt
might, even though __GFP_NORETRY is specified, indirectly depend on somebody
else's __GFP_DIRECT_RECLAIM memory allocation. Such an indirect
__GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
virtballoon_oom_notify() via a blocking_notifier_call_chain() callback via
out_of_memory() when it has reached __alloc_pages_may_oom() and holds the
oom_lock mutex. Since the vb->balloon_lock mutex is already held by
fill_balloon(), this will cause an OOM lockup.
Thread1                                     Thread2
fill_balloon()
  takes a balloon_lock
  balloon_page_enqueue()
    alloc_page(GFP_HIGHUSER_MOVABLE)
      direct reclaim (__GFP_FS context)     takes a fs lock
      waits for that fs lock                alloc_page(GFP_NOFS)
                                              __alloc_pages_may_oom()
                                                takes the oom_lock
                                                out_of_memory()
                                                  blocking_notifier_call_chain()
                                                    leak_balloon()
                                                      tries to take that balloon_lock and deadlocks
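The diagram describes an AB-BA lock inversion: Thread1 holds balloon_lock and waits for the fs lock, while Thread2 holds the fs lock (and oom_lock) and waits for balloon_lock. The userspace sketch below models only that cycle, under simplifying assumptions: two pthread mutexes with hypothetical names stand in for balloon_lock and the fs lock, oom_lock and the notifier chain are collapsed into Thread2's second acquisition, and pthread_mutex_trylock() replaces the blocking acquisition so the program reports EBUSY instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model locks: one for balloon_lock, one for "a fs lock". */
static pthread_mutex_t model_balloon_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t model_fs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t step;

/* Result of each thread's attempt to take its second lock. */
static int thread1_second, thread2_second;

/* Try the second lock without blocking; release it again if we got it. */
static int try_second(pthread_mutex_t *m)
{
	int ret = pthread_mutex_trylock(m);

	if (ret == 0)
		pthread_mutex_unlock(m);
	return ret;
}

/* Thread1: fill_balloon() holds balloon_lock, then reclaim needs the fs lock. */
static void *thread1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&model_balloon_lock);
	pthread_barrier_wait(&step);   /* both first locks are now held */
	thread1_second = try_second(&model_fs_lock);
	pthread_barrier_wait(&step);   /* keep holding until both have tried */
	pthread_mutex_unlock(&model_balloon_lock);
	return NULL;
}

/* Thread2: holds the fs lock, then OOM -> leak_balloon() needs balloon_lock. */
static void *thread2(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&model_fs_lock);
	pthread_barrier_wait(&step);
	thread2_second = try_second(&model_balloon_lock);
	pthread_barrier_wait(&step);
	pthread_mutex_unlock(&model_fs_lock);
	return NULL;
}

/* Returns 1 if each thread found its second lock held by the other. */
static int run_model(void)
{
	pthread_t t1, t2;
	int rc;

	rc = pthread_barrier_init(&step, NULL, 2);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, thread1, NULL);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, thread2, NULL);
	assert(rc == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_barrier_destroy(&step);
	(void)rc;
	return thread1_second == EBUSY && thread2_second == EBUSY;
}
```

The barrier keeps each thread holding its first lock until both have tried their second one, so both attempts should fail with EBUSY on every run. With the patch applied, Thread1 no longer holds balloon_lock while allocating, which removes one edge of the cycle.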
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/virtio/virtio_balloon.c | 24 +++++++++++++++++++-----
include/linux/balloon_compaction.h | 35 ++++++++++++++++++++++++++++++++++-
mm/balloon_compaction.c | 28 +++++++++++++++++++++-------
3 files changed, 74 insertions(+), 13 deletions(-)
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_
static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
{
- struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
unsigned num_allocated_pages;
+ unsigned num_pfns;
+ struct page *page;
+ LIST_HEAD(pages);
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
- mutex_lock(&vb->balloon_lock);
- for (vb->num_pfns = 0; vb->num_pfns < num;
- vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = balloon_page_enqueue(vb_dev_info);
+ for (num_pfns = 0; num_pfns < num;
+ num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+ struct page *page = balloon_page_alloc();
if (!page) {
dev_info_ratelimited(&vb->vdev->dev,
@@ -162,6 +163,19 @@ static unsigned fill_balloon(struct virt
msleep(200);
break;
}
+
+ balloon_page_push(&pages, page);
+ }
+
+ mutex_lock(&vb->balloon_lock);
+
+ vb->num_pfns = 0;
+
+ while ((page = balloon_page_pop(&pages))) {
+ balloon_page_enqueue(&vb->vb_dev_info, page);
+
+ vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
+
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
if (!virtio_has_feature(vb->vdev,
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -50,6 +50,7 @@
#include <linux/gfp.h>
#include <linux/err.h>
#include <linux/fs.h>
+#include <linux/list.h>
/*
* Balloon device information descriptor.
@@ -67,7 +68,9 @@ struct balloon_dev_info {
struct inode *inode;
};
-extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_alloc(void);
+extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page);
extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
@@ -193,4 +196,34 @@ static inline gfp_t balloon_mapping_gfp_
}
#endif /* CONFIG_BALLOON_COMPACTION */
+
+/*
+ * balloon_page_push - insert a page into a page list.
+ * @head : pointer to list
+ * @page : page to be added
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline void balloon_page_push(struct list_head *pages, struct page *page)
+{
+ list_add(&page->lru, pages);
+}
+
+/*
+ * balloon_page_pop - remove a page from a page list.
+ * @head : pointer to list
+ * @page : page to be added
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline struct page *balloon_page_pop(struct list_head *pages)
+{
+ struct page *page = list_first_entry_or_null(pages, struct page, lru);
+
+ if (!page)
+ return NULL;
+
+ list_del(&page->lru);
+ return page;
+}
#endif /* _LINUX_BALLOON_COMPACTION_H */
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -11,22 +11,37 @@
#include <linux/balloon_compaction.h>
/*
+ * balloon_page_alloc - allocates a new page for insertion into the balloon
+ * page list.
+ *
+ * Driver must call it to properly allocate a new enlisted balloon page.
+ * Driver must call balloon_page_enqueue before definitively removing it from
+ * the guest system. This function returns the page address for the recently
+ * allocated page or NULL in the case we fail to allocate a new page this turn.
+ */
+struct page *balloon_page_alloc(void)
+{
+ struct page *page = alloc_page(balloon_mapping_gfp_mask() |
+ __GFP_NOMEMALLOC | __GFP_NORETRY);
+ return page;
+}
+EXPORT_SYMBOL_GPL(balloon_page_alloc);
+
+/*
* balloon_page_enqueue - allocates a new page and inserts it into the balloon
* page list.
* @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @page: new page to enqueue - allocated using balloon_page_alloc.
*
- * Driver must call it to properly allocate a new enlisted balloon page
+ * Driver must call it to properly enqueue a new allocated balloon page
* before definitively removing it from the guest system.
* This function returns the page address for the recently enqueued page or
* NULL in the case we fail to allocate a new page this turn.
*/
-struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
+void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page)
{
unsigned long flags;
- struct page *page = alloc_page(balloon_mapping_gfp_mask() |
- __GFP_NOMEMALLOC | __GFP_NORETRY);
- if (!page)
- return NULL;
/*
* Block others from accessing the 'page' when we get around to
@@ -39,7 +54,6 @@ struct page *balloon_page_enqueue(struct
__count_vm_event(BALLOON_INFLATE);
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
unlock_page(page);
- return page;
}
EXPORT_SYMBOL_GPL(balloon_page_enqueue);