* [PATCH v3] virtio_balloon: fix deadlock on OOM
@ 2017-11-08 15:17 Michael S. Tsirkin
[not found] ` <87o9nid3zn.fsf@concordia.ellerman.id.au>
0 siblings, 1 reply; 3+ messages in thread
From: Michael S. Tsirkin @ 2017-11-08 15:17 UTC (permalink / raw)
To: linux-kernel
Cc: Tetsuo Handa, Michal Hocko, Wei Wang, Jason Wang, virtualization,
linux-mm
fill_balloon doing memory allocations under balloon_lock
can cause a deadlock when leak_balloon is called from
virtballoon_oom_notify and tries to take same lock.
To fix, split page allocation and enqueue and do allocations outside the lock.
Here's a detailed analysis of the deadlock by Tetsuo Handa:
In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
is specified, this allocation attempt might indirectly depend on somebody
else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect
__GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via
virtballoon_oom_notify() via blocking_notifier_call_chain() callback via
out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock
mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it
will cause OOM lockup.
Thread1 Thread2
fill_balloon()
takes a balloon_lock
balloon_page_enqueue()
alloc_page(GFP_HIGHUSER_MOVABLE)
direct reclaim (__GFP_FS context) takes a fs lock
waits for that fs lock alloc_page(GFP_NOFS)
__alloc_pages_may_oom()
takes the oom_lock
out_of_memory()
blocking_notifier_call_chain()
leak_balloon()
tries to take that balloon_lock and deadlocks
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v2:
drop an unused declaration, reported by Tetsuo Hocko
Changes from v1:
fix a build warning
drivers/virtio/virtio_balloon.c | 24 +++++++++++++++++++-----
include/linux/balloon_compaction.h | 35 ++++++++++++++++++++++++++++++++++-
mm/balloon_compaction.c | 28 +++++++++++++++++++++-------
3 files changed, 74 insertions(+), 13 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..7960746 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_balloon *vb,
static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
{
- struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
unsigned num_allocated_pages;
+ unsigned num_pfns;
+ struct page *page;
+ LIST_HEAD(pages);
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
- mutex_lock(&vb->balloon_lock);
- for (vb->num_pfns = 0; vb->num_pfns < num;
- vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = balloon_page_enqueue(vb_dev_info);
+ for (num_pfns = 0; num_pfns < num;
+ num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+ struct page *page = balloon_page_alloc();
if (!page) {
dev_info_ratelimited(&vb->vdev->dev,
@@ -162,6 +163,19 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
msleep(200);
break;
}
+
+ balloon_page_push(&pages, page);
+ }
+
+ mutex_lock(&vb->balloon_lock);
+
+ vb->num_pfns = 0;
+
+ while ((page = balloon_page_pop(&pages))) {
+ balloon_page_enqueue(&vb->vb_dev_info, page);
+
+ vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
+
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
if (!virtio_has_feature(vb->vdev,
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index fbbe6da..c4c8df9 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -50,6 +50,7 @@
#include <linux/gfp.h>
#include <linux/err.h>
#include <linux/fs.h>
+#include <linux/list.h>
/*
* Balloon device information descriptor.
@@ -67,7 +68,9 @@ struct balloon_dev_info {
struct inode *inode;
};
-extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_alloc(void);
+extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page);
extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
@@ -89,6 +92,36 @@ extern int balloon_page_migrate(struct address_space *mapping,
struct page *page, enum migrate_mode mode);
/*
+ * balloon_page_push - insert a page into a page list.
+ * @head : pointer to list
+ * @page : page to be added
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline void balloon_page_push(struct list_head *pages, struct page *page)
+{
+ list_add(&page->lru, pages);
+}
+
+/*
+ * balloon_page_pop - remove a page from a page list.
+ * @head : pointer to list
+ * @page : page to be added
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline struct page *balloon_page_pop(struct list_head *pages)
+{
+ struct page *page = list_first_entry_or_null(pages, struct page, lru);
+
+ if (!page)
+ return NULL;
+
+ list_del(&page->lru);
+ return page;
+}
+
+/*
* balloon_page_insert - insert a page into the balloon's page list and make
* the page->private assignment accordingly.
* @balloon : pointer to balloon device
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 68d2892..ef858d5 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -11,22 +11,37 @@
#include <linux/balloon_compaction.h>
/*
+ * balloon_page_alloc - allocates a new page for insertion into the balloon
+ * page list.
+ *
+ * Driver must call it to properly allocate a new enlisted balloon page.
+ * Driver must call balloon_page_enqueue before definitively removing it from
+ * the guest system. This function returns the page address for the recently
+ * allocated page or NULL in the case we fail to allocate a new page this turn.
+ */
+struct page *balloon_page_alloc(void)
+{
+ struct page *page = alloc_page(balloon_mapping_gfp_mask() |
+ __GFP_NOMEMALLOC | __GFP_NORETRY);
+ return page;
+}
+EXPORT_SYMBOL_GPL(balloon_page_alloc);
+
+/*
* balloon_page_enqueue - allocates a new page and inserts it into the balloon
* page list.
* @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @page: new page to enqueue - allocated using balloon_page_alloc.
*
- * Driver must call it to properly allocate a new enlisted balloon page
+ * Driver must call it to properly enqueue a new allocated balloon page
* before definitively removing it from the guest system.
* This function returns the page address for the recently enqueued page or
* NULL in the case we fail to allocate a new page this turn.
*/
-struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
+void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+ struct page *page)
{
unsigned long flags;
- struct page *page = alloc_page(balloon_mapping_gfp_mask() |
- __GFP_NOMEMALLOC | __GFP_NORETRY);
- if (!page)
- return NULL;
/*
* Block others from accessing the 'page' when we get around to
@@ -39,7 +54,6 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
__count_vm_event(BALLOON_INFLATE);
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
unlock_page(page);
- return page;
}
EXPORT_SYMBOL_GPL(balloon_page_enqueue);
--
MST
^ permalink raw reply related [flat|nested] 3+ messages in thread[parent not found: <87o9nid3zn.fsf@concordia.ellerman.id.au>]
* Re: Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM) [not found] ` <87o9nid3zn.fsf@concordia.ellerman.id.au> @ 2017-12-01 14:42 ` Michael S. Tsirkin 2017-12-03 10:04 ` Michael Ellerman 0 siblings, 1 reply; 3+ messages in thread From: Michael S. Tsirkin @ 2017-12-01 14:42 UTC (permalink / raw) To: Michael Ellerman Cc: linux-kernel, Tetsuo Handa, Michal Hocko, Wei Wang, Jason Wang, virtualization, linux-mm, linuxppc-dev@lists.ozlabs.org On Fri, Dec 01, 2017 at 11:31:08PM +1100, Michael Ellerman wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > fill_balloon doing memory allocations under balloon_lock > > can cause a deadlock when leak_balloon is called from > > virtballoon_oom_notify and tries to take same lock. > > > > To fix, split page allocation and enqueue and do allocations outside the lock. > > > > Here's a detailed analysis of the deadlock by Tetsuo Handa: > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to > > serialize against fill_balloon(). But in fill_balloon(), > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY > > is specified, this allocation attempt might indirectly depend on somebody > > else's __GFP_DIRECT_RECLAIM memory allocation. And such indirect > > __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon() via > > virtballoon_oom_notify() via blocking_notifier_call_chain() callback via > > out_of_memory() when it reached __alloc_pages_may_oom() and held oom_lock > > mutex. Since vb->balloon_lock mutex is already held by fill_balloon(), it > > will cause OOM lockup. > > > > Thread1 Thread2 > > fill_balloon() > > takes a balloon_lock > > balloon_page_enqueue() > > alloc_page(GFP_HIGHUSER_MOVABLE) > > direct reclaim (__GFP_FS context) takes a fs lock > > waits for that fs lock alloc_page(GFP_NOFS) > > __alloc_pages_may_oom() > > takes the oom_lock > > out_of_memory() > > blocking_notifier_call_chain() > > leak_balloon() > > tries to take that balloon_lock and deadlocks > > > > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Wei Wang <wei.w.wang@intel.com> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > > > drivers/virtio/virtio_balloon.c | 24 +++++++++++++++++++----- > > include/linux/balloon_compaction.h | 35 ++++++++++++++++++++++++++++++++++- > > mm/balloon_compaction.c | 28 +++++++++++++++++++++------- > > 3 files changed, 74 insertions(+), 13 deletions(-) > > > Somehow this commit seems to be killing powerpc guests. > > The symptom is that the first page (64K) of the guests memory gets over > written with zeroes, which is where our interrupt handlers are, so the > system rapidly locks up due to illegal instructions in the illegal > instruction handler. > > There seems to be some element of a race, because it doesn't always > crash. Sometimes I can boot to a shell, but not often. When it does > happen it's fairly late in boot, but before I get to a shell. > > I had a few bisects go off into the weeds due to the intermittent > nature. But once I realised that I changed my script to boot 5 times > before declaring a kernel good, and that bisected straight here. > > I can also revert this commit on v4.15-rc1 and everything's fine again - > I got through ~250 boots with that kernel. > > So I'm pretty sure this commit is triggering/exposing/causing the bug. > > The other data point is that the page that's overwritten is mapped read > only in the guest kernel. So either the guest kernel is writing to it in > real mode (MMU off), or the hypervisor/DMA is doing it. > > I haven't isolated if it's host kernel/qemu version dependent, at the > moment I'm just using distro packaged versions of both. > > Anyway I'll try and dig further into it on Monday, but I thought I'd let > you know in case this is a known bug with a fix in the pipeline, or > rings any bells or whatever. > > cheers Thanks for the report! A fix was just posted: virtio_balloon: fix increment of vb->num_pfns in fill_balloon() Would appreciate testing. > > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > > index f0b3a0b..7960746 100644 > > --- a/drivers/virtio/virtio_balloon.c > > +++ b/drivers/virtio/virtio_balloon.c > > @@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_balloon *vb, > > > > static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > > { > > - struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; > > unsigned num_allocated_pages; > > + unsigned num_pfns; > > + struct page *page; > > + LIST_HEAD(pages); > > > > /* We can only do one array worth at a time. */ > > num = min(num, ARRAY_SIZE(vb->pfns)); > > > > - mutex_lock(&vb->balloon_lock); > > - for (vb->num_pfns = 0; vb->num_pfns < num; > > - vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > > - struct page *page = balloon_page_enqueue(vb_dev_info); > > + for (num_pfns = 0; num_pfns < num; > > + num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > > + struct page *page = balloon_page_alloc(); > > > > if (!page) { > > dev_info_ratelimited(&vb->vdev->dev, > > @@ -162,6 +163,19 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > > msleep(200); > > break; > > } > > + > > + balloon_page_push(&pages, page); > > + } > > + > > + mutex_lock(&vb->balloon_lock); > > + > > + vb->num_pfns = 0; > > + > > + while ((page = balloon_page_pop(&pages))) { > > + balloon_page_enqueue(&vb->vb_dev_info, page); > > + > > + vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE; > > + > > set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > > if (!virtio_has_feature(vb->vdev, > > diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h > > index fbbe6da..c4c8df9 100644 > > --- a/include/linux/balloon_compaction.h > > +++ b/include/linux/balloon_compaction.h > > @@ -50,6 +50,7 @@ > > #include <linux/gfp.h> > > #include <linux/err.h> > > #include <linux/fs.h> > > +#include <linux/list.h> > > > > /* > > * Balloon device information descriptor. > > @@ -67,7 +68,9 @@ struct balloon_dev_info { > > struct inode *inode; > > }; > > > > -extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info); > > +extern struct page *balloon_page_alloc(void); > > +extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info, > > + struct page *page); > > extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info); > > > > static inline void balloon_devinfo_init(struct balloon_dev_info *balloon) > > @@ -89,6 +92,36 @@ extern int balloon_page_migrate(struct address_space *mapping, > > struct page *page, enum migrate_mode mode); > > > > /* > > + * balloon_page_push - insert a page into a page list. > > + * @head : pointer to list > > + * @page : page to be added > > + * > > + * Caller must ensure the page is private and protect the list. > > + */ > > +static inline void balloon_page_push(struct list_head *pages, struct page *page) > > +{ > > + list_add(&page->lru, pages); > > +} > > + > > +/* > > + * balloon_page_pop - remove a page from a page list. > > + * @head : pointer to list > > + * @page : page to be added > > + * > > + * Caller must ensure the page is private and protect the list. > > + */ > > +static inline struct page *balloon_page_pop(struct list_head *pages) > > +{ > > + struct page *page = list_first_entry_or_null(pages, struct page, lru); > > + > > + if (!page) > > + return NULL; > > + > > + list_del(&page->lru); > > + return page; > > +} > > + > > +/* > > * balloon_page_insert - insert a page into the balloon's page list and make > > * the page->private assignment accordingly. > > * @balloon : pointer to balloon device > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c > > index 68d2892..ef858d5 100644 > > --- a/mm/balloon_compaction.c > > +++ b/mm/balloon_compaction.c > > @@ -11,22 +11,37 @@ > > #include <linux/balloon_compaction.h> > > > > /* > > + * balloon_page_alloc - allocates a new page for insertion into the balloon > > + * page list. > > + * > > + * Driver must call it to properly allocate a new enlisted balloon page. > > + * Driver must call balloon_page_enqueue before definitively removing it from > > + * the guest system. This function returns the page address for the recently > > + * allocated page or NULL in the case we fail to allocate a new page this turn. > > + */ > > +struct page *balloon_page_alloc(void) > > +{ > > + struct page *page = alloc_page(balloon_mapping_gfp_mask() | > > + __GFP_NOMEMALLOC | __GFP_NORETRY); > > + return page; > > +} > > +EXPORT_SYMBOL_GPL(balloon_page_alloc); > > + > > +/* > > * balloon_page_enqueue - allocates a new page and inserts it into the balloon > > * page list. > > * @b_dev_info: balloon device descriptor where we will insert a new page to > > + * @page: new page to enqueue - allocated using balloon_page_alloc. > > * > > - * Driver must call it to properly allocate a new enlisted balloon page > > + * Driver must call it to properly enqueue a new allocated balloon page > > * before definitively removing it from the guest system. > > * This function returns the page address for the recently enqueued page or > > * NULL in the case we fail to allocate a new page this turn. > > */ > > -struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info) > > +void balloon_page_enqueue(struct balloon_dev_info *b_dev_info, > > + struct page *page) > > { > > unsigned long flags; > > - struct page *page = alloc_page(balloon_mapping_gfp_mask() | > > - __GFP_NOMEMALLOC | __GFP_NORETRY); > > - if (!page) > > - return NULL; > > > > /* > > * Block others from accessing the 'page' when we get around to > > @@ -39,7 +54,6 @@ struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info) > > __count_vm_event(BALLOON_INFLATE); > > spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); > > unlock_page(page); > > - return page; > > } > > EXPORT_SYMBOL_GPL(balloon_page_enqueue); > > > > -- > > MST ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM) 2017-12-01 14:42 ` Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM) Michael S. Tsirkin @ 2017-12-03 10:04 ` Michael Ellerman 0 siblings, 0 replies; 3+ messages in thread From: Michael Ellerman @ 2017-12-03 10:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: linux-kernel, Tetsuo Handa, Michal Hocko, Wei Wang, Jason Wang, virtualization, linux-mm, linuxppc-dev@lists.ozlabs.org "Michael S. Tsirkin" <mst@redhat.com> writes: > On Fri, Dec 01, 2017 at 11:31:08PM +1100, Michael Ellerman wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> >> > fill_balloon doing memory allocations under balloon_lock >> > can cause a deadlock when leak_balloon is called from >> > virtballoon_oom_notify and tries to take same lock. ... >> >> >> Somehow this commit seems to be killing powerpc guests. ... > > Thanks for the report! > A fix was just posted: > virtio_balloon: fix increment of vb->num_pfns in fill_balloon() > > Would appreciate testing. Yep that fixes it. Thanks. Tested-by: Michael Ellerman <mpe@ellerman.id.au> cheers ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-12-03 10:04 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-08 15:17 [PATCH v3] virtio_balloon: fix deadlock on OOM Michael S. Tsirkin
[not found] ` <87o9nid3zn.fsf@concordia.ellerman.id.au>
2017-12-01 14:42 ` Memory corruption in powerpc guests with virtio_balloon (was Re: [PATCH v3] virtio_balloon: fix deadlock on OOM) Michael S. Tsirkin
2017-12-03 10:04 ` Michael Ellerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox