From: "Michael S. Tsirkin" <mst@redhat.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Konstantin Khlebnikov <koct9i@gmail.com>,
Rafael Aquini <aquini@redhat.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
Date: Fri, 1 Jan 2016 11:36:13 +0200
Message-ID: <20160101102756-mutt-send-email-mst@redhat.com>
In-Reply-To: <1451259313-26353-2-git-send-email-minchan@kernel.org>
On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> In balloon_page_dequeue, pages_lock should cover the loop
> (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> be isolated by compaction and then list_del by isolation could
> poison the page->lru.{prev,next} so the loop finally could
> access wrong address like this. This patch fixes the bug.
>
> general protection fault: 0000 [#1] SMP
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> RIP: 0010:[<ffffffff8115e754>] [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> RSP: 0018:ffff8800a7fefdc0 EFLAGS: 00010246
> RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> FS: 0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> Stack:
> 0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> 0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> Call Trace:
> [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> RIP [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> RSP <ffff8800a7fefdc0>
> ---[ end trace 43cf28060d708d5f ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
> mm/balloon_compaction.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be5a00f..300117f1a08f 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  	bool dequeued_page;
>
>  	dequeued_page = false;
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>  		/*
>  		 * Block others from accessing the 'page' while we get around
> @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  				continue;
>  			}
>  #endif
> -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  			balloon_page_delete(page);
>  			__count_vm_event(BALLOON_DEFLATE);
> -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
>  			dequeued_page = true;
>  			break;
>  		}
>  	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>
>  	if (!dequeued_page) {
>  		/*
I think this will cause deadlocks.

With this change the page lock (trylock_page) is taken inside
pages_lock, but balloon_page_putback nests them in the reverse
order: pages_lock inside the page lock.

Did you test this with lockdep? You really should for
locking changes, and I'd expect it to warn about this.
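
To illustrate the ordering problem, here is a toy userspace sketch of an
order checker. It is not how lockdep is implemented, and every name in it
(lock_class, order_tracker, tracker_acquire, ...) is made up for this
example. It only records which lock was taken while another was held and
flags the first acquisition that reverses a recorded order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks discussed above. */
enum lock_class { PAGE_LOCK, PAGES_LOCK, NR_CLASSES };

struct order_tracker {
	bool held[NR_CLASSES];
	/* taken_after[a][b]: b was once acquired while a was held */
	bool taken_after[NR_CLASSES][NR_CLASSES];
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record that 'c' is being acquired. Returns false if some currently
 * held lock was previously acquired while 'c' was held, i.e. the two
 * are now being nested in the opposite order.
 */
static bool tracker_acquire(struct order_tracker *t, enum lock_class c)
{
	bool ok = true;

	assert(c < NR_CLASSES);
	for (int h = 0; h < NR_CLASSES; h++) {
		if (h == (int)c || !t->held[h])
			continue;
		if (t->taken_after[c][h])
			ok = false;
		t->taken_after[h][c] = true;
	}
	t->held[c] = true;
	return ok;
}

static void tracker_release(struct order_tracker *t, enum lock_class c)
{
	assert(c < NR_CLASSES);
	t->held[c] = false;
}
```

Feeding it the putback order first (PAGE_LOCK, then PAGES_LOCK) and then
the order from the patch above (PAGES_LOCK, then PAGE_LOCK) makes the last
tracker_acquire() call return false.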
Also, I think there's another issue there: after isolation, the page
could also be freed before we try to lock it.
We really must take a page reference before touching
the page.
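
For readers unfamiliar with the "get a reference unless the count is
already zero" idiom, here is a minimal userspace sketch using C11 atomics.
It only mirrors the idea behind get_page_unless_zero(); struct obj and the
obj_* helpers are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for struct page. */
struct obj {
	atomic_int refcount;
};

/*
 * Take a reference only if the object is still alive. Once the count
 * has reached zero the object is on its way to being freed, so we must
 * not resurrect it.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* on failure 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A caller that gets false back skips the object entirely, which is what
the "continue" after get_page_unless_zero() does in the patch below.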
I think we need something like the below to fix this issue.
Could you please try this out and send a Tested-by?
I will repost as a proper patch if this works for you.
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index d3116be..66d69c5 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
  */
 struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 {
-	struct page *page, *tmp;
+	struct page *page;
 	unsigned long flags;
 	bool dequeued_page;
+	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */

 	dequeued_page = false;
-	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
+	/*
+	 * We need to go over b_dev_info->pages and lock each page,
+	 * but b_dev_info->pages_lock must nest within page lock.
+	 *
+	 * To make this safe, remove each page from b_dev_info->pages list
+	 * under b_dev_info->pages_lock, then drop this lock. Once list is
+	 * empty, re-add them also under b_dev_info->pages_lock.
+	 */
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	while (!list_empty(&b_dev_info->pages)) {
+		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
+		/* move to processed list to avoid going over it another time */
+		list_move(&page->lru, &processed);
+
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * pages_lock nests within page lock,
+		 * so drop it before trylock_page
+		 */
+		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
 		/*
 		 * Block others from accessing the 'page' while we get around
 		 * establishing additional references and preparing the 'page'
@@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			if (!PagePrivate(page)) {
 				/* raced with isolation */
 				unlock_page(page);
+				put_page(page);
 				continue;
 			}
 #endif
@@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			__count_vm_event(BALLOON_DEFLATE);
 			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 			unlock_page(page);
+			put_page(page);
 			dequeued_page = true;
 			break;
 		}
+		put_page(page);
+		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 	}

+	/* re-add remaining entries */
+	list_splice(&processed, &b_dev_info->pages);
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
 	if (!dequeued_page) {
 		/*
 		 * If we are unable to dequeue a balloon page because the page
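
The "move to a processed list, drop the list lock, trylock the entry,
re-add the rest" idea from the comment in the patch can be sketched in
userspace roughly as follows. This is a simplified illustration, not the
patch itself: it leaves out page references, it retakes the list lock on
every path before splicing, and all names (toy_lock, link, item, pool,
pool_take) are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy test-and-set lock; stands in for both the page lock and pages_lock. */
struct toy_lock { atomic_flag flag; };

static void toy_init(struct toy_lock *l) { atomic_flag_clear(&l->flag); }
static bool toy_tryacquire(struct toy_lock *l) { return !atomic_flag_test_and_set(&l->flag); }
static void toy_acquire(struct toy_lock *l) { while (!toy_tryacquire(l)) ; }
static void toy_release(struct toy_lock *l) { atomic_flag_clear(&l->flag); }

/* Minimal circular doubly linked list. */
struct link { struct link *prev, *next; };

static void link_init(struct link *h) { h->prev = h->next = h; }
static bool link_empty(const struct link *h) { return h->next == h; }
static void link_del(struct link *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}
static void link_add(struct link *n, struct link *h)	/* insert right after h */
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}
/* Move all of 'from' to the front of 'to'; 'from' ends up empty. */
static void link_splice(struct link *from, struct link *to)
{
	if (link_empty(from))
		return;
	from->next->prev = to;
	from->prev->next = to->next;
	to->next->prev = from->prev;
	to->next = from->next;
	link_init(from);
}

struct item {
	struct link node;	/* must stay first: we cast link -> item */
	struct toy_lock lock;	/* outer lock, like the page lock */
};

struct pool {
	struct toy_lock lock;	/* inner lock, like pages_lock */
	struct link items;
};

/* Returns the first item whose lock could be taken, still locked, or NULL. */
static struct item *pool_take(struct pool *p)
{
	struct link processed;
	struct item *found = NULL;

	link_init(&processed);
	toy_acquire(&p->lock);
	while (!link_empty(&p->items)) {
		struct item *it = (struct item *)p->items.next;
		bool locked;

		/* park it so the loop never revisits it */
		link_del(&it->node);
		link_add(&it->node, &processed);

		/* the list lock nests inside the item lock, so drop it first */
		toy_release(&p->lock);
		locked = toy_tryacquire(&it->lock);
		toy_acquire(&p->lock);

		if (locked) {
			link_del(&it->node);
			found = it;
			break;
		}
	}
	/* put back everything we skipped, still under the list lock */
	link_splice(&processed, &p->items);
	toy_release(&p->lock);
	return found;
}
```

With a pool holding a, b, c where a's lock is already held, pool_take()
hands back b (locked) and leaves a and c on the list.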