KVM: memory ballooning bug?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Rafael Aquini <aquini@redhat.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-kernel@vger.kernel.org
Subject: KVM: memory ballooning bug?
Date: Wed, 23 Dec 2015 14:22:28 +0900	[thread overview]
Message-ID: <20151223052228.GA31269@bbox> (raw)

During my compaction-related stuff, I encountered some problems with
ballooning.

Firstly, with repeated inflating and deflating cycle, guest memory(ie,
cat /proc/meminfo | grep MemTotal) decreased and couldn't recover.

When I review source code, balloon_lock should cover release_pages_balloon.
Otherwise, struct virtio_balloon fields could be overwritten by race
of fill_balloon(e,g, vb->*pfns could be critical).
Below patch fixed the problem.

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7efc32945810..7d3e5d0e9aa4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -209,8 +209,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
         */
        if (vb->num_pfns != 0)
                tell_host(vb, vb->deflate_vq);
-       mutex_unlock(&vb->balloon_lock);
        release_pages_balloon(vb);
+       mutex_unlock(&vb->balloon_lock);
        return num_freed_pages;
 }
 
Secondly, in balloon_page_dequeue, pages_lock should cover
list_for_each_entry_safe loop. Otherwise, the cursor page
could be isolated by compaction and then list_del by isolation
could poison the page->lru so the loop could access wrong address
like this.

general protection fault: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1+ #1906
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
Stack:
 0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
 0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
 ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
Call Trace:
 [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
 [<ffffffff812c8bc7>] balloon+0x217/0x2a0
 [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
 [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
 [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
 [<ffffffff8105b6e9>] kthread+0xc9/0xe0
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
 [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89 
RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
 RSP <ffff8800a7fefdc0>
---[ end trace 43cf28060d708d5f ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

We could fix it by protecting the entire loop by pages_lock but
problem is irq latency during walking the list.
But I doubt how often such worst scenario happens because
in normal situation, the loop would exit easily via succeeding
trylock_page.

Any comments?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Minchan Kim <minchan@kernel.org>
To: Rafael Aquini <aquini@redhat.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-kernel@vger.kernel.org
Subject: KVM: memory ballooning bug?
Date: Wed, 23 Dec 2015 14:22:28 +0900	[thread overview]
Message-ID: <20151223052228.GA31269@bbox> (raw)

During my compaction-related stuff, I encountered some problems with
ballooning.

Firstly, with repeated inflating and deflating cycle, guest memory(ie,
cat /proc/meminfo | grep MemTotal) decreased and couldn't recover.

When I review source code, balloon_lock should cover release_pages_balloon.
Otherwise, struct virtio_balloon fields could be overwritten by race
of fill_balloon(e,g, vb->*pfns could be critical).
Below patch fixed the problem.

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7efc32945810..7d3e5d0e9aa4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -209,8 +209,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
         */
        if (vb->num_pfns != 0)
                tell_host(vb, vb->deflate_vq);
-       mutex_unlock(&vb->balloon_lock);
        release_pages_balloon(vb);
+       mutex_unlock(&vb->balloon_lock);
        return num_freed_pages;
 }
 
Secondly, in balloon_page_dequeue, pages_lock should cover
list_for_each_entry_safe loop. Otherwise, the cursor page
could be isolated by compaction and then list_del by isolation
could poison the page->lru so the loop could access wrong address
like this.

general protection fault: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1+ #1906
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
Stack:
 0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
 0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
 ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
Call Trace:
 [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
 [<ffffffff812c8bc7>] balloon+0x217/0x2a0
 [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
 [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
 [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
 [<ffffffff8105b6e9>] kthread+0xc9/0xe0
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
 [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89 
RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
 RSP <ffff8800a7fefdc0>
---[ end trace 43cf28060d708d5f ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

We could fix it by protecting the entire loop by pages_lock but
problem is irq latency during walking the list.
But I doubt how often such worst scenario happens because
in normal situation, the loop would exit easily via succeeding
trylock_page.

Any comments?

next             reply	other threads:[~2015-12-23  5:21 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-23  5:22 Minchan Kim [this message]
2015-12-23  5:22 ` KVM: memory ballooning bug? Minchan Kim
2015-12-23 11:14 ` Rafael Aquini
2015-12-23 11:14   ` Rafael Aquini
2015-12-23 14:17   ` Minchan Kim
2015-12-23 14:17     ` Minchan Kim
2015-12-23 15:14     ` Rafael Aquini
2015-12-23 15:14       ` Rafael Aquini
2015-12-27 17:23 ` Konstantin Khlebnikov
2015-12-27 17:23   ` Konstantin Khlebnikov
2015-12-27 23:49   ` Minchan Kim
2015-12-27 23:49     ` Minchan Kim

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:7efc3294581 dfblob:7d3e5d0e9aa dfblob:7efc3294581
dfblob:7d3e5d0e9aa )
 OR (
bs:"KVM: memory ballooning bug?" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151223052228.GA31269@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.