From mboxrd@z Thu Jan  1 00:00:00 1970
From: zhanghailiang
Subject: Re: [BUG] Balloon malfunctions with memory hotplug
Date: Tue, 3 Mar 2015 14:27:05 +0800
Message-ID: <54F55439.8050306@huawei.com>
References: <20150226142629.749f4f1f@redhat.com> <54EFEDF0.10803@huawei.com> <20150302062234.GG26196@grmbl.mre>
In-Reply-To: <20150302062234.GG26196@grmbl.mre>
To: Amit Shah
Cc: hangaohuai@huawei.com, pkrempa@redhat.com, kvm@vger.kernel.org, mst@redhat.com, qemu-devel@nongnu.org, peter.huangpeng@huawei.com, imammedo@redhat.com, Luiz Capitulino
Sender: kvm-owner@vger.kernel.org

On 2015/3/2 14:22, Amit Shah wrote:
> On (Fri) 27 Feb 2015 [12:09:20], zhanghailiang wrote:
>> On 2015/2/27 3:26, Luiz Capitulino wrote:
>>> Hello,
>>>
>>> Reproducer:
>>>
>>> 1. Start QEMU with balloon and memory hotplug support:
>>>
>>>    # qemu [...] -m 1G,slots=2,maxmem=2G -balloon virtio
>>>
>>> 2. Check balloon size:
>>>
>>>    (qemu) info balloon
>>>    balloon: actual=1024
>>>    (qemu)
>>>
>>> 3. Hotplug some memory:
>>>
>>>    (qemu) object_add memory-backend-ram,id=mem1,size=1G
>>>    (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>>>
>>> 4. This step is _not_ needed to reproduce the problem,
>>>    but you may need to online memory manually on Linux so
>>>    that it becomes available in the guest
>>>
>>> 5. Check balloon size again:
>>>
>>>    (qemu) info balloon
>>>    balloon: actual=1024
>>>    (qemu)
>>>
>>> BUG: The guest now has 2GB of memory, but the balloon thinks
>>>      the guest has 1GB
>>>
>>> One may think that the problem is that the balloon driver is
>>> ignoring hotplugged memory. This is not what's happening. If
>>> you do balloon your guest, there's nothing stopping the
>>> balloon driver in the guest from ballooning hotplugged memory.
>>>
>>> The problem is that the balloon device in QEMU needs to know
>>> the current amount of memory available to the guest.
>>>
>>> Before memory hotplug this information was easy to obtain: the
>>> current amount of memory available to the guest is the memory the
>>> guest was booted with. This value is stored in the ram_size global
>>> variable in QEMU and this is what the balloon device emulation
>>> code uses today. However, when memory is hotplugged ram_size is
>>> _not_ updated and the balloon device breaks.
>>>
>>> I see two possible solutions for this problem:
>>>
>>> 1. In addition to reading ram_size, the balloon device in QEMU
>>>    could scan pc-dimm devices to account for hotplugged memory.
>>>
>>>    This solution was already implemented by zhanghailiang:
>>>
>>>    http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html
>>>
>>>    It works, except that on Linux memory hotplug is a two-step
>>>    procedure: first memory is inserted then it has to be onlined
>>>    from user-space. So, if memory is inserted but not onlined
>>>    this solution gives the opposite problem: the balloon device
>>>    will report a larger memory amount than the guest actually has.
>>>
>>>    Can we live with that? I guess not, but I'm open for discussion.
>>>
>>>    If QEMU could be notified when Linux makes memory online, then
>>>    the problem would be gone. But I guess this can't be done.
>>>
>>
>> Yes, it is really a problem: the balloon can't work well with memory block online/offline now.
>> virtio-balloon can't be notified when a memory block goes online/offline now; actually, we can
>> add this capability by using the existing kernel memory hotplug/unplug notifier mechanism
>> (just a simple register_memory_notifier()).
>
> The Linux driver can come to know, but it can't tell the host
> out-of-band about it. A new feature / config option can be added so
> that a guest can update the host on what the current available RAM is.
>

This is a feasible scenario. Maybe Luiz could consider this.
He is working on it now.

>>> 2. Modify the balloon driver in the guest to inform the balloon
>>>    device on the host about the current memory available to the
>>>    guest. This way, whenever the balloon device in QEMU needs
>>>    to know the current amount of memory in the guest, it asks
>>>    the guest. This drops any usage of ram_size in the balloon
>>>    device
>>>
>>>    I'm not completely sure this is feasible though. For example,
>>>    what happens if the guest reports a memory amount to QEMU and
>>>    right after this more memory is plugged?
>>>
>>
>> Hmm, I wonder why we notify virtio-balloon of the number of pages that should be adjusted,
>> why not the memory 'target' size? Is there any special reason?
>
> This is just how the design / code was done for balloon. I've
> proposed we move to a target-based solution rather than the current
> way. While drafting the new virtio spec, this was considered, but I

OK, it is really a good idea to redesign virtio-balloon to be target-based.

> lost track of it. The proposal was to just ditch the current balloon,
> and come up with a new one with a saner design. Don't know who's
> keeping track of that, though.
>

:(

> BTW another problem for Luiz's option 2 here is we don't want to wait
> for the guest to reply before making decisions. E.g. the guest could
> be in S3 mode, and we may wait indefinitely for a reply, blocking
> everything (the situation is slightly better with more threads, but in
> older days, blocking for the guest to reply for balloon stats meant
> the entire qemu froze till the guest replied. That's the reason the
> feature was disabled).
>

Got it, thanks.

>> For a Linux guest, it can always know exactly its current real memory size, but QEMU may not, because
>> the guest can online/offline memory blocks by itself.
>>
>> If virtio-balloon in the guest knows the balloon's 'target' size, it can calculate the exact memory size
>> that should be adjusted,
>> and can also take the corresponding action (fill or leak the balloon)
>> whenever a memory block is onlined or offlined.
>>
>>>    Besides, this solution is more complex than solution 1 and
>>>    won't address older guests.
>>>
>>> Another important detail is that, I *suspect* that a very similar
>>> bug already exists with 32-bit guests even without memory
>>> hotplug: what happens if you assign 6GB to a 32-bit guest without PAE
>>> support? I think the same problem we're seeing with memory
>>> hotplug will happen and solution 1 won't fix this, although
>>> no one seems to care about 32-bit guests...
>
> Not just 32-bit guests; even 64-bit guests restricted with mem= on the
> cmdline. I know we've discussed this in the past, and I recall
> virtio-balloon v2 was going to address this all; sadly I've not kept
> up to date with it.
>
> Amit
>
> .
>
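
To make the target-based idea discussed above a bit more concrete, here is
a minimal sketch. It assumes the host only publishes a 'target' size and
the guest driver derives the balloon size from its own count of online
memory. The Guest class, apply_target() and the sizes used are hypothetical
names and values for illustration; this is not the virtio-balloon interface
or the driver code.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Guest:
    """Hypothetical guest-side view; not the real virtio-balloon driver."""
    online_bytes: int        # memory the guest has actually onlined
    balloon_bytes: int = 0   # memory currently held by the balloon

    def usable_bytes(self) -> int:
        return self.online_bytes - self.balloon_bytes

    def apply_target(self, target_bytes: int) -> int:
        """Fill or leak the balloon so usable memory matches the target.

        Returns the change in balloon size: positive means fill (inflate),
        negative means leak (deflate).
        """
        wanted = max(0, self.online_bytes - target_bytes)
        delta = wanted - self.balloon_bytes
        self.balloon_bytes = wanted
        return delta


# Guest booted with 1G; the host asks for a 768M target.
guest = Guest(online_bytes=1024 * MIB)
target = 768 * MIB
first = guest.apply_target(target)

# A hotplugged 1G DIMM is onlined by the guest. The target is unchanged,
# so the driver re-evaluates it against the new amount of online memory.
guest.online_bytes += 1024 * MIB
second = guest.apply_target(target)
```

In this model the guest works out the fill/leak amount itself, so the host
side does not need an exact count of online memory (ram_size plus whatever
was hotplugged and onlined) to express what it wants.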