Date: Sat, 13 Jun 2015 22:10:46 +0200
From: "Michael S. Tsirkin"
To: Christian Borntraeger
Cc: James.Bottomley@HansenPartnership.com, "Denis V. Lunev", qemu-devel@nongnu.org, Raushaniya Maksudova, Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 1/1] balloon: add a feature bit to let Guest OS deflate balloon on oom
Message-ID: <20150612185256-mutt-send-email-mst@redhat.com>
In-Reply-To: <557AC8F5.6040105@de.ibm.com>
References: <1433845144-26889-1-git-send-email-den@openvz.org> <1433845144-26889-2-git-send-email-den@openvz.org> <5576C1CF.40305@de.ibm.com> <5578274D.6070900@openvz.org> <20150610151113-mutt-send-email-mst@redhat.com> <557AC8F5.6040105@de.ibm.com>

On Fri, Jun 12, 2015 at 01:56:37PM +0200, Christian Borntraeger wrote:
> Am 10.06.2015 um 15:13 schrieb Michael S. Tsirkin:
> > On Wed, Jun 10, 2015 at 03:02:21PM +0300, Denis V. Lunev wrote:
> >> On 09/06/15 13:37, Christian Borntraeger wrote:
> >>> Am 09.06.2015 um 12:19 schrieb Denis V. Lunev:
> >>>> Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> >>>> when Linux is under severe memory pressure. Various mechanisms are
> >>>> responsible for correct virtio_balloon memory management. Nevertheless it
> >>>> is often the case that these control tools do not have enough time to
> >>>> react to fast changing memory load. As a result OS runs out of memory and
> >>>> invokes OOM-killer. The balancing of memory by use of the virtio balloon
> >>>> should not cause the termination of processes while there are pages in the
> >>>> balloon. Now there is no way for virtio balloon driver to free memory at
> >>>> the last moment before some process gets killed by OOM-killer.
> >>>>
> >>>> This does not provide a security breach as balloon itself is running
> >>>> inside Guest OS and is working in cooperation with the host. Thus
> >>>> some improvements from Guest side should be considered as normal.
> >>>>
> >>>> To solve the problem, introduce a virtio_balloon callback which is
> >>>> expected to be called from the oom notifier call chain in out_of_memory()
> >>>> function. If virtio balloon could release some memory, it will make the
> >>>> system return and retry the allocation that forced the out of memory
> >>>> killer to run.
> >>>>
> >>>> This behavior should be enabled if and only if appropriate feature bit
> >>>> is set on the device. It is off by default.
> >>> The balloon frees pages in this way
> >>>
> >>> static void balloon_page(void *addr, int deflate)
> >>> {
> >>> #if defined(__linux__)
> >>>     if (!kvm_enabled() || kvm_has_sync_mmu())
> >>>         qemu_madvise(addr, TARGET_PAGE_SIZE,
> >>>                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> >>> #endif
> >>> }
> >>>
> >>> The guest can re-touch that page and get an empty zero page or the old
> >>> page back without tampering with the host integrity. This should work
> >>> for all cases I am aware of (without sync_mmu it's a nop anyway) so why
> >>> not enable that by default? Anything that I missed?
> >>>
> >>> Christian
> >>
> >> I'd like to do that :) Actually original version of kernel patch
> >> has enabled this unconditionally. But Michael asked to make
> >> it configurable and off by default.
> >>
> >> Den
> >
> > That's not the question here. The question is why is it limited by
> > kvm_has_sync_mmu.
>
> Well we have two interesting options here:
>
> VIRTIO_BALLOON_F_MUST_TELL_HOST and VIRTIO_BALLOON_F_DEFLATE_ON_OOM
>
> For any sane host with ondemand paging just re-accessing the page
> should simply work. So the common case could be
> VIRTIO_BALLOON_F_MUST_TELL_HOST == off

Disabling this breaks useful optimizations such as the ability not to
migrate memory in the balloon.

> VIRTIO_BALLOON_F_DEFLATE_ON_OOM == on

AFAIK management tools depend on the balloon not deflating below a
host-specified threshold to avoid OOM on the host. So I don't think we
can make this a default; management needs to enable this explicitly.

> Only for the rare case of hypervisors without paging or other memory
> related restrictions we have to enable MUST_TELL_HOST.
> Now: QEMU knows exactly which case we have, so why not let QEMU tell
> the guest what the capabilities are. (e.g. sync_mmu ---> no need to
> tell the host).
>
> I can at least imagine that some admin wants to make the oom case
> configurable, but a sane default seems to be to not kill random
> guest processes.
>
> Christian

-- 
MST