From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
Date: Mon, 27 Apr 2015 15:49:46 +0200
Message-ID: <553E3E7A.8090604@suse.de>
References: <1429787297-9292-1-git-send-email-borntraeger@de.ibm.com> <1429787297-9292-2-git-send-email-borntraeger@de.ibm.com> <2E97DEE6-47EA-484A-9F02-9F031DCA8F36@suse.de> <5538DAD4.4060505@de.ibm.com> <5538E0B8.2070106@de.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Paolo Bonzini, KVM, Cornelia Huck, Jens Freimann, Martin Schwidefsky
To: Christian Borntraeger
Return-path:
Received: from cantor2.suse.de ([195.135.220.15]:59126 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932628AbbD0Nvl (ORCPT ); Mon, 27 Apr 2015 09:51:41 -0400
In-Reply-To: <5538E0B8.2070106@de.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04/23/2015 02:08 PM, Christian Borntraeger wrote:
> On 23.04.2015 at 14:01, Alexander Graf wrote:
>>
>>> On 23.04.2015 at 13:43, Christian Borntraeger wrote:
>>>
>>>> On 23.04.2015 at 13:37, Alexander Graf wrote:
>>>>
>>>>
>>>>> On 23.04.2015 at 13:08, Christian Borntraeger wrote:
>>>>>
>>>>> From: Martin Schwidefsky
>>>>>
>>>>> Replacing a 2K page table with a 4K page table while a VMA is active
>>>>> for the affected memory region is fundamentally broken. Rip out the
>>>>> page table reallocation code and replace it with a simple system
>>>>> control 'vm.allocate_pgste'. If the system control is set the page
>>>>> tables for all processes are allocated as full 4K pages, even for
>>>>> processes that do not need it.
>>>>>
>>>>> Signed-off-by: Martin Schwidefsky
>>>>> Signed-off-by: Christian Borntraeger
>>>> Couldn't you make this a hidden kconfig option that gets automatically selected when kvm is enabled? Or is there a non-kvm case that needs it too?
>>> For things like RHEV the default could certainly be "enabled", but for normal
>>> distros like SLES/RHEL, the idea was to NOT enable that by default, as the non-KVM
>>> case is more common and might suffer from the additional memory consumption of
>>> the page tables. (big databases come to mind)
>>>
>>> We could think about having rpms like kvm to provide a sysctl file that sets it if we
>>> want to minimize the impact. Other ideas?
>> Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a config option for the default value then, just rely only on sysctl.conf for changed defaults.
>>
>> As far as mechanisms to change it go, every distribution has their own ways of dealing with this. RH has a "profile" thing, we don't really have anything central, but individual sysctl.d files for example that a kvm package could provide.
>> Either way, the default choosing shouldn't happen in .config ;).
> So you vote for getting rid of the Kconfig?
>
> Also, please add some helpful error message in qemu to guide users to the sysctl.
>
> Yes, we will provide a qemu patch (cc stable) after this hits the kernel.
>
>> As far as alternative approaches go, I don't have a great idea otoh. We could have an elf flag indicating that this process needs 4k page tables to limit the impact to a single process.
> This approach was actually Martin's first fix. The problem is that the decision takes place on execve,
> but we need an answer at fork time. So we always started with 4k page tables and freed the 2nd half on
> execve. Now this did not work for processes that only fork (without execve).
>
>> In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?
> I think we will go for now with the sysctl and see if we can come up with some automatic way as an additional
> patch later on.

Sounds perfectly reasonable to me. You can for example also just set the sysctl bit in libvirtd :).

Alex
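
For illustration only, here is a minimal sketch of the kind of userspace check discussed in this thread: a helper that looks at the sysctl and returns a hint pointing the user at it. It is not the qemu or libvirtd patch mentioned above. It assumes the 'vm.allocate_pgste' control from the patch description is exposed at the usual procfs location, /proc/sys/vm/allocate_pgste; the function names and the wording of the messages are made up for this example.

```python
# Assumed procfs path for the 'vm.allocate_pgste' sysctl named in the patch.
SYSCTL_PATH = "/proc/sys/vm/allocate_pgste"


def pgste_status(path=SYSCTL_PATH):
    """Return 'enabled', 'disabled', or 'missing' for the sysctl file at path."""
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:
        # File absent or unreadable: the kernel may not offer this sysctl.
        return "missing"
    try:
        return "enabled" if int(value) != 0 else "disabled"
    except ValueError:
        # Unexpected content; treat it as not enabled.
        return "disabled"


def pgste_hint(path=SYSCTL_PATH):
    """Return a user-facing hint (hypothetical wording), or None if enabled."""
    status = pgste_status(path)
    if status == "enabled":
        return None
    if status == "missing":
        return "vm.allocate_pgste not found; this kernel may not provide the sysctl."
    return ("4K page tables are not enabled: run "
            "'sysctl -w vm.allocate_pgste=1' and start the VM again.")


if __name__ == "__main__":
    hint = pgste_hint()
    if hint is not None:
        print(hint)
```

A management daemon could run a check like this before starting a guest. A package could instead ship a sysctl.d file, as suggested earlier in the thread, so that the setting is applied at boot.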