From: Alexander Graf
Subject: Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
Date: Mon, 27 Apr 2015 15:48:42 +0200
Message-ID: <553E3E3A.9010107@suse.de>
References: <1429787297-9292-1-git-send-email-borntraeger@de.ibm.com> <1429787297-9292-2-git-send-email-borntraeger@de.ibm.com> <2E97DEE6-47EA-484A-9F02-9F031DCA8F36@suse.de> <5538DAD4.4060505@de.ibm.com> <20150423141309.7c500236@mschwide>
In-Reply-To: <20150423141309.7c500236@mschwide>
To: Martin Schwidefsky
Cc: Christian Borntraeger, Paolo Bonzini, KVM, Cornelia Huck, Jens Freimann

On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
> On Thu, 23 Apr 2015 14:01:23 +0200
> Alexander Graf wrote:
>
>> As far as alternative approaches go, I don't have a great idea otoh.
>> We could have an elf flag indicating that this process needs 4k page
>> tables to limit the impact to a single process. In fact, could we
>> maybe still limit the scope to non-global? A personality may work
>> as well. Or ulimit?
> I tried the ELF flag approach, does not work. The trouble is that
> allocate_mm() has to create the page tables with 4K tables if you
> want to change the page table layout later on. We have learned the
> hard way that the direction 2K to 4K does not work due to races
> in the mm.
>
> Now there are two major cases: 1) fork + execve and 2) fork only.
> The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
> 2) is required for apps that use lots of forking, e.g. database or
> web servers. Same goes for the approach with a personality flag or
> ulimit.
>
> We would have to distinguish the two cases for allocate_mm(),
> if the new mm is allocated for a fork the current mm decides
> 2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
> with 4K and do the downgrade after the ELF flag has been evaluated.

Well, you could also make it a personality flag, for example, no? Then
every process descending from the one that set the flag always gets 4k
page tables until it drops the personality, at which point its children
would get 2k page tables again.

I'm mostly concerned that people will end up mixing VMs and other
workloads on the same LPAR, so I don't think there's a one-size-fits-all
solution.

Alex
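The fork/execve split described for allocate_mm(), combined with the inheritable personality flag suggested above, can be modelled as a tiny userspace state machine. This is a hedged sketch only: `model_task`, `model_mm`, `model_fork()`, `model_exec_alloc()`, `model_exec_elf_loaded()`, the `persona_4k` bit and the `elf_wants_4k` flag are all hypothetical names made up for illustration. They are not kernel APIs, and the real allocate_mm()/binfmt_elf paths are far more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the proposed policy; none of these
 * names exist in the kernel. */
enum pgt_size { PGT_2K, PGT_4K };

struct model_mm {
	enum pgt_size pgt;   /* layout chosen when the mm was allocated */
	bool elf_wants_4k;   /* hypothetical ELF flag, recorded at exec */
};

struct model_task {
	struct model_mm mm;
	bool persona_4k;     /* hypothetical inheritable personality bit */
};

/* Does this task require 4K page tables for new mms it creates? */
static bool needs_4k(const struct model_task *t)
{
	return t->persona_4k || t->mm.elf_wants_4k;
}

/* Case 2) fork only: the current task decides the child's layout when
 * the child mm is allocated, since converting an existing mm from
 * 2K to 4K later is not safe. */
static struct model_task model_fork(const struct model_task *parent)
{
	struct model_task child = *parent;  /* personality and ELF flag are inherited */
	child.mm.pgt = needs_4k(parent) ? PGT_4K : PGT_2K;
	return child;
}

/* Case 1) execve: the new mm always starts with 4K tables ... */
static void model_exec_alloc(struct model_task *t)
{
	t->mm.pgt = PGT_4K;
	t->mm.elf_wants_4k = false;  /* ELF header not evaluated yet */
}

/* ... and is downgraded once the ELF flag is known. Because exec always
 * starts at 4K, this step only ever goes 4K -> 2K, never 2K -> 4K. */
static void model_exec_elf_loaded(struct model_task *t, bool elf_flag)
{
	assert(t->mm.pgt == PGT_4K);
	t->mm.elf_wants_4k = elf_flag;
	if (!needs_4k(t))
		t->mm.pgt = PGT_2K;
}
```

For example, once a task sets `persona_4k`, every child it forks gets 4K tables; if such a child clears the bit and has no ELF flag recorded, its own children fall back to 2K. An execve of a binary without the flag starts at 4K and ends at 2K.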