From: Alexander Graf
Subject: Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
Date: Mon, 27 Apr 2015 16:03:26 +0200
Message-ID: <553E41AE.10604@suse.de>
References: <1429787297-9292-1-git-send-email-borntraeger@de.ibm.com> <1429787297-9292-2-git-send-email-borntraeger@de.ibm.com> <2E97DEE6-47EA-484A-9F02-9F031DCA8F36@suse.de> <5538DAD4.4060505@de.ibm.com> <20150423141309.7c500236@mschwide> <553E3E3A.9010107@suse.de> <20150427155745.64729393@mschwide>
In-Reply-To: <20150427155745.64729393@mschwide>
To: Martin Schwidefsky
Cc: Christian Borntraeger, Paolo Bonzini, KVM, Cornelia Huck, Jens Freimann

On 04/27/2015 03:57 PM, Martin Schwidefsky wrote:
> On Mon, 27 Apr 2015 15:48:42 +0200
> Alexander Graf wrote:
>
>> On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
>>> On Thu, 23 Apr 2015 14:01:23 +0200
>>> Alexander Graf wrote:
>>>
>>>> As far as alternative approaches go, I don't have a great idea otoh.
>>>> We could have an elf flag indicating that this process needs 4k page
>>>> tables to limit the impact to a single process. In fact, could we
>>>> maybe still limit the scope to non-global? A personality may work
>>>> as well. Or ulimit?
>>> I tried the ELF flag approach, does not work. The trouble is that
>>> allocate_mm() has to create the page tables with 4K tables if you
>>> want to change the page table layout later on. We have learned the
>>> hard way that the direction 2K to 4K does not work due to races
>>> in the mm.
>>>
>>> Now there are two major cases: 1) fork + execve and 2) fork only.
>>> The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
>>> 2) is required for apps that use lots of forking, e.g. database or
>>> web servers. Same goes for the approach with a personality flag or
>>> ulimit.
>>>
>>> We would have to distinguish the two cases for allocate_mm(),
>>> if the new mm is allocated for a fork the current mm decides
>>> 2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
>>> with 4K and do the downgrade after the ELF flag has been evaluated.
>> Well, you could also make it a personality flag for example, no? Then
>> every new process below a certain one always gets 4k page tables until
>> they drop the personality, at which point each child would only get 2k
>> page tables again.
>>
>> I'm mostly concerned that people will end up mixing VMs and other
>> workloads on the same LPAR, so I don't think there's a one-size-fits-all
>> solution.
> If I add an argument to mm_init() to indicate if this context
> is for fork() or execve() then the ELF header flag approach works.

So you don't need the sysctl?

Alex