From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Graf <agraf@suse.de>
Subject: Re: [PATCH] KVM: s390: remove delayed reallocation of page tables
 for KVM
Date: Mon, 27 Apr 2015 16:03:26 +0200
Message-ID: <553E41AE.10604@suse.de>
References: <1429787297-9292-1-git-send-email-borntraeger@de.ibm.com>	<1429787297-9292-2-git-send-email-borntraeger@de.ibm.com>	<2E97DEE6-47EA-484A-9F02-9F031DCA8F36@suse.de>	<5538DAD4.4060505@de.ibm.com>	<E97BD94D-88F9-4B99-A5E4-D707DC0E2325@suse.de>	<20150423141309.7c500236@mschwide>	<553E3E3A.9010107@suse.de> <20150427155745.64729393@mschwide>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>, KVM <kvm@vger.kernel.org>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	Jens Freimann <jfrei@linux.vnet.ibm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
In-Reply-To: <20150427155745.64729393@mschwide>

On 04/27/2015 03:57 PM, Martin Schwidefsky wrote:
> On Mon, 27 Apr 2015 15:48:42 +0200
> Alexander Graf <agraf@suse.de> wrote:
>
>> On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
>>> On Thu, 23 Apr 2015 14:01:23 +0200
>>> Alexander Graf <agraf@suse.de> wrote:
>>>
>>>> As far as alternative approaches go, I don't have a great idea otoh.
>>>> We could have an elf flag indicating that this process needs 4k page
>>>> tables to limit the impact to a single process. In fact, could we
>>>> maybe still limit the scope to non-global? A personality may work
>>>> as well. Or ulimit?
>>> I tried the ELF flag approach; it does not work. The trouble is that
>>> allocate_mm() has to create the page tables with 4K tables if you
>>> want to change the page table layout later on. We have learned the
>>> hard way that the direction 2K to 4K does not work due to races
>>> in the mm.
>>>
>>> Now there are two major cases: 1) fork + execve and 2) fork only.
>>> The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
>>> 2) is required for apps that use lots of forking, e.g. database or
>>> web servers. Same goes for the approach with a personality flag or
>>> ulimit.
>>>
>>> We would have to distinguish the two cases for allocate_mm(),
>>> if the new mm is allocated for a fork the current mm decides
>>> 2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
>>> with 4K and do the downgrade after the ELF flag has been evaluated.
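A minimal C sketch of the fork/exec split described above, assuming a toy model: `enum mm_alloc_reason`, `struct mm_model`, `mm_init_model()` and `elf_apply_flag_model()` are hypothetical names, not kernel APIs. A fork copies the layout of the current mm. An exec starts at 4K and may only be downgraded to 2K once the ELF flag is known, because the 2K to 4K direction is the one that races.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: why a new mm is being set up. */
enum mm_alloc_reason { MM_ALLOC_FORK, MM_ALLOC_EXEC };

/* Hypothetical stand-in for struct mm_struct; tracks only the layout. */
struct mm_model {
	bool pgtable_4k; /* true: 4K page tables (KVM capable), false: 2K */
};

/* Choose the initial layout when the new mm is created. */
static void mm_init_model(struct mm_model *mm, const struct mm_model *parent,
			  enum mm_alloc_reason reason)
{
	if (reason == MM_ALLOC_FORK) {
		assert(parent != NULL);
		mm->pgtable_4k = parent->pgtable_4k; /* current mm decides */
	} else {
		mm->pgtable_4k = true; /* exec: start with 4K, maybe downgrade */
	}
}

/* Run after the ELF header has been evaluated: only 4K -> 2K happens. */
static void elf_apply_flag_model(struct mm_model *mm, bool elf_wants_4k)
{
	if (!elf_wants_4k)
		mm->pgtable_4k = false;
	/* Never upgrade 2K -> 4K here: that direction races in the mm. */
}
```

In this model a fork-only child of a 4K process stays at 4K, and a plain exec ends up at 2K; the ELF flag only matters on the exec path.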
>> Well, you could also make it a personality flag for example, no? Then
>> every new process below a certain one always gets 4k page tables until
>> they drop the personality, at which point each child would only get 2k
>> page tables again.
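The personality variant can be modelled the same way. This is a toy sketch, assuming a hypothetical bit `PER_MODEL_4K_PGTABLES` (not a real Linux personality value) and hypothetical helpers `fork_model()` and `drop_personality_model()`. Setting the bit never converts the caller's existing tables; it only selects 4K for mms created by later forks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical personality bit; not a real Linux PER_* value. */
#define PER_MODEL_4K_PGTABLES 0x1UL

/* Hypothetical per-process state: personality plus page table layout. */
struct task_model {
	unsigned long personality;
	bool pgtable_4k;
};

/* fork(): the child inherits the personality, and the parent's
 * personality decides the layout of the child's new page tables. */
static void fork_model(struct task_model *child, const struct task_model *parent)
{
	assert(child != NULL && parent != NULL);
	child->personality = parent->personality;
	child->pgtable_4k = (parent->personality & PER_MODEL_4K_PGTABLES) != 0;
}

/* Dropping the personality leaves this process's existing tables alone;
 * only children forked afterwards get 2K tables again. */
static void drop_personality_model(struct task_model *task)
{
	task->personality &= ~PER_MODEL_4K_PGTABLES;
}
```

Because the decision is made per fork from inherited state, this model covers the fork-only case as well as fork + execve.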
>>
>> I'm mostly concerned that people will end up mixing VMs and other
>> workloads on the same LPAR, so I don't think there's a one-size-fits-all
>> solution.
> If I add an argument to mm_init() to indicate if this context
> is for fork() or execve() then the ELF header flag approach works.

So you don't need the sysctl?


Alex