From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Alexander Graf <agraf@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>, KVM <kvm@vger.kernel.org>,
Cornelia Huck <cornelia.huck@de.ibm.com>,
Jens Freimann <jfrei@linux.vnet.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM
Date: Thu, 23 Apr 2015 14:08:24 +0200
Message-ID: <5538E0B8.2070106@de.ibm.com>
In-Reply-To: <E97BD94D-88F9-4B99-A5E4-D707DC0E2325@suse.de>
On 23.04.2015 at 14:01, Alexander Graf wrote:
>
>
>> On 23.04.2015 at 13:43, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>> On 23.04.2015 at 13:37, Alexander Graf wrote:
>>>
>>>
>>>> On 23.04.2015 at 13:08, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>>
>>>> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>>>
>>>> Replacing a 2K page table with a 4K page table while a VMA is active
>>>> for the affected memory region is fundamentally broken. Rip out the
>>>> page table reallocation code and replace it with a simple system
>>>> control 'vm.allocate_pgste'. If the system control is set the page
>>>> tables for all processes are allocated as full 4K pages, even for
>>>> processes that do not need it.
>>>>
>>>> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>
>>> Couldn't you make this a hidden kconfig option that gets automatically selected when kvm is enabled? Or is there a non-kvm case that needs it too?
>>
>> For things like RHEV the default could certainly be "enabled", but for normal
>> distros like SLES/RHEL, the idea was to NOT enable that by default, as the non-KVM
>> case is more common and might suffer from the additional memory consumption of
>> the page tables. (big databases come to mind)
>>
>> We could think about having RPMs like kvm provide a sysctl file that sets it if we
>> want to minimize the impact. Other ideas?
>
> Oh, I'm sorry, I misread the ifdef. I don't think it makes sense to have a config option for the default value then; just rely on sysctl.conf for changed defaults.
>
> As far as mechanisms to change it go, every distribution has their own ways of dealing with this. RH has a "profile" thing, we don't really have anything central, but individual sysctl.d files for example that a kvm package could provide.
> Either way, the default choosing shouldn't happen in .config ;).
So you vote for getting rid of the Kconfig?
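Without the Kconfig option, the default would come from userspace. A kvm package could ship a sysctl.d drop-in along these lines. This is only a sketch: the file name and location are illustrative assumptions, and only the key vm.allocate_pgste comes from the patch.

```
# Hypothetical drop-in shipped by a kvm/qemu package; the file name is illustrative,
# e.g. /usr/lib/sysctl.d/50-kvm-pgste.conf
# Allocate full 4K page tables (with PGSTEs) for all processes so KVM guests can run.
vm.allocate_pgste = 1
```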
> Also, please add some helpful error message in qemu to guide users to the sysctl.
Yes, we will provide a qemu patch (cc stable) after this hits the kernel.
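Below is a minimal sketch of such a hint. It is not the actual QEMU patch. It assumes QEMU reads the sysctl through its procfs path, /proc/sys/vm/allocate_pgste, after VM creation has failed. The helper names read_allocate_pgste() and maybe_print_pgste_hint() are hypothetical.

```c
#include <stdio.h>

/* procfs path of the vm.allocate_pgste sysctl (dots become slashes). */
#define ALLOCATE_PGSTE_PATH "/proc/sys/vm/allocate_pgste"

/* Hypothetical helper: read the sysctl value from `path`.
 * Returns the integer value (0 or 1 expected), or -1 if the file is
 * missing or unreadable, e.g. on a kernel without this sysctl. */
int read_allocate_pgste(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return value;
}

/* Hypothetical helper, meant to be called after VM creation failed:
 * print a hint only if the sysctl is known to be off.
 * Returns 1 if a hint was printed, 0 otherwise. */
int maybe_print_pgste_hint(const char *path)
{
    if (read_allocate_pgste(path) != 0)
        return 0; /* enabled, absent or unreadable: no hint */
    fprintf(stderr,
            "Hint: vm.allocate_pgste is 0. KVM guests need it set to 1,\n"
            "e.g. 'sysctl vm.allocate_pgste=1' or a sysctl.d drop-in,\n"
            "then start QEMU again.\n");
    return 1;
}
```

A caller would invoke maybe_print_pgste_hint(ALLOCATE_PGSTE_PATH) on its VM-creation error path. The helper is quiet when the file is absent, so kernels without the sysctl get no misleading hint.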
> As far as alternative approaches go, I don't have a great idea otoh. We could have an ELF flag indicating that this process needs 4k page tables to limit the impact to a single process.
This approach was actually Martin's first fix. The problem is that the decision takes place on execve,
but we need an answer at fork time. So we always started with 4k page tables and freed the 2nd half on
execve. This did not work for processes that only fork (without execve).
> In fact, could we maybe still limit the scope to non-global? A personality may work as well. Or ulimit?
I think we will go with the sysctl for now and see if we can come up with some automatic way as an additional
patch later on.
Christian
Thread overview: 15+ messages
2015-04-23 11:08 [PATCH] page table bugfix for s390/kvm Christian Borntraeger
2015-04-23 11:08 ` [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM Christian Borntraeger
2015-04-23 11:37 ` Alexander Graf
2015-04-23 11:43 ` Christian Borntraeger
2015-04-23 12:01 ` Alexander Graf
2015-04-23 12:08 ` Christian Borntraeger [this message]
2015-04-27 13:49 ` Alexander Graf
2015-04-23 12:13 ` Martin Schwidefsky
2015-04-27 13:48 ` Alexander Graf
2015-04-27 13:52 ` Paolo Bonzini
2015-04-27 13:57 ` Martin Schwidefsky
2015-04-27 14:03 ` Alexander Graf
2015-04-27 14:08 ` Christian Borntraeger
2015-04-23 12:07 ` Paolo Bonzini
2015-04-23 13:57 ` Cole Robinson