From: Avi Kivity <avi@qumranet.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	Christian Ehrhardt <EHRHARDT@de.ibm.com>,
	hollisb@us.ibm.com, arnd@arndb.de, carsteno@de.ibm.com,
	heicars2@linux.vnet.ibm.com, jeroney@us.ibm.com,
	borntrae@linux.vnet.ibm.com,
	virtualization@lists.linux-foundation.org,
	Linux Memory Management List <linux-mm@kvack.org>,
	mschwid2@linux.vnet.ibm.com, rvdheij@gmail.com,
	Olaf Schnapper <os@de.ibm.com>,
	jblunck@suse.de, "Zhang,  Xiantao" <xiantao.zhang@intel.com>,
	kvm-devel@lists.sourceforge.net
Subject: Re: [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
Date: Sun, 23 Mar 2008 12:15:22 +0200
Message-ID: <47E62DBA.4050102@qumranet.com>
In-Reply-To: <20080322175705.GD6367@osiris.boeblingen.de.ibm.com>

Heiko Carstens wrote:
>> What you've done with dup_mm() is probably the brute-force way that I
>> would have done it had I just been trying to make a proof of concept or
>> something.  I'm worried that there are a bunch of corner cases that
>> haven't been considered.
>>
>> What if someone else is poking around with ptrace or something similar
>> and they bump the mm_users:
>>
>> +       if (tsk->mm->context.pgstes)
>> +               return 0;
>> +       if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
>> +           tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
>> +               return -EINVAL;
>> -------->HERE
>> +       tsk->mm->context.pgstes = 1;    /* dirty little tricks .. */
>> +       mm = dup_mm(tsk);
>>
>> It'll race, possibly fault in some other pages, and those faults will be
>> lost during the dup_mm().  I think you need to be able to lock out all
>> of the users of access_process_vm() before you go and do this.  You also
>> need to make sure that anyone who has looked at task->mm doesn't go and
>> get a reference to it and get confused later when it isn't the task->mm
>> any more.
>>
>>     
>>> Therefore, we need to reallocate the page table after fork() 
>>> once we know that task is going to be a hypervisor. That's what this 
>>> code does: reallocate a bigger page table to accommodate the extra 
>>> information. The task needs to be single-threaded when calling for 
>>> extended page tables.
>>>
>>> Btw: at fork() time, we cannot tell whether or not the user's going to 
>>> be a hypervisor. Therefore we cannot do this in fork.
>>>       
>> Can you convert the page tables at a later time without doing a
>> wholesale replacement of the mm?  It should be a bit easier to keep
>> people off the pagetables than keep their grubby mitts off the mm
>> itself.
>>     
>
> Yes, as far as I can see you're right. And whatever we do in arch code,
> after all it's just a work around to avoid a new clone flag.
> If something like clone() with CLONE_KVM would be useful for more
> architectures than just s390 then maybe we should try to get a flag.
>
> Oh... there are just two unused clone flag bits left. Looks like the
> namespace changes ate up a lot of them lately.
>
> Well, we could still play dirty tricks like setting a bit in current
> via whatever mechanism which indicates child-wants-extended-page-tables
> and then just fork and be happy.
>   

How about taking mmap_sem for write and converting all page tables 
in-place?  I'd rather avoid the need to fork() when creating a VM.

-- 
error compiling committee.c: too many arguments to function




