From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:46051)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RgFWR-0008AW-35
	for qemu-devel@nongnu.org; Thu, 29 Dec 2011 07:55:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RgFWP-0002Uv-Py
	for qemu-devel@nongnu.org; Thu, 29 Dec 2011 07:55:51 -0500
Received: from mx1.redhat.com ([209.132.183.28]:38525)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1RgFWP-0002Up-JB
	for qemu-devel@nongnu.org; Thu, 29 Dec 2011 07:55:49 -0500
Message-ID: <4EFC634E.10406@redhat.com>
Date: Thu, 29 Dec 2011 14:55:42 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
References: <cover.1325055065.git.yamahata@valinux.co.jp>
	<4EFC4DF0.2040708@redhat.com>
	<20111229123922.GG19274@valinux.co.jp>
In-Reply-To: <20111229123922.GG19274@valinux.co.jp>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char
	device for postcopy
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Andrea Arcangeli <aarcange@redhat.com>, t.hirofuchi@aist.go.jp, qemu-devel@nongnu.org, kvm@vger.kernel.org, satoshi.itoh@aist.go.jp

On 12/29/2011 02:39 PM, Isaku Yamahata wrote:
> > > ioctl commands:
> > >
> > > UMEM_DEV_CRATE_UMEM: create umem device for qemu
> > > UMEM_DEV_LIST: list created umem devices
> > > UMEM_DEV_REATTACH: re-attach the created umem device
> > > 		  UMEM_DEV_LIST and UMEM_DEV_REATTACH are used when
> > > 		  the process that services page faults disappears or gets stuck.
> > > 		  Then the administrator can list the umem devices and unblock
> > > 		  the process that is waiting for a page.
> > 
> > Ah, I asked about this in my patch comments.  I think this is done
> > better by using SCM_RIGHTS to pass fds along, or asking qemu to launch a
> > new process.
>
> Can you please elaborate? I don't think the approaches you are suggesting
> solve the issue. Let me clarify the problem.
>
>   process A (typically incoming qemu)
>      |
>      | mmap("/dev/umem") and access those pages triggering page faults
>      | (the file descriptor might be closed after mmap() before page faults)
>      |
>      V
>    /dev/umem
>      ^
>      |
>      |
>    daemon X resolving page faults triggered by process A
>    (typically this daemon is forked from the incoming qemu, i.e. process A)
>
> If daemon X disappears accidentally, nobody is left to resolve the page
> faults of process A. At that point process A is blocked in a page fault,
> and no file descriptor corresponding to the VMA is available.
> There is then no way to kill process A short of rebooting the system.

qemu can have an extra thread that wait4()s the daemon and relaunches
it if it dies.  This extra thread would not be blocked by the page fault.
It can keep the fd so it isn't lost.
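
Roughly along these lines, as a sketch only and not tried against umem. The names supervise_daemon() and demo_daemon() are made up for illustration, and in the usage note a pipe stands in for the umem fd. The supervisor keeps the fd open across restarts, so every relaunched daemon inherits it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical daemon entry point: services page faults through fd. */
typedef int (*daemon_fn)(int fd);

/*
 * Stand-in daemon for illustration: reads one byte from fd and
 * exits cleanly only if that byte is 'k'; anything else is a "crash".
 */
int demo_daemon(int fd)
{
    char c;

    if (read(fd, &c, 1) != 1)
        return 1;
    return c == 'k' ? 0 : 1;
}

/*
 * Fork the daemon, wait for it, and relaunch it when it dies abnormally.
 * Returns the number of relaunches, or -1 if fork()/waitpid() failed.
 * The caller's fd stays open here, so each new child inherits it.
 */
int supervise_daemon(daemon_fn run, int fd, int max_restarts)
{
    int restarts = 0;

    assert(run != NULL);

    for (;;) {
        pid_t pid = fork();
        int status;

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(run(fd));            /* child: become the daemon */

        /* Retry if a signal interrupts the wait. */
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }

        /* Clean exit: the daemon finished its job. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;

        /* Give up once the restart budget is used. */
        if (restarts >= max_restarts)
            return restarts;
        restarts++;
    }
}
```

In qemu this loop would run in its own thread, which never touches the umem-backed memory and so never faults on it.  For example, with a pipe preloaded with "xxk" as the fd, supervise_daemon(demo_daemon, fd, 5) sees two abnormal exits, relaunches twice, and stops after the third child exits cleanly.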

The unkillability of process A is a security issue; it could be done on
purpose.  Is it possible to change umem to sleep with
TASK_INTERRUPTIBLE, so it can be killed?

> > Introducing a global namespace has a lot of complications attached.
> > 
> > >
> > > UMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process
> > > UMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source
> > >                        for daemon
> > >
> > > UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
> > >                          anonymous.
> > >                          This is _NOT_ implemented yet. I'm not sure
> > >                          whether this can be implemented or not.
> > 
> > How do we find out?  This is fairly important; things like transparent
> > hugepages and ksm only work on anonymous memory.
>
> I agree that this is important.
> At KVM Forum 2011, Andrea said THP and KSM work with non-anonymous VMAs.
> (Or at least that he'll look into them. My memory is vague, though.
>  Please correct me if I'm wrong.)

+= Andrea (who can also provide feedback on umem in general)

-- 
error compiling committee.c: too many arguments to function