From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:43502)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1UIYik-0002iW-2Z
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 02:11:29 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1UIYih-0004EP-L2
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 02:11:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58609)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1UIYih-0004ED-Ct
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 02:11:23 -0400
Date: Thu, 21 Mar 2013 08:11:59 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20130321061159.GA28328@redhat.com>
References: <20130318212646.GB20406@redhat.com>
	<5147A209.80202@linux.vnet.ibm.com>
	<20130319081939.GC11259@redhat.com>
	<51487F68.2060305@linux.vnet.ibm.com>
	<20130319151606.GA13649@redhat.com>
	<51488521.4010909@linux.vnet.ibm.com>
	<20130319153658.GA14317@redhat.com>
	<51489BC3.3030504@linux.vnet.ibm.com> <51489D05.2000400@redhat.com>
	<5148A52E.6020208@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5148A52E.6020208@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose
 documentation of the RDMA transport
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini <pbonzini@redhat.com>

On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
> I also did a test using RDMA + cgroup, and the kernel killed my QEMU :)
> 
> So, infiniband is not smart enough to know how to avoid pinning a
> zero page, I guess.
> 
> - Michael
> 
> On 03/19/2013 01:14 PM, Paolo Bonzini wrote:
> >On 19/03/2013 18:09, Michael R. Hines wrote:
> >>Allowing QEMU to swap due to a cgroup limit during migration is a viable
> >>overcommit option?
> >>
> >>I'm trying to keep an open mind, but that would kill the migration
> >>time.....
> >Would it swap?  Doesn't the kernel back all zero pages with a single
> >copy-on-write page?  If that still accounts towards cgroup limits, it
> >would be a bug.
> >
> >Old kernels do not have a shared zero hugepage, and that includes some
> >distro kernels.  Perhaps that's the problem.
> >
> >Paolo
> >

It really shouldn't break COW if you don't request LOCAL_WRITE.
I think it's a kernel bug, and apparently it has been in the code since the
first version: the get_user_pages parameters are swapped.

I'll send a patch. If it's applied, you should also
change your code from

+                                IBV_ACCESS_LOCAL_WRITE |
+                                IBV_ACCESS_REMOTE_WRITE |
+                                IBV_ACCESS_REMOTE_READ);

to

+                                IBV_ACCESS_REMOTE_READ);

on the send side.
Then, each time we detect that a page has changed, we must make sure to
unregister and re-register it. Or, if you want to be very
smart, check whether the PFN changed and re-register only
if it did.

This will make overcommit work.
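
To illustrate the PFN check, here is a minimal sketch, not the actual
patch. The names (struct send_page, needs_reregister, mark_registered)
are hypothetical, and how the current PFN is obtained is deliberately
left to the caller. The assumption is that with a read-only registration
the kernel leaves COW intact, so a later guest write can move the page to
a different physical frame than the one that was registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping on the send side (not from the patch). */
struct send_page {
    uint64_t pfn;        /* PFN observed when the page was last registered */
    bool     registered; /* true while a memory region covers this page */
};

/*
 * Decide whether a page reported as changed needs a fresh registration.
 * A page that was never registered always does; a registered page does
 * only if its PFN differs from the one recorded at registration time.
 */
static inline bool needs_reregister(const struct send_page *p,
                                    uint64_t current_pfn)
{
    assert(p != NULL);
    return !p->registered || p->pfn != current_pfn;
}

/*
 * Record a successful (re-)registration. In real code this would follow
 * the unregister/register calls; those are omitted from this sketch.
 */
static inline void mark_registered(struct send_page *p, uint64_t pfn)
{
    assert(p != NULL);
    p->pfn = pfn;
    p->registered = true;
}
```

The simple variant from above (always unregister and re-register a
changed page) corresponds to ignoring needs_reregister() and treating
every changed page as needing a new registration.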

-- 
MST