From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37219)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aqRlD-0001qo-Ag for qemu-devel@nongnu.org;
	Wed, 13 Apr 2016 16:51:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1aqRlA-0004He-37 for qemu-devel@nongnu.org;
	Wed, 13 Apr 2016 16:51:39 -0400
Received: from mx1.redhat.com ([209.132.183.28]:38246)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aqRl9-0004HG-Uv for qemu-devel@nongnu.org;
	Wed, 13 Apr 2016 16:51:36 -0400
Date: Wed, 13 Apr 2016 16:51:32 -0400
From: Andrea Arcangeli
Message-ID: <20160413205132.GG26364@redhat.com>
References: <20160412175501.GB6415@work-vm>
	<20160413080545.GA2270@work-vm>
	<20160413114103.GB2270@work-vm>
	<20160413125053.GC2270@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160413125053.GC2270@work-vm>
Subject: Re: [Qemu-devel] post-copy is broken?
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: "Dr. David Alan Gilbert"
Cc: "Li, Liang Z" , Amit Shah , "qemu-devel@nongnu.org" , "quintela@redhat.com"

On Wed, Apr 13, 2016 at 01:50:53PM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>
> > > + if ( ((b + 1) % 255) == last_byte && !hit_edge) {
> >
> > Ahem, that should be 256.
>
> I'm going to bisect the kernel and see where we get to.
> Andrea's userfaultfd self-test passes on 2.5, so it's something more
> subtle.
>

David already tracked down 1df59b8497f47495e873c23abd6d3d290c730505 good and
984065055e6e39f8dd812529e11922374bd39352 bad.

git diff 1df59b8497f47495e873c23abd6d3d290c730505..984065055e6e39f8dd812529e11922374bd39352 fs/userfaultfd.c mm/userfaultfd.c

Nothing that could break it in the diff of the relevant two files.

The only other userfault-related change in this commit range that comes
to mind is in fixup_user_fault, but even if that were buggy you couldn't
notice, because postcopy doesn't userfault into futexes, and the only
other user of that function is s390.

The next suspect is the massive THP refcounting change that went
upstream recently:

 mm/filemap.c        |   34 +-
 mm/slab.c           |   48 +-
 mm/hugetlb.c        |   51 +-
 mm/util.c           |   55 +-
 mm/vmscan.c         |   56 +-
 mm/swapfile.c       |   57 +-
 mm/internal.h       |   70 +-
 mm/memblock.c       |   73 +-
 mm/mempolicy.c      |   75 ++-
 mm/sparse-vmemmap.c |   76 ++-
 mm/vmpressure.c     |   78 ++-
 mm/vmstat.c         |   86 ++-
 mm/ksm.c            |   89 +--
 mm/mmap.c           |  106 +--
 mm/memory_hotplug.c |  107 ++-
 mm/memory-failure.c |  125 ++--
 mm/memory.c         |  148 ++--
 mm/gup.c            |  172 +++--
 mm/madvise.c        |  201 ++++++
 mm/page_alloc.c     |  205 +++--
 mm/shmem.c          |  289 ++++----
 mm/swap.c           |  319 ++-------
 mm/rmap.c           |  387 +++++++----
 mm/memcontrol.c     |  478 +++++++------
 mm/huge_memory.c    | 1814 ++++++++++++++++++++++++++++++++------------------

As a further debug hint, can you try to disable THP and see if that
makes the problem go away?

Thanks,
Andrea
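
PS: for the THP test, the global knob is
/sys/kernel/mm/transparent_hugepage/enabled; writing "never" to it as
root (echo never > /sys/kernel/mm/transparent_hugepage/enabled) should
disable THP system-wide. To confirm the setting took effect before
rerunning the migration, something like the rough sketch below could be
used. The helper names thp_active_mode() and thp_is_disabled() are made
up for illustration and are not part of QEMU or the kernel; the sketch
only assumes the usual "always [madvise] never" format of that file,
where the active mode is the bracketed word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Global THP policy knob on Linux. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/*
 * Hypothetical helper: copy the bracketed (active) mode out of a line
 * such as "always [madvise] never". Returns 0 on success, -1 if no
 * bracketed word is found or it does not fit in 'out'.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)   /* need room for the terminating NUL */
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the active mode is "never", 0 if it
 * is anything else, -1 if the knob cannot be read or parsed (for
 * example on a kernel built without THP).
 */
int thp_is_disabled(void)
{
    char line[128], mode[32];

    FILE *f = fopen(THP_ENABLED_PATH, "r");
    if (!f)
        return -1;

    char *ok = fgets(line, sizeof(line), f);
    fclose(f);

    if (!ok || thp_active_mode(line, mode, sizeof(mode)) != 0)
        return -1;

    return strcmp(mode, "never") == 0;
}
```

If thp_is_disabled() returns 1 and postcopy then works, that would
point at the THP refcounting change; if it still fails with THP off, the
regression is more likely elsewhere in the range.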