From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed Date: Tue, 06 Jul 2010 17:53:43 +0300 Message-ID: <4C334377.7080506@redhat.com> References: <1277108293-9918-1-git-send-email-avi@redhat.com> <1277108293-9918-3-git-send-email-avi@redhat.com> <20100621202334.GA17362@amt.cnet> <4C209BD8.8030104@redhat.com> <20100706144519.GC16195@random.random> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm@vger.kernel.org, Glauber Costa To: Andrea Arcangeli Return-path: Received: from mx1.redhat.com ([209.132.183.28]:30352 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751147Ab0GFOxp (ORCPT ); Tue, 6 Jul 2010 10:53:45 -0400 Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o66ErjJa020479 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Tue, 6 Jul 2010 10:53:45 -0400 Received: from cleopatra.tlv.redhat.com (cleopatra.tlv.redhat.com [10.35.255.11]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id o66ErilW003664 for ; Tue, 6 Jul 2010 10:53:44 -0400 In-Reply-To: <20100706144519.GC16195@random.random> Sender: kvm-owner@vger.kernel.org List-ID: On 07/06/2010 05:45 PM, Andrea Arcangeli wrote: > >> Ouch, corrected and applied. >> > I think I tracked down the corruption during swapping with THP enabled > to this bug. The real bug is that the mmu notifier fires (it's not > like fork isn't covered by the mmu notifier) but KVM ignores it and > keeps writing to the old location. Shared pages can also be swapped > out and if the dirty bit on the spte isn't set faster than the time it > takes to write the page, the page can be relocated. Basically if > do_swap_page decides to make a copy of the page (like in ksm-swapin > case, erratically triggered now even for non-ksm pages in current > upstream by a bug in the new anon-vma code which I fixed already in > aa.git) and the dirty bit on the spte is ignored because of lumpy > reclaim (which also I removed now and that makes the bug stop > triggering too), eventually what happens is that the page is unmapped > and during swapin it is relocated to a different page. > > The bug really is in KVM that ignores the mmu_notifier_invalidate_page > and keeps using the old page. > Right. It's the same problem as O_DIRECT + fork() - kvm did a get_user_pages() long ago on this page, it got forked, and on the next write the mm duplicated the page and assigned qemu the new page, which kvm ignored. > It should have rang a bell that fork was breaking anything... fork > must not break anything since KVM is mmu notifier > capable. MADV_DONTFORK must only be a performance optimization > now. And the above change should be unnecessary (and I doubt the above > really fixes the swapping case as tmpfs can also be swapped out, at > least unless the page is pinned). > That particular page is pinned. > The way I'd like to fix it is to allocate those magic pages by hand > and not add them to lru and have page->mapping null. Then they will > remain pinned in the pte, and all problems will go away. > Yes, that's the correct solution. It shouldn't be a user page in the first place. Problem is that this is a very intrusive change. > The other way would be to have a lookup hashtable that when mmu > notifier invalidate fires, we lookup the hash and we call a method to > have kvm stop using the page. And then something is needed during the > page fault, if the gfn in the hash is paged-in another method is > called to set the magic host user address to point the new pfn. > > I think pinning the pages and allocating them by hand is simpler, > hopefully we can do it in a way that munmap will collect them > automatically like now. > I'd like to remove it completely from the memslot mechanism, unfortunately it may affect many paths. -- error compiling committee.c: too many arguments to function