From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Re: [Xen-users] rebased openSUSE Xen dom0 Patches
Date: Tue, 20 Apr 2010 15:01:48 -0400
Message-ID: <20100420190148.GF32720@phenom.dumpdata.com>
References: <4BC834C4020000780003A947@vpn.id2.novell.com> <4BCC335C020000780003ACD0@vpn.id2.novell.com>
To: Simon Graham
Cc: Andrew Lyon, xen-devel@lists.xensource.com, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Tue, Apr 20, 2010 at 11:07:54AM -0500, Simon Graham wrote:
> > > But that code is precisely what guarantees that the pages *can* be
> > > converted to page table pages (by completely unmapping them from
> > > the kernel image part of the address space). So your explanation is
> > > rather confusing than clarifying to me...
> >
> > I agree that that is the intent of this code -- what we _seem_ to
> > observe (and this is hard to prove) is that the page type ref count
> > is not being decremented by this code, which would imply that the
> > unmapping is not happening for some reason. The only real evidence
> > I have for this is that the failure always occurs on one of these
> > pages.
>
> We now think we've found the problem, which seems to be due to the
> following two calls in Linux within mark_rodata_ro():
>
>     free_init_pages("unused kernel memory",
>                     (unsigned long)
>                     page_address(virt_to_page(text_end)),
>                     (unsigned long)
>                     page_address(virt_to_page(rodata_start)));
>     free_init_pages("unused kernel memory",
>                     (unsigned long)
>                     page_address(virt_to_page(rodata_end)),
>                     (unsigned long)
>                     page_address(virt_to_page(data_start)));
>
> The first of these calls is trying to free the range
> page_address(virt_to_page(text_end)) through
> page_address(virt_to_page(rodata_start)).
>
> With text_end == 0xffffffff80610000 and rodata_start ==
> 0xffffffff80800000 the actual values received by free_init_pages() are
> 0xffff880000610000 and 0xffff880000800000 (i.e. within the 64-bit direct
> mapping region).
>
> In free_init_pages() there is a test of addr >= __start_kernel_map
> (which is 0xffffffff80000000). Because of this test, the two calls to
> HYPERVISOR_update_va_mapping() are not made.
>
> The net effect (we believe) is that this range of pages is freed from
> Linux's viewpoint but the pages are still marked as PGT_writable_page
> with a non-zero page type ref count in the hypervisor. When Linux tries
> to use these pages later on for page table pages, the hypervisor traps.
>
> Note, we have traced all uses of the pages in question. Apparently they
> are never used by Linux prior to the trap. Our traces show them being
> initialized in the hypervisor by construct_dom0(), marked as readonly in
> Linux by mark_rodata_ro() and then causing the hypervisor trap when
> Linux tries to use one of them for page tables.

Oh man, I remember this one. I submitted an initial patch for this:

https://patchwork.kernel.org/patch/79086/

>
> Presumably the correct fix will be to change the address range test in
> free_init_pages...

And this was the final fix:

http://marc.info/?l=linux-kernel&m=126652277705569&w=2

The end result was to use a different mechanism to get the kernel
address and use that to set _PAGE_RW on those pages, and to ignore the
other mapping. At least I think so; this was some time ago.
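
For anyone following along, here is a rough user-space sketch of the
address test Simon describes. It is not kernel code. The constant and
function names are made up for illustration, and the direct-mapping base
of 0xffff880000000000 is an assumption inferred from the addresses quoted
above. It also assumes both mappings are plain linear offsets from
physical address 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; values come from the quoted message. */
#define KERNEL_MAP_BASE 0xffffffff80000000ULL /* kernel image mapping starts here */
#define DIRECT_MAP_BASE 0xffff880000000000ULL /* assumed start of the direct mapping */

/*
 * Map a kernel-image virtual address to the direct-mapping alias of the
 * same physical page. This mimics what page_address(virt_to_page(x))
 * ends up returning, under the linear-offset assumption stated above.
 */
uint64_t direct_alias(uint64_t kaddr)
{
    return (kaddr - KERNEL_MAP_BASE) + DIRECT_MAP_BASE;
}

/*
 * The range test described for free_init_pages(): it only succeeds for
 * addresses inside the kernel image mapping.
 */
bool passes_kernel_map_test(uint64_t addr)
{
    return addr >= KERNEL_MAP_BASE;
}
```

Feeding text_end (0xffffffff80610000) through direct_alias() gives
0xffff880000610000, and that value fails passes_kernel_map_test(). That
is consistent with the two HYPERVISOR_update_va_mapping() calls being
skipped for this range.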