From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anthony Liguori Subject: Re: [PATCH] Handle vma regions with no backing page Date: Wed, 04 Jun 2008 14:41:20 -0500 Message-ID: <4846EFE0.5090304@codemonkey.ws> References: <20080603113937.GE8158@duo.random> <1212592164.26322.10.camel@lnx-benami> <20080604161755.GC7089@il.ibm.com> <20080604193402.GO21613@duo.random> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Muli Ben-Yehuda , Ben-Ami Yassour1 , Han Weidong , "Kay, Allen M" , Amit Shah , kvm@vger.kernel.org, avi@qumranet.com, Dave Hansen To: Andrea Arcangeli Return-path: Received: from an-out-0708.google.com ([209.85.132.247]:51617 "EHLO an-out-0708.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758681AbYFDTlg (ORCPT ); Wed, 4 Jun 2008 15:41:36 -0400 Received: by an-out-0708.google.com with SMTP id d40so59006and.103 for ; Wed, 04 Jun 2008 12:41:35 -0700 (PDT) In-Reply-To: <20080604193402.GO21613@duo.random> Sender: kvm-owner@vger.kernel.org List-ID: Andrea Arcangeli wrote: > On Wed, Jun 04, 2008 at 07:17:55PM +0300, Muli Ben-Yehuda wrote: > >> On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote: >> >> >>>>> We noticed that pfn_valid does not always works as expected by >>>>> this patch to indicate that a pfn has a backing page. We have >>>>> seen a case where CONFIG_NUMA was not set and then where >>>>> pfn_valid returned 1 for an mmio pfn. We then changed the >>>>> config file with CONFIG_NUMA set and it worked fine as expected >>>>> (since a different implementation of pfn_valid was used). >>>>> >>>>> How should we overcome this issue? >>>>> >>>> There's a page_is_ram() too, but that's the e820 map check and it >>>> means it's RAM not that there's a page backing store. Certainly if >>>> it's not ram we should go ahead with just the pfn but it'd be a >>>> workaround. >>>> >>>> I really think it'd be better off to fix pfn_valid to work for >>>> NUMA. >>>> >>> It does work for NUMA, it does not work without the NUMA option. >>> >> Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA >> disabled case? >> > > I'm very surprised that this is broken for non-numa. > > To be sure I understand, what exactly means that "pfn has not a > backing page"? > The pfn does have a backing page. When using CONFIG_FLATMEM, pfn_valid() is simply: #ifdef CONFIG_FLATMEM #define pfn_valid(pfn) ((pfn) < end_pfn) #endif And this is true, pfn_valid() just indicates whether there is a space in mem_map[], and there certainly is. Note this only happens when there is a valid PFN that's greater than the PCI memory (using 4GB+ of memory). > I think we must fix pfn_valid, so that when it returns true, > pfn_to_page returns an entry that is contained inside some mem_map > array allocated by mm/page_alloc.c. pfn_valid is totally right to > return true on mmio regions, as long as a 'struct page' exists. Like > for example the 640k-1M area. It'd be a waste to pay the price of a > discontiguous array to save a few struct pages (or perhaps these days > they're inefficiently using 4k tlb entries to map struct page dunno, I > surely prefer to waste a few struct pages and stay with 2M tlb). > > As long as a 'struct page' exists and pfn_to_page returns a 'struct > page' checking PageReserved() should be enough to know if it's > pageable RAM owned by the VM or an mmio region. I don't know why but > when I did reserved-ram patch, the PageReserved check wasn't returning > true on the reserved-ram. Perhaps it's because it was ram, dunno, so I > had to use the page_count() hack. But for mmio !pfn_valid || > PageReserved should work. Dave mentioned that SetPageReserved() doesn't necessarily get called for zones with bad alignment. Regards, Anthony Liguori > Said that this mem_map code tends to change > all the time for whatever reason, so I wouldn't be shocked if > PageReserved doesn't actually work either. But mixing pfn_valid and > PageReserved sounds the right objective. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >