public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Handle vma regions with no backing page
@ 2008-04-29 14:32 Anthony Liguori
  2008-04-29 14:54 ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Anthony Liguori @ 2008-04-29 14:32 UTC (permalink / raw)
  To: kvm-devel; +Cc: Anthony Liguori, Andrea Arcangeli, Ben-Ami Yassour, Avi Kivity

This patch allows VMAs that contain no backing page to be used for guest
memory.  This is a drop-in replacement for Ben-Ami's first page in his direct
mmio series.  Here, we continue to allow mmio pages to be represented in the
rmap.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f095b73..11b26f5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -531,6 +531,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	struct page *page[1];
 	unsigned long addr;
 	int npages;
+	pfn_t pfn;
 
 	might_sleep();
 
@@ -543,19 +544,36 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
 				NULL);
 
-	if (npages != 1) {
-		get_page(bad_page);
-		return page_to_pfn(bad_page);
-	}
+	if (unlikely(npages != 1)) {
+		struct vm_area_struct *vma;
+
+		vma = find_vma(current->mm, addr);
+		if (vma == NULL) {
+			get_page(bad_page);
+			return page_to_pfn(bad_page);
+		}
+
+		BUG_ON(!(vma->vm_flags & VM_IO));
 
-	return page_to_pfn(page[0]);
+		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		BUG_ON(pfn_valid(pfn));
+	} else
+		pfn = page_to_pfn(page[0]);
+
+	return pfn;
 }
 
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	return pfn_to_page(gfn_to_pfn(kvm, gfn));
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfn_valid(pfn))
+		return pfn_to_page(pfn);
+
+	return NULL;
 }
 
 EXPORT_SYMBOL_GPL(gfn_to_page);
@@ -568,7 +586,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(pfn_t pfn)
 {
-	put_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
 
@@ -593,21 +612,25 @@ EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
 
 void kvm_set_pfn_dirty(pfn_t pfn)
 {
-	struct page *page = pfn_to_page(pfn);
-	if (!PageReserved(page))
-		SetPageDirty(page);
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+		if (!PageReserved(page))
+			SetPageDirty(page);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(pfn_t pfn)
 {
-	mark_page_accessed(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
 void kvm_get_pfn(pfn_t pfn)
 {
-	get_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
 

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-04-29 14:32 Anthony Liguori
@ 2008-04-29 14:54 ` Andrea Arcangeli
  2008-04-29 15:14   ` Anthony Liguori
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2008-04-29 14:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel, Ben-Ami Yassour, Avi Kivity

On Tue, Apr 29, 2008 at 09:32:09AM -0500, Anthony Liguori wrote:
> +		vma = find_vma(current->mm, addr);
> +		if (vma == NULL) {
> +			get_page(bad_page);
> +			return page_to_pfn(bad_page);
> +		}

Here you must check vm_start address, find_vma only checks addr <
vm_end but there's no guarantee addr >= vm_start yet.

> +
> +		BUG_ON(!(vma->vm_flags & VM_IO));

For consistency we should return bad_page and not BUG on. VM_IO and
VM_PFNMAP can in theory not be set at the same time; otherwise
get_user_pages would be buggy checking against VM_PFNMAP|VM_IO. I
doubt anybody isn't setting VM_IO before calling remap_pfn_range, but
anyway...

Secondly, the really correct check is against VM_PFNMAP. This is
because PFNMAP is set at the same time as vm_pgoff = pfn. VM_IO is
not: in theory a driver that uses ->fault instead of remap_pfn_range
shouldn't set VM_IO and should only set VM_RESERVED. VM_IO is about
keeping gdb/coredump out, as they could mess with the hardware if they
read; PFNMAP is about remap_pfn_range having been called and pgoff
pointing to the first pfn mapped at the vm_start address.
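The two corrections above (reject an addr below vm_start, and test VM_PFNMAP
rather than VM_IO) can be sketched as a toy userspace model. All names here
(toy_vma, toy_find_vma, toy_addr_to_pfn) and the flags are hypothetical
stand-ins for the kernel structures, not real API:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_VM_PFNMAP  0x1

struct toy_vma {
	unsigned long vm_start;   /* inclusive */
	unsigned long vm_end;     /* exclusive */
	unsigned long vm_pgoff;   /* first pfn mapped at vm_start (PFNMAP) */
	unsigned long vm_flags;
};

/* find_vma() semantics: returns the first vma with addr < vm_end, which
 * does NOT guarantee addr >= vm_start -- hence the extra check below. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/* Returns -1 for the "bad_page" cases, otherwise the pfn backing addr. */
static long toy_addr_to_pfn(const struct toy_vma *vmas, size_t n,
			    unsigned long addr)
{
	const struct toy_vma *vma = toy_find_vma(vmas, n, addr);

	if (vma == NULL || addr < vma->vm_start)  /* the missing vm_start check */
		return -1;
	if (!(vma->vm_flags & TOY_VM_PFNMAP))     /* check PFNMAP, not VM_IO */
		return -1;
	return (long)(((addr - vma->vm_start) >> TOY_PAGE_SHIFT) + vma->vm_pgoff);
}
```

With a single PFNMAP vma covering [0x2000, 0x4000) at pgoff 0x100, an addr of
0x1000 is found by toy_find_vma (0x1000 < vm_end) but correctly rejected by
the vm_start check.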

Patch is in the right direction, way to go!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-04-29 14:54 ` Andrea Arcangeli
@ 2008-04-29 15:14   ` Anthony Liguori
  0 siblings, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2008-04-29 15:14 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kvm-devel, Ben-Ami Yassour, Avi Kivity

Andrea Arcangeli wrote:
> On Tue, Apr 29, 2008 at 09:32:09AM -0500, Anthony Liguori wrote:
>   
>> +		vma = find_vma(current->mm, addr);
>> +		if (vma == NULL) {
>> +			get_page(bad_page);
>> +			return page_to_pfn(bad_page);
>> +		}
>>     
>
> Here you must check vm_start address, find_vma only checks addr <
> vm_end but there's no guarantee addr >= vm_start yet.
>   

Indeed.

>> +
>> +		BUG_ON(!(vma->vm_flags & VM_IO));
>>     
>
> For consistency we should return bad_page and not bug on, VM_IO and
> VM_PFNMAP can theoretically not be set at the same time, otherwise
> get_user_pages would be buggy checking against VM_PFNMAP|VM_IO. I
> doubt anybody isn't setting VM_IO before calling remap_pfn_range but
> anyway...
>
> Secondly the really correct check is against VM_PFNMAP. This is
> because PFNMAP is set at the same time of vm_pgoff = pfn. VM_IO is not
> even if in theory if a driver uses ->fault instead of remap_pfn_range,
> shouldn't set VM_IO and it should only set VM_RESERVED. VM_IO is about
> keeping gdb/coredump out as they could mess with the hardware if they
> read, PFNMAP is about remap_pfn_range having been called and pgoff
> pointing to the first pfn mapped at vm_start address.
>   

Good point.  I've updated the patch.  Will send out again once I've 
gotten to test it.

Regards,

Anthony Liguori

> Patch is in the right direction, way to go!
>   



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
@ 2008-06-03 11:17 Ben-Ami Yassour
  2008-06-03 11:39 ` Andrea Arcangeli
  2008-06-04  9:48 ` Avi Kivity
  0 siblings, 2 replies; 13+ messages in thread
From: Ben-Ami Yassour @ 2008-06-03 11:17 UTC (permalink / raw)
  To: aliguori
  Cc: Han Weidong, Kay, Allen M, Muli Ben-Yehuda, Amit Shah, andrea,
	kvm, avi

Anthony Liguori <aliguori@us.ibm.com> wrote on 04/29/2008 05:32:09 PM:

> Subject
>
> [PATCH] Handle vma regions with no backing page
>
> This patch allows VMA's that contain no backing page to be used for guest
> memory.  This is a drop-in replacement for Ben-Ami's first page in his direct
> mmio series.  Here, we continue to allow mmio pages to be represented in the
> rmap.

>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>  {
> -   return pfn_to_page(gfn_to_pfn(kvm, gfn));
> +   pfn_t pfn;
> +
> +   pfn = gfn_to_pfn(kvm, gfn);
> +   if (pfn_valid(pfn))
> +      return pfn_to_page(pfn);
> +
> +   return NULL;
>  }

We noticed that pfn_valid does not always work as this patch expects, 
i.e. as an indication that a pfn has a backing page.
We have seen a case where CONFIG_NUMA was not set and pfn_valid 
returned 1 for an mmio pfn.
We then changed the config file with CONFIG_NUMA set and it worked fine as 
expected (since a different implementation of pfn_valid was used).

How should we overcome this issue?

Thanks,
Ben

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-03 11:17 [PATCH] Handle vma regions with no backing page Ben-Ami Yassour
@ 2008-06-03 11:39 ` Andrea Arcangeli
  2008-06-04 15:09   ` Ben-Ami Yassour
  2008-06-04  9:48 ` Avi Kivity
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2008-06-03 11:39 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: aliguori, Han Weidong, Kay, Allen M, Muli Ben-Yehuda, Amit Shah,
	kvm, avi

On Tue, Jun 03, 2008 at 02:17:55PM +0300, Ben-Ami Yassour wrote:
> Anthony Liguori <aliguori@us.ibm.com> wrote on 04/29/2008 05:32:09 PM:
>
>> Subject
>>
>> [PATCH] Handle vma regions with no backing page
>>
>> This patch allows VMA's that contain no backing page to be used for guest
>> memory.  This is a drop-in replacement for Ben-Ami's first page in his 
>> direct
>> mmio series.  Here, we continue to allow mmio pages to be represented in 
>> the
>> rmap.
>
>>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>>  {
>> -   return pfn_to_page(gfn_to_pfn(kvm, gfn));
>> +   pfn_t pfn;
>> +
>> +   pfn = gfn_to_pfn(kvm, gfn);
>> +   if (pfn_valid(pfn))
>> +      return pfn_to_page(pfn);
>> +
>> +   return NULL;
>>  }
>
> We noticed that pfn_valid does not always works as expected by this patch 
> to indicate that a pfn has a backing page.
> We have seen a case where CONFIG_NUMA was not set and then where pfn_valid 
> returned 1 for an mmio pfn.
> We then changed the config file with CONFIG_NUMA set and it worked fine as 
> expected (since a different implementation of pfn_valid was used).
>
> How should we overcome this issue?

There's a page_is_ram() too, but that's the e820 map check: it means
the pfn is RAM, not that there's a page backing store. Certainly if
it's not ram we should go ahead with just the pfn, but it'd be a
workaround.

I really think we'd be better off fixing pfn_valid to work for NUMA. I
can't see how it can be ok for pfn_valid to return true when there's no
backing page... Probably pfn_valid was used for debugging to date, but
if you check vm_normal_page you'll see that it is not used just for
debugging, and it seems VM_MIXEDMAP will break as much as KVM.

I can't see how VM_MIXEDMAP can be sane doing pfn_to_page(pfn) and
pretending this is a normal page, when there's no 'struct page'
backing the pfn.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-03 11:17 [PATCH] Handle vma regions with no backing page Ben-Ami Yassour
  2008-06-03 11:39 ` Andrea Arcangeli
@ 2008-06-04  9:48 ` Avi Kivity
  2008-06-04 16:48   ` Anthony Liguori
  1 sibling, 1 reply; 13+ messages in thread
From: Avi Kivity @ 2008-06-04  9:48 UTC (permalink / raw)
  To: Ben-Ami Yassour
  Cc: aliguori, Han Weidong, Kay, Allen M, Muli Ben-Yehuda, Amit Shah,
	andrea, kvm

Ben-Ami Yassour wrote:
> Anthony Liguori <aliguori@us.ibm.com> wrote on 04/29/2008 05:32:09 PM:
>
>> Subject
>>
>> [PATCH] Handle vma regions with no backing page
>>
>> This patch allows VMA's that contain no backing page to be used for 
>> guest
>> memory.  This is a drop-in replacement for Ben-Ami's first page in 
>> his direct
>> mmio series.  Here, we continue to allow mmio pages to be represented 
>> in the
>> rmap.
>
>>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>>  {
>> -   return pfn_to_page(gfn_to_pfn(kvm, gfn));
>> +   pfn_t pfn;
>> +
>> +   pfn = gfn_to_pfn(kvm, gfn);
>> +   if (pfn_valid(pfn))
>> +      return pfn_to_page(pfn);
>> +
>> +   return NULL;
>>  }
>
> We noticed that pfn_valid does not always works as expected by this 
> patch to indicate that a pfn has a backing page.
> We have seen a case where CONFIG_NUMA was not set and then where 
> pfn_valid returned 1 for an mmio pfn.
> We then changed the config file with CONFIG_NUMA set and it worked 
> fine as expected (since a different implementation of pfn_valid was 
> used).
>
> How should we overcome this issue?
>

Looks like we need to reintroduce a refcount bit in the pte, and check 
the page using the VMA.

Nick Piggin's lockless pagecache patches, which have the same issue, 
also introduce a pte_special bit.  We could follow a similar route.

http://www.mail-archive.com/linux-arch@vger.kernel.org/msg04789.html

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-03 11:39 ` Andrea Arcangeli
@ 2008-06-04 15:09   ` Ben-Ami Yassour
  2008-06-04 16:17     ` Muli Ben-Yehuda
  0 siblings, 1 reply; 13+ messages in thread
From: Ben-Ami Yassour @ 2008-06-04 15:09 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: aliguori, Han Weidong, Kay, Allen M, Muli Ben-Yehuda, Amit Shah,
	kvm, avi

On Tue, 2008-06-03 at 13:39 +0200, Andrea Arcangeli wrote:
> On Tue, Jun 03, 2008 at 02:17:55PM +0300, Ben-Ami Yassour wrote:
> > Anthony Liguori <aliguori@us.ibm.com> wrote on 04/29/2008 05:32:09 PM:
> >
> >> Subject
> >>
> >> [PATCH] Handle vma regions with no backing page
> >>
> >> This patch allows VMA's that contain no backing page to be used for guest
> >> memory.  This is a drop-in replacement for Ben-Ami's first page in his 
> >> direct
> >> mmio series.  Here, we continue to allow mmio pages to be represented in 
> >> the
> >> rmap.
> >
> >>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
> >>  {
> >> -   return pfn_to_page(gfn_to_pfn(kvm, gfn));
> >> +   pfn_t pfn;
> >> +
> >> +   pfn = gfn_to_pfn(kvm, gfn);
> >> +   if (pfn_valid(pfn))
> >> +      return pfn_to_page(pfn);
> >> +
> >> +   return NULL;
> >>  }
> >
> > We noticed that pfn_valid does not always works as expected by this patch 
> > to indicate that a pfn has a backing page.
> > We have seen a case where CONFIG_NUMA was not set and then where pfn_valid 
> > returned 1 for an mmio pfn.
> > We then changed the config file with CONFIG_NUMA set and it worked fine as 
> > expected (since a different implementation of pfn_valid was used).
> >
> > How should we overcome this issue?
> 
> There's a page_is_ram() too, but that's the e820 map check and it
> means it's RAM not that there's a page backing store. Certainly if
> it's not ram we should go ahead with just the pfn but it'd be a
> workaround.
> 
> I really think it'd be better off to fix pfn_valid to work for NUMA.

It does work for NUMA; it does not work without the NUMA option.

>  I
> can't see how pfn_valid can be ok to return true when there's no
> backing page... Probably pfn_valid was used for debugging todate, but
> if you check vm_normal_page you'll see that it is not used just for
> debugging and it seems VM_MIXEDMAP will break as much as KVM.
> 
> I can't see how VM_MIXEDMAP can be sane doing pfn_to_page(pfn) and
> pretending this is a normal page, when there's no 'struct page'
> backing the pfn.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04 15:09   ` Ben-Ami Yassour
@ 2008-06-04 16:17     ` Muli Ben-Yehuda
  2008-06-04 19:34       ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Muli Ben-Yehuda @ 2008-06-04 16:17 UTC (permalink / raw)
  To: Ben-Ami Yassour1
  Cc: Andrea Arcangeli, aliguori, Han Weidong, Kay, Allen M, Amit Shah,
	kvm, avi

On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote:

> > > We noticed that pfn_valid does not always works as expected by
> > > this patch to indicate that a pfn has a backing page.  We have
> > > seen a case where CONFIG_NUMA was not set and then where
> > > pfn_valid returned 1 for an mmio pfn.  We then changed the
> > > config file with CONFIG_NUMA set and it worked fine as expected
> > > (since a different implementation of pfn_valid was used).
> > >
> > > How should we overcome this issue?
> > 
> > There's a page_is_ram() too, but that's the e820 map check and it
> > means it's RAM not that there's a page backing store. Certainly if
> > it's not ram we should go ahead with just the pfn but it'd be a
> > workaround.
> > 
> > I really think it'd be better off to fix pfn_valid to work for
> > NUMA.
> 
> It does work for NUMA, it does not work without the NUMA option.

Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA
disabled case?

Cheers,
Muli

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04  9:48 ` Avi Kivity
@ 2008-06-04 16:48   ` Anthony Liguori
  0 siblings, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2008-06-04 16:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ben-Ami Yassour, Han Weidong, Kay, Allen M, Muli Ben-Yehuda,
	Amit Shah, andrea, kvm, Dave Hansen

Avi Kivity wrote:
>
> Looks like we need to reintroduce a refcount bit in the pte, and check 
> the page using the VMA.

I don't think mucking with the VMA is going to help.  We're already 
using the VMA to determine that the region is MMIO.  What we need to be 
able to do is figure out, given a PFN, if that PFN is an MMIO page or 
not.  Really what we're looking for is whether we have to release a 
reference to the page.

I think it would be sufficient to change kvm_release_pfn_clean() to 
something like:

void kvm_release_pfn_clean(pfn_t pfn)
{
    struct page *page;

    if (!pfn_valid(pfn))
        return;

    page = pfn_to_page(pfn);
    if (page_count(page))
        put_page(page);
}


A couple other places need updating (like kvm_set_pfn_dirty()), but I 
think the general idea would work.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04 16:17     ` Muli Ben-Yehuda
@ 2008-06-04 19:34       ` Andrea Arcangeli
  2008-06-04 19:41         ` Anthony Liguori
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2008-06-04 19:34 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Ben-Ami Yassour1, aliguori, Han Weidong, Kay, Allen M, Amit Shah,
	kvm, avi

On Wed, Jun 04, 2008 at 07:17:55PM +0300, Muli Ben-Yehuda wrote:
> On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote:
> 
> > > > We noticed that pfn_valid does not always works as expected by
> > > > this patch to indicate that a pfn has a backing page.  We have
> > > > seen a case where CONFIG_NUMA was not set and then where
> > > > pfn_valid returned 1 for an mmio pfn.  We then changed the
> > > > config file with CONFIG_NUMA set and it worked fine as expected
> > > > (since a different implementation of pfn_valid was used).
> > > >
> > > > How should we overcome this issue?
> > > 
> > > There's a page_is_ram() too, but that's the e820 map check and it
> > > means it's RAM not that there's a page backing store. Certainly if
> > > it's not ram we should go ahead with just the pfn but it'd be a
> > > workaround.
> > > 
> > > I really think it'd be better off to fix pfn_valid to work for
> > > NUMA.
> > 
> > It does work for NUMA, it does not work without the NUMA option.
> 
> Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA
> disabled case?

I'm very surprised that this is broken for non-NUMA.

To be sure I understand, what exactly does "pfn has no backing page"
mean?

I think we must fix pfn_valid so that when it returns true,
pfn_to_page returns an entry that is contained inside some mem_map
array allocated by mm/page_alloc.c. pfn_valid is totally right to
return true on mmio regions, as long as a 'struct page' exists, like
for example the 640k-1M area. It'd be a waste to pay the price of a
discontiguous array to save a few struct pages (or perhaps these days
they're inefficiently using 4k tlb entries to map struct page, dunno; I
surely prefer to waste a few struct pages and stay with 2M tlb).

As long as a 'struct page' exists and pfn_to_page returns a 'struct
page', checking PageReserved() should be enough to know whether it's
pageable RAM owned by the VM or an mmio region. I don't know why, but
when I did the reserved-ram patch, the PageReserved check wasn't
returning true on the reserved ram. Perhaps it's because it was ram,
dunno, so I had to use the page_count() hack. But for mmio, !pfn_valid
|| PageReserved should work. That said, this mem_map code tends to
change all the time for whatever reason, so I wouldn't be shocked if
PageReserved doesn't actually work either. But mixing pfn_valid and
PageReserved sounds like the right objective.
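The "!pfn_valid || PageReserved" test can be modeled in userspace with flat
arrays standing in for mem_map. All names (toy_pfn_valid, toy_page_reserved,
toy_is_mmio_pfn) and the table contents are illustrative only, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 8

/* 1 = a struct page exists for this pfn (i.e. pfn_valid is true) */
static const bool toy_has_page[TOY_NR_PAGES] = { 1, 1, 1, 1, 1, 1, 0, 0 };
/* 1 = PG_reserved is set on that page (e.g. the legacy 640k-1M area) */
static const bool toy_reserved[TOY_NR_PAGES] = { 0, 0, 1, 1, 0, 0, 0, 0 };

static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_NR_PAGES && toy_has_page[pfn];
}

static bool toy_page_reserved(unsigned long pfn)
{
	return toy_reserved[pfn];  /* only meaningful when toy_pfn_valid(pfn) */
}

/* mmio if there is no backing struct page at all, or the page exists
 * but is marked reserved */
static bool toy_is_mmio_pfn(unsigned long pfn)
{
	return !toy_pfn_valid(pfn) || toy_page_reserved(pfn);
}
```

Under this model, an ordinary RAM pfn is not mmio, a reserved pfn is, and any
pfn beyond the modeled mem_map is treated as mmio as well.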

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04 19:34       ` Andrea Arcangeli
@ 2008-06-04 19:41         ` Anthony Liguori
  2008-06-04 19:51           ` Andrea Arcangeli
  0 siblings, 1 reply; 13+ messages in thread
From: Anthony Liguori @ 2008-06-04 19:41 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Muli Ben-Yehuda, Ben-Ami Yassour1, Han Weidong, Kay, Allen M,
	Amit Shah, kvm, avi, Dave Hansen

Andrea Arcangeli wrote:
> On Wed, Jun 04, 2008 at 07:17:55PM +0300, Muli Ben-Yehuda wrote:
>   
>> On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote:
>>
>>     
>>>>> We noticed that pfn_valid does not always works as expected by
>>>>> this patch to indicate that a pfn has a backing page.  We have
>>>>> seen a case where CONFIG_NUMA was not set and then where
>>>>> pfn_valid returned 1 for an mmio pfn.  We then changed the
>>>>> config file with CONFIG_NUMA set and it worked fine as expected
>>>>> (since a different implementation of pfn_valid was used).
>>>>>
>>>>> How should we overcome this issue?
>>>>>           
>>>> There's a page_is_ram() too, but that's the e820 map check and it
>>>> means it's RAM not that there's a page backing store. Certainly if
>>>> it's not ram we should go ahead with just the pfn but it'd be a
>>>> workaround.
>>>>
>>>> I really think it'd be better off to fix pfn_valid to work for
>>>> NUMA.
>>>>         
>>> It does work for NUMA, it does not work without the NUMA option.
>>>       
>> Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA
>> disabled case?
>>     
>
> I'm very surprised that this is broken for non-numa.
>
> To be sure I understand, what exactly means that "pfn has not a
> backing page"?
>   

The pfn does have a backing page.  When using CONFIG_FLATMEM, 
pfn_valid() is simply:

#ifdef CONFIG_FLATMEM
#define pfn_valid(pfn)          ((pfn) < end_pfn)
#endif

And this is true: pfn_valid() just indicates whether there is a slot in 
mem_map[], and there certainly is.  Note this only happens when there is 
a valid RAM pfn above the PCI memory hole (i.e. with 4GB+ of memory).
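A toy model of that failure mode, with made-up addresses and sizes (none of
the toy_* names or numbers come from a real machine):

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12

/* Machine with 6GB RAM: end_pfn then covers pfns up to ~6GB, so the
 * PCI mmio window sitting at 3.5GB-4GB lies below end_pfn. */
static const unsigned long toy_end_pfn  = 0x180000;               /* 6GB / 4KB */
static const unsigned long toy_mmio_pfn = 0xE0000000UL >> TOY_PAGE_SHIFT;

/* The CONFIG_FLATMEM definition: pfn_valid(pfn) is ((pfn) < end_pfn) */
static int toy_flatmem_pfn_valid(unsigned long pfn)
{
	return pfn < toy_end_pfn;
}
```

The mmio pfn passes the flatmem check even though no usable page backs it,
which is exactly the misclassification Ben-Ami observed.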

> I think we must fix pfn_valid, so that when it returns true,
> pfn_to_page returns an entry that is contained inside some mem_map
> array allocated by mm/page_alloc.c. pfn_valid is totally right to
> return true on mmio regions, as long as a 'struct page' exists. Like
> for example the 640k-1M area. It'd be a waste to pay the price of a
> discontiguous array to save a few struct pages (or perhaps these days
> they're inefficiently using 4k tlb entries to map struct page dunno, I
> surely prefer to waste a few struct pages and stay with 2M tlb).
>
> As long as a 'struct page' exists and pfn_to_page returns a 'struct
> page' checking PageReserved() should be enough to know if it's
> pageable RAM owned by the VM or an mmio region. I don't know why but
> when I did reserved-ram patch, the PageReserved check wasn't returning
> true on the reserved-ram. Perhaps it's because it was ram, dunno, so I
> had to use the page_count() hack. But for mmio !pfn_valid ||
> PageReserved should work.

Dave mentioned that SetPageReserved() doesn't necessarily get called for 
zones with bad alignment.

Regards,

Anthony Liguori


>  Said that this mem_map code tends to change
> all the time for whatever reason, so I wouldn't be shocked if
> PageReserved doesn't actually work either. But mixing pfn_valid and
> PageReserved sounds the right objective.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04 19:41         ` Anthony Liguori
@ 2008-06-04 19:51           ` Andrea Arcangeli
  2008-06-04 19:59             ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2008-06-04 19:51 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Muli Ben-Yehuda, Ben-Ami Yassour1, Han Weidong, Kay, Allen M,
	Amit Shah, kvm, avi, Dave Hansen

On Wed, Jun 04, 2008 at 02:41:20PM -0500, Anthony Liguori wrote:
> The pfn does have a backing page.  When using CONFIG_FLATMEM, pfn_valid() 
> is simply:
>
> #ifdef CONFIG_FLATMEM
> #define pfn_valid(pfn)          ((pfn) < end_pfn)
> #endif
>
> And this is true, pfn_valid() just indicates whether there is a space in 
> mem_map[], and there certainly is.  Note this only happens when there is a 
> valid PFN that's greater than the PCI memory (using 4GB+ of memory).

So everything is fine with pfn_valid. The check against end_pfn with
flatmem is also the one I looked at while doing the reserved-ram
patch.

pfn_valid must only signal whether pfn_to_page(pfn) returns garbage or a
struct page (you can't call pfn_to_page if pfn_valid is 0). That's all.

> Dave mentioned that SetPageReserved() doesn't necessarily get called for 
> zones with bad alignment.

What does 'bad alignment' mean? Buddy used to require each zone to
start at a 1<<MAX_ORDER naturally aligned physical address (any ram
before the alignment was wasted). In any case, all 'struct page' for
which pfn_valid would return true should start with PG_reserved set;
if that's not the case it should be fixed, I guess.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Handle vma regions with no backing page
  2008-06-04 19:51           ` Andrea Arcangeli
@ 2008-06-04 19:59             ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2008-06-04 19:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Anthony Liguori, Muli Ben-Yehuda, Ben-Ami Yassour1, Han Weidong,
	Kay, Allen M, Amit Shah, kvm, avi

On Wed, 2008-06-04 at 21:51 +0200, Andrea Arcangeli wrote:
> > Dave mentioned that SetPageReserved() doesn't necessarily get called
> for 
> > zones with bad alignment.
> 
> What does 'bad alignment' mean? Buddy was used to require each zone to
> start at 1<<MAX_ORDER naturally aligned physical address (any ram
> before the alignment was wasted). In any case all 'struct page' where
> pfn_valid would return true, should start with PG_reserved set, if
> it's not the case it should be fixed I guess.

I was thinking of the case where you have a large sparsemem section
size, and a zone which is smaller.

Say you have a 1GB SECTION_SIZE and a single 512MB zone.  You'll get a
true pfn_valid() for the 512MB that isn't in the zone since the
mem_map[] is actually larger than the zone.  But, since
memmap_init_zone() is the one responsible for setting PageReserved(), I
don't think anybody will set PageReserved() on that top 512MB.
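The arithmetic behind that example can be sketched directly; the toy_* names
are hypothetical and only mirror the 1GB-section / 512MB-zone scenario above:

```c
#include <assert.h>

#define TOY_PAGE_SHIFT   12
#define TOY_SECTION_SIZE (1UL << 30)  /* 1GB sparsemem section */
#define TOY_ZONE_SIZE    (1UL << 29)  /* a single 512MB zone */

static const unsigned long toy_section_pfns = TOY_SECTION_SIZE >> TOY_PAGE_SHIFT;
static const unsigned long toy_zone_pfns    = TOY_ZONE_SIZE >> TOY_PAGE_SHIFT;

/* sparsemem pfn_valid: true for every pfn whose section has a mem_map,
 * i.e. the whole 1GB section once any part of it holds memory */
static int toy_sparse_pfn_valid(unsigned long pfn)
{
	return pfn < toy_section_pfns;
}

/* memmap_init_zone() only touches pages inside the zone, so only these
 * pfns ever get PageReserved() set */
static int toy_initialized_by_zone(unsigned long pfn)
{
	return pfn < toy_zone_pfns;
}
```

Every pfn in the top 512MB of the section is "valid" yet never initialized by
memmap_init_zone, which is why PageReserved() cannot be trusted there.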

-- Dave


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-06-04 19:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-03 11:17 [PATCH] Handle vma regions with no backing page Ben-Ami Yassour
2008-06-03 11:39 ` Andrea Arcangeli
2008-06-04 15:09   ` Ben-Ami Yassour
2008-06-04 16:17     ` Muli Ben-Yehuda
2008-06-04 19:34       ` Andrea Arcangeli
2008-06-04 19:41         ` Anthony Liguori
2008-06-04 19:51           ` Andrea Arcangeli
2008-06-04 19:59             ` Dave Hansen
2008-06-04  9:48 ` Avi Kivity
2008-06-04 16:48   ` Anthony Liguori
  -- strict thread matches above, loose matches on Subject: below --
2008-04-29 14:32 Anthony Liguori
2008-04-29 14:54 ` Andrea Arcangeli
2008-04-29 15:14   ` Anthony Liguori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox