linux-mm.kvack.org archive mirror
* [RFC] shared page table for hugetlbpage memory causing leak.
@ 2008-01-16 17:25 Larry Woodman
  2008-01-16 18:54 ` Adam Litke
  0 siblings, 1 reply; 6+ messages in thread
From: Larry Woodman @ 2008-01-16 17:25 UTC (permalink / raw)
  To: linux-mm

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

I think the shared page table code for hugetlb memory on x86 and x86_64
is causing a leak.  When a process that uses hugepages through this code
exits, the system leaks some of the hugepages.

-------------------------------------------------------
Part of /proc/meminfo just before database startup:
HugePages_Total:  5500
HugePages_Free:   5500
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Just before shutdown:
HugePages_Total:  5500
HugePages_Free:   4475
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

After shutdown:
HugePages_Total:  5500 
HugePages_Free:   4988 
HugePages_Rsvd:      0 
Hugepagesize:     2048 kB
----------------------------------------------------------

I think the problem occurs during a fork, in copy_hugetlb_page_range().
It locates the dst_pte using huge_pte_alloc().  Since huge_pte_alloc()
calls huge_pmd_share(), it will share the pmd page if it can, yet the main
loop in copy_hugetlb_page_range() does a get_page() on every hugepage.
This is a violation of the shared hugepmd pagetable protocol and creates
additional references to the hugepages.

I think we can skip the entire replication of the ptes when the hugepage
pagetables are shared.  This patch skips copying the ptes and the get_page()
calls if the hugetlbpage pagetable is shared.





[-- Attachment #2: linux-shared.patch --]
[-- Type: text/plain, Size: 1178 bytes --]

--- linux-2.6.23/mm/hugetlb.c.orig	2008-01-16 12:05:41.496448000 -0500
+++ linux-2.6.23/mm/hugetlb.c	2008-01-16 12:09:57.184746000 -0500
@@ -377,18 +377,22 @@ int copy_hugetlb_page_range(struct mm_st
 		dst_pte = huge_pte_alloc(dst, addr);
 		if (!dst_pte)
 			goto nomem;
-		spin_lock(&dst->page_table_lock);
-		spin_lock(&src->page_table_lock);
-		if (!pte_none(*src_pte)) {
-			if (cow)
-				ptep_set_wrprotect(src, addr, src_pte);
-			entry = *src_pte;
-			ptepage = pte_page(entry);
-			get_page(ptepage);
-			set_huge_pte_at(dst, addr, dst_pte, entry);
+
+		/* if hugetlbpage pagetables are shared, don't take additional references */
+		if (!(is_vm_hugetlb_page(vma) && dst_pte == src_pte)) {
+			spin_lock(&dst->page_table_lock);
+			spin_lock(&src->page_table_lock);
+			if (!pte_none(*src_pte)) {
+				if (cow)
+					ptep_set_wrprotect(src, addr, src_pte);
+				entry = *src_pte;
+				ptepage = pte_page(entry);
+				get_page(ptepage);
+				set_huge_pte_at(dst, addr, dst_pte, entry);
+			}
+			spin_unlock(&src->page_table_lock);
+			spin_unlock(&dst->page_table_lock);
 		}
-		spin_unlock(&src->page_table_lock);
-		spin_unlock(&dst->page_table_lock);
 	}
 	return 0;
 


* Re: [RFC] shared page table for hugetlbpage memory causing leak.
  2008-01-16 17:25 [RFC] shared page table for hugetlbpage memory causing leak Larry Woodman
@ 2008-01-16 18:54 ` Adam Litke
  2008-01-16 18:55   ` Larry Woodman
  2008-01-17 10:19   ` Balbir Singh
  0 siblings, 2 replies; 6+ messages in thread
From: Adam Litke @ 2008-01-16 18:54 UTC (permalink / raw)
  To: Larry Woodman; +Cc: linux-mm

Since we know we are dealing with a hugetlb VMA, how about the
following, simpler, _untested_ patch:

Signed-off-by: Adam Litke <agl@us.ibm.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f97821..75b0e4f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -644,6 +644,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		dst_pte = huge_pte_alloc(dst, addr);
 		if (!dst_pte)
 			goto nomem;
+
+		/* If page table is shared do not copy or take references */
+		if (src_pte == dst_pte)
+			continue;
+
 		spin_lock(&dst->page_table_lock);
 		spin_lock(&src->page_table_lock);
 		if (!pte_none(*src_pte)) {


-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC] shared page table for hugetlbpage memory causing leak.
  2008-01-16 18:54 ` Adam Litke
@ 2008-01-16 18:55   ` Larry Woodman
  2008-01-17 10:19   ` Balbir Singh
  1 sibling, 0 replies; 6+ messages in thread
From: Larry Woodman @ 2008-01-16 18:55 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm

Adam Litke wrote:

>Since we know we are dealing with a hugetlb VMA, how about the
>following, simpler, _untested_ patch:
>
>Signed-off-by: Adam Litke <agl@us.ibm.com>
>
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index 6f97821..75b0e4f 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -644,6 +644,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> 		dst_pte = huge_pte_alloc(dst, addr);
> 		if (!dst_pte)
> 			goto nomem;
>+
>+		/* If page table is shared do not copy or take references */
>+		if (src_pte == dst_pte)
>+			continue;
>+
> 		spin_lock(&dst->page_table_lock);
> 		spin_lock(&src->page_table_lock);
> 		if (!pte_none(*src_pte)) {
>
>
>  
>
Agreed.



* Re: [RFC] shared page table for hugetlbpage memory causing leak.
  2008-01-16 18:54 ` Adam Litke
  2008-01-16 18:55   ` Larry Woodman
@ 2008-01-17 10:19   ` Balbir Singh
  2008-01-17 11:53     ` Larry Woodman
  1 sibling, 1 reply; 6+ messages in thread
From: Balbir Singh @ 2008-01-17 10:19 UTC (permalink / raw)
  To: Adam Litke; +Cc: Larry Woodman, linux-mm

* Adam Litke <agl@us.ibm.com> [2008-01-16 12:54:28]:

> Since we know we are dealing with a hugetlb VMA, how about the
> following, simpler, _untested_ patch:
> 
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6f97821..75b0e4f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -644,6 +644,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  		dst_pte = huge_pte_alloc(dst, addr);
>  		if (!dst_pte)
>  			goto nomem;
> +
> +		/* If page table is shared do not copy or take references */
> +		if (src_pte == dst_pte)
> +			continue;
> +

Shouldn't you be checking the PTE contents rather than the pointers?
Shouldn't the check be

                if (unlikely(pte_same(*src_pte, *dst_pte)))
                        continue;


>  		spin_lock(&dst->page_table_lock);
>  		spin_lock(&src->page_table_lock);
>  		if (!pte_none(*src_pte)) {
> 

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [RFC] shared page table for hugetlbpage memory causing leak.
  2008-01-17 10:19   ` Balbir Singh
@ 2008-01-17 11:53     ` Larry Woodman
  2008-01-17 12:12       ` Balbir Singh
  0 siblings, 1 reply; 6+ messages in thread
From: Larry Woodman @ 2008-01-17 11:53 UTC (permalink / raw)
  To: balbir; +Cc: Adam Litke, linux-mm

On Thu, 2008-01-17 at 15:49 +0530, Balbir Singh wrote:
> * Adam Litke <agl@us.ibm.com> [2008-01-16 12:54:28]:
> 
> > Since we know we are dealing with a hugetlb VMA, how about the
> > following, simpler, _untested_ patch:
> > 
> > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6f97821..75b0e4f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -644,6 +644,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> >  		dst_pte = huge_pte_alloc(dst, addr);
> >  		if (!dst_pte)
> >  			goto nomem;
> > +
> > +		/* If page table is shared do not copy or take references */
> > +		if (src_pte == dst_pte)
> > +			continue;
> > +
> 
> Shouldn't you be checking the PTE contents rather than the pointers?
No, this is checking for shared page tables, not shared pages.
> Shouldn't the check be
> 
>                 if (unlikely(pte_same(*src_pte, *dst_pte)))
>                         continue;
> 
> 
> >  		spin_lock(&dst->page_table_lock);
> >  		spin_lock(&src->page_table_lock);
> >  		if (!pte_none(*src_pte)) {
> > 
> 


* Re: [RFC] shared page table for hugetlbpage memory causing leak.
  2008-01-17 11:53     ` Larry Woodman
@ 2008-01-17 12:12       ` Balbir Singh
  0 siblings, 0 replies; 6+ messages in thread
From: Balbir Singh @ 2008-01-17 12:12 UTC (permalink / raw)
  To: Larry Woodman; +Cc: Adam Litke, linux-mm

* Larry Woodman <lwoodman@redhat.com> [2008-01-17 06:53:38]:

> On Thu, 2008-01-17 at 15:49 +0530, Balbir Singh wrote:
> > * Adam Litke <agl@us.ibm.com> [2008-01-16 12:54:28]:
> > 
> > > Since we know we are dealing with a hugetlb VMA, how about the
> > > following, simpler, _untested_ patch:
> > > 
> > > Signed-off-by: Adam Litke <agl@us.ibm.com>
> > > 
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 6f97821..75b0e4f 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -644,6 +644,11 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> > >  		dst_pte = huge_pte_alloc(dst, addr);
> > >  		if (!dst_pte)
> > >  			goto nomem;
> > > +
> > > +		/* If page table is shared do not copy or take references */
> > > +		if (src_pte == dst_pte)
> > > +			continue;
> > > +
> > 
> > Shouldn't you be checking the PTE contents rather than the pointers?
> No, this is checking for shared page tables, not shared pages.

Aah.. I see.

Thanks for clarifying!

-- 
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
