* [RFC PATCH] Increase locking granularity in THP page fault code
@ 2013-08-30 16:58 Alex Thorlton
From: Alex Thorlton @ 2013-08-30 16:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Kirill A. Shutemov, Mel Gorman, Xiao Guangrong,
	Andrea Arcangeli, Hugh Dickins, Rik van Riel, Peter Zijlstra,
	Robin Holt, Alex Thorlton, linux-mm

We have a threaded page fault scaling test that is performing very
poorly due to the use of the page_table_lock in the THP fault code
path.  By replacing the page_table_lock with the ptl on the pud_page
(CONFIG_SPLIT_PTLOCKS is required for this to work), we make the locking
here 512 times finer-grained: instead of locking all 256TB of addressable
memory, we only lock the 512GB that is handled by a single pud_page.
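
To illustrate the idea (this is only a sketch, not the patch itself;
huge_pmd_lockptr() is a hypothetical helper name), the change boils down
to picking the lock from the pud page rather than from the mm:

static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pud_t *pud)
{
#if USE_SPLIT_PTLOCKS
	/* one lock per pud page, i.e. per 512GB of address space */
	return &pud_page(*pud)->ptl;
#else
	/* fall back to the mm-wide lock when split locks are disabled */
	return &mm->page_table_lock;
#endif
}

Callers in the THP fault path would then take this lock where they
currently take mm->page_table_lock.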

The test I'm running creates 512 threads, pins each thread to a CPU, has
each thread allocate 512MB of memory, and then has each thread touch the
first byte of every 4k chunk of its allocation.  Here are the timings from
this test on a clean 3.11-rc7 kernel, THP on:

real	22m50.904s
user	15m26.072s
sys	11430m19.120s

And here are the timings with my modified kernel, THP on:

real	0m37.018s
user	21m39.164s
sys	155m9.132s
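
For reference, each test thread does roughly the following (a simplified
sketch, not the actual benchmark source; error handling is omitted):

/* Simplified sketch of the test: each thread pins itself to a CPU,
 * maps 512MB of anonymous memory, and touches the first byte of every
 * 4k chunk so that each touch takes a page fault. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

#define NTHREADS	512
#define ALLOC_SIZE	(512UL << 20)	/* 512MB per thread */
#define CHUNK		4096UL		/* touch one byte per 4k */

static void *worker(void *arg)
{
	long cpu = (long)arg;
	cpu_set_t set;
	unsigned long off;
	char *buf;

	/* pin this thread to its own CPU */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	buf = mmap(NULL, ALLOC_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* first-touch each 4k chunk; with THP on, these faults go
	 * through the huge page fault path */
	for (off = 0; off < ALLOC_SIZE; off += CHUNK)
		buf[off] = 1;

	return NULL;
}

int main(void)
{
	pthread_t tids[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}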

As the timings show, we get a huge performance boost by locking a more
targeted chunk of memory instead of taking the one lock that covers all
of the process's page tables.  At
this point, I'm comfortable saying that there are obvious benefits to
increasing the granularity of the locking, but I'm not sure that I've
done this in the best way possible.  Mainly, I'm not positive that using
the pud_page lock actually protects everything that we're concerned
about locking here.  I'm hoping that everyone can provide some input
on whether or not this seems like a reasonable move to make and, if so,
confirm that I've locked things appropriately.

As a side note, we still have some pretty significant scaling issues
with this test, both with THP on, and off.  I'm cleaning up the locking
here first as this is causing the biggest performance hit.

Alex Thorlton (1):
  Change THP code to use pud_page(pud)->ptl lock instead of
    page_table_lock

 mm/huge_memory.c | 4 ++--
 mm/memory.c      | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.12.4
