Re: [RFC PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Michal Hocko <mhocko@suse.cz>
Cc: Linux-MM <linux-mm@kvack.org>, Hugh Dickins <hughd@google.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Kenneth W Chen <kenneth.w.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables
Date: Thu, 19 Jul 2012 15:49:08 +0100	[thread overview]
Message-ID: <20120719144908.GX9222@suse.de> (raw)
In-Reply-To: <20120719144213.GJ2864@tiehlicka.suse.cz>

On Thu, Jul 19, 2012 at 04:42:13PM +0200, Michal Hocko wrote:
> [/me puts the patch destroyer glasses on]
> 

It's a super power now.

> On Wed 18-07-12 11:43:09, Mel Gorman wrote:
> [...]
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index f6679a7..0524556 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -68,14 +68,37 @@ static void huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >  	struct vm_area_struct *svma;
> >  	unsigned long saddr;
> >  	pte_t *spte = NULL;
> > +	spinlock_t *spage_table_lock = NULL;
> > +	struct rw_semaphore *smmap_sem = NULL;
> >  
> >  	if (!vma_shareable(vma, addr))
> >  		return;
> >  
> > +retry:
> >  	mutex_lock(&mapping->i_mmap_mutex);
> >  	vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
> >  		if (svma == vma)
> >  			continue;
> > +		if (svma->vm_mm == vma->vm_mm)
> > +			continue;
> > +
> > +		/*
> > +		 * The target mm could be in the process of tearing down
> > +		 * its page tables and the i_mmap_mutex on its own is
> > +		 * not sufficient. To prevent races against teardown and
> > +		 * pagetable updates, we acquire the mmap_sem and pagetable
> > +		 * lock of the remote address space. down_read_trylock()
> > +		 * is necessary as the other process could also be trying
> > +		 * to share pagetables with the current mm.
> > +		 */
> > +		if (!down_read_trylock(&svma->vm_mm->mmap_sem)) {
> > +			mutex_unlock(&mapping->i_mmap_mutex);
> > +			goto retry;
> > +		}
> > +
> 
> I am afraid this can easily cause a dead lock. Consider
> fork
>   dup_mmap
>     down_write(&oldmm->mmap_sem)
>     copy_page_range
>       copy_hugetlb_page_range
>         huge_pte_alloc
> 
> svma could belong to oldmm and then we would loop for ever. 
> svma->vm_mm == vma->vm_mm doesn't help because vma is child's one and mm
> differ in that case. I am wondering you didn't hit this while testing.
> It would suggest that the ptes are not populated yet because we didn't
> let parent play and then other children could place its vma in the list
> before parent?
> 

Yes, I think you're right - both about the race and why I didn't hit it.
The libhugetlbfs tests probably avoided the bug for the same reason.
Thanks for pointing this out.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: Michal Hocko <mhocko@suse.cz>
Cc: Linux-MM <linux-mm@kvack.org>, Hugh Dickins <hughd@google.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Kenneth W Chen <kenneth.w.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables
Date: Thu, 19 Jul 2012 15:49:08 +0100	[thread overview]
Message-ID: <20120719144908.GX9222@suse.de> (raw)
In-Reply-To: <20120719144213.GJ2864@tiehlicka.suse.cz>

On Thu, Jul 19, 2012 at 04:42:13PM +0200, Michal Hocko wrote:
> [/me puts the patch destroyer glasses on]
> 

It's a super power now.

> On Wed 18-07-12 11:43:09, Mel Gorman wrote:
> [...]
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index f6679a7..0524556 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -68,14 +68,37 @@ static void huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >  	struct vm_area_struct *svma;
> >  	unsigned long saddr;
> >  	pte_t *spte = NULL;
> > +	spinlock_t *spage_table_lock = NULL;
> > +	struct rw_semaphore *smmap_sem = NULL;
> >  
> >  	if (!vma_shareable(vma, addr))
> >  		return;
> >  
> > +retry:
> >  	mutex_lock(&mapping->i_mmap_mutex);
> >  	vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
> >  		if (svma == vma)
> >  			continue;
> > +		if (svma->vm_mm == vma->vm_mm)
> > +			continue;
> > +
> > +		/*
> > +		 * The target mm could be in the process of tearing down
> > +		 * its page tables and the i_mmap_mutex on its own is
> > +		 * not sufficient. To prevent races against teardown and
> > +		 * pagetable updates, we acquire the mmap_sem and pagetable
> > +		 * lock of the remote address space. down_read_trylock()
> > +		 * is necessary as the other process could also be trying
> > +		 * to share pagetables with the current mm.
> > +		 */
> > +		if (!down_read_trylock(&svma->vm_mm->mmap_sem)) {
> > +			mutex_unlock(&mapping->i_mmap_mutex);
> > +			goto retry;
> > +		}
> > +
> 
> I am afraid this can easily cause a dead lock. Consider
> fork
>   dup_mmap
>     down_write(&oldmm->mmap_sem)
>     copy_page_range
>       copy_hugetlb_page_range
>         huge_pte_alloc
> 
> svma could belong to oldmm and then we would loop for ever. 
> svma->vm_mm == vma->vm_mm doesn't help because vma is child's one and mm
> differ in that case. I am wondering you didn't hit this while testing.
> It would suggest that the ptes are not populated yet because we didn't
> let parent play and then other children could place its vma in the list
> before parent?
> 

Yes, I think you're right - both about the race and why I didn't hit it.
The libhugetlbfs tests probably avoided the bug for the same reason.
Thanks for pointing this out.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2012-07-19 14:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-18 10:43 [RFC PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables Mel Gorman
2012-07-18 10:43 ` Mel Gorman
2012-07-19  9:08 ` Cong Wang
2012-07-20 13:43   ` Mel Gorman
2012-07-20 13:43     ` Mel Gorman
2012-07-19 12:38 ` Hugh Dickins
2012-07-19 12:38   ` Hugh Dickins
2012-07-19 14:16   ` Mel Gorman
2012-07-19 14:16     ` Mel Gorman
2012-07-19 14:42 ` Michal Hocko
2012-07-19 14:42   ` Michal Hocko
2012-07-19 14:49   ` Mel Gorman [this message]
2012-07-19 14:49     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120719144908.GX9222@suse.de \
    --to=mgorman@suse.de \
    --cc=david@gibson.dropbear.id.au \
    --cc=hughd@google.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.