Re: Multiple potential races on vma->vm_flags

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: Multiple potential races on vma->vm_flags
Date: Mon, 14 Sep 2015 13:16:09 +0300	[thread overview]
Message-ID: <20150914101609.GA8293@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.LSU.2.11.1509111734480.7660@eggly.anvils>

On Fri, Sep 11, 2015 at 06:27:14PM -0700, Hugh Dickins wrote:
> On Fri, 11 Sep 2015, Kirill A. Shutemov wrote:
> > On Thu, Sep 10, 2015 at 03:27:59PM +0200, Andrey Konovalov wrote:
> > > Can a vma be shared among a few mm's?
> > 
> > Define "shared".
> > 
> > vma can belong only to one process (mm_struct), but it can be accessed
> > from other process like in rmap case below.
> > 
> > rmap uses anon_vma_lock for anon vma and i_mmap_rwsem for file vma to make
> > sure that the vma will not disappear under it.
> > 
> > > If yes, then taking current->mm->mmap_sem to protect vma is not enough.
> > 
> > Depends on what protection you are talking about.
> >  
> > > In the first report below both T378 and T398 take
> > > current->mm->mmap_sem at mm/mlock.c:650, but they turn out to be
> > > different locks (the addresses are different).
> > 
> > See i_mmap_lock_read() in T398. It will guarantee that vma is there.
> > 
> > > In the second report T309 doesn't take any locks at all, since it
> > > assumes that after checking atomic_dec_and_test(&mm->mm_users) the mm
> > > has no other users, but then it does a write to vma.
> > 
> > This one is tricky. I *assume* the mm cannot be generally accessible after
> > mm_users drops to zero, but I'm not entirely sure about it.
> > procfs? ptrace?
> 
> Most of the things (including procfs and ptrace) that need to work on
> a foreign mm do take a hold on mm_users with get_task_mm().  swapoff
> uses atomic_inc_not_zero(&mm->mm_users).  In KSM I managed to get away
> with just a hold on the structure itself, atomic_inc(&mm->mm_count),
> and a check for mm_users 0 wherever it down_reads mmap_sem (but Andrey
> might like to turn KSM on: it wouldn't be entirely shocking if he were
> to discover an anomaly from that).
> 
> > 
> > The VMA is still accessible via rmap at this point. And I think it can be
> > a problem:
> > 
> > 		CPU0					CPU1
> > exit_mmap()
> >   // mmap_sem is *not* taken
> >   munlock_vma_pages_all()
> >     munlock_vma_pages_range()
> >     						try_to_unmap_one()
> > 						  down_read_trylock(&vma->vm_mm->mmap_sem))
> > 						  !!(vma->vm_flags & VM_LOCKED) == true
> >       vma->vm_flags &= ~VM_LOCKED;
> >       <munlock the page>
> >       						  mlock_vma_page(page);
> > 						  // mlocked pages is leaked.
> > 
> > The obvious solution is to take mmap_sem in exit path, but it would cause
> > performance regression.
> > 
> > Any comments?
> 
> I'm inclined to echo Vlastimil's comment from earlier in the thread:
> sounds like an overkill, unless we find something more serious than this.
> 
> I'm not sure whether we'd actually see a regression from taking mmap_sem
> in exit path; but given that it's mmap_sem, yes, history tells us please
> not to take it any more than we have to.
> 
> I do remember wishing, when working out KSM's mm handling, that exit took
> mmap_sem: it would have made it simpler, but that wasn't a change I dared
> to make.
> 
> Maybe an mm_users 0 check after down_read_trylock in try_to_unmap_one() 
> could fix it?

I don't see how. It would shift a picture, but doesn't fix it: exit_mmap()
can happen after down_read_trylock() and mm_users check.
We would only hide the problem.

> But if we were to make a bigger change for this VM_LOCKED issue, and
> something more serious makes it worth all the effort, I'd say that
> what needs to be done is to give mlock/munlock proper locking (haha).
> 
> I have not yet looked at your mlocked THP patch (sorry), but when I
> was doing the same thing for huge tmpfs, what made it so surprisingly
> difficult was all the spongy trylocking, which concealed the rules.
> 
> Maybe I'm completely wrong, but I thought a lot of awkwardness might
> disappear if they were relying on anon_vma->rwsem and i_mmap_rwsem
> throughout instead of mmap_sem.

This can be helpful. But the risk is getting scalability regression on
other front: long anon_vma chain or highly shared files.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: Multiple potential races on vma->vm_flags
Date: Mon, 14 Sep 2015 13:16:09 +0300	[thread overview]
Message-ID: <20150914101609.GA8293@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.LSU.2.11.1509111734480.7660@eggly.anvils>

On Fri, Sep 11, 2015 at 06:27:14PM -0700, Hugh Dickins wrote:
> On Fri, 11 Sep 2015, Kirill A. Shutemov wrote:
> > On Thu, Sep 10, 2015 at 03:27:59PM +0200, Andrey Konovalov wrote:
> > > Can a vma be shared among a few mm's?
> > 
> > Define "shared".
> > 
> > vma can belong only to one process (mm_struct), but it can be accessed
> > from other process like in rmap case below.
> > 
> > rmap uses anon_vma_lock for anon vma and i_mmap_rwsem for file vma to make
> > sure that the vma will not disappear under it.
> > 
> > > If yes, then taking current->mm->mmap_sem to protect vma is not enough.
> > 
> > Depends on what protection you are talking about.
> >  
> > > In the first report below both T378 and T398 take
> > > current->mm->mmap_sem at mm/mlock.c:650, but they turn out to be
> > > different locks (the addresses are different).
> > 
> > See i_mmap_lock_read() in T398. It will guarantee that vma is there.
> > 
> > > In the second report T309 doesn't take any locks at all, since it
> > > assumes that after checking atomic_dec_and_test(&mm->mm_users) the mm
> > > has no other users, but then it does a write to vma.
> > 
> > This one is tricky. I *assume* the mm cannot be generally accessible after
> > mm_users drops to zero, but I'm not entirely sure about it.
> > procfs? ptrace?
> 
> Most of the things (including procfs and ptrace) that need to work on
> a foreign mm do take a hold on mm_users with get_task_mm().  swapoff
> uses atomic_inc_not_zero(&mm->mm_users).  In KSM I managed to get away
> with just a hold on the structure itself, atomic_inc(&mm->mm_count),
> and a check for mm_users 0 wherever it down_reads mmap_sem (but Andrey
> might like to turn KSM on: it wouldn't be entirely shocking if he were
> to discover an anomaly from that).
> 
> > 
> > The VMA is still accessible via rmap at this point. And I think it can be
> > a problem:
> > 
> > 		CPU0					CPU1
> > exit_mmap()
> >   // mmap_sem is *not* taken
> >   munlock_vma_pages_all()
> >     munlock_vma_pages_range()
> >     						try_to_unmap_one()
> > 						  down_read_trylock(&vma->vm_mm->mmap_sem))
> > 						  !!(vma->vm_flags & VM_LOCKED) == true
> >       vma->vm_flags &= ~VM_LOCKED;
> >       <munlock the page>
> >       						  mlock_vma_page(page);
> > 						  // mlocked pages is leaked.
> > 
> > The obvious solution is to take mmap_sem in exit path, but it would cause
> > performance regression.
> > 
> > Any comments?
> 
> I'm inclined to echo Vlastimil's comment from earlier in the thread:
> sounds like an overkill, unless we find something more serious than this.
> 
> I'm not sure whether we'd actually see a regression from taking mmap_sem
> in exit path; but given that it's mmap_sem, yes, history tells us please
> not to take it any more than we have to.
> 
> I do remember wishing, when working out KSM's mm handling, that exit took
> mmap_sem: it would have made it simpler, but that wasn't a change I dared
> to make.
> 
> Maybe an mm_users 0 check after down_read_trylock in try_to_unmap_one() 
> could fix it?

I don't see how. It would shift a picture, but doesn't fix it: exit_mmap()
can happen after down_read_trylock() and mm_users check.
We would only hide the problem.

> But if we were to make a bigger change for this VM_LOCKED issue, and
> something more serious makes it worth all the effort, I'd say that
> what needs to be done is to give mlock/munlock proper locking (haha).
> 
> I have not yet looked at your mlocked THP patch (sorry), but when I
> was doing the same thing for huge tmpfs, what made it so surprisingly
> difficult was all the spongy trylocking, which concealed the rules.
> 
> Maybe I'm completely wrong, but I thought a lot of awkwardness might
> disappear if they were relying on anon_vma->rwsem and i_mmap_rwsem
> throughout instead of mmap_sem.

This can be helpful. But the risk is getting scalability regression on
other front: long anon_vma chain or highly shared files.

-- 
 Kirill A. Shutemov

next prev parent reply	other threads:[~2015-09-14 10:16 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAAeHK+z8o96YeRF-fQXmoApOKXa0b9pWsQHDeP=5GC_hMTuoDg@mail.gmail.com>
     [not found] ` <55EC9221.4040603@oracle.com>
2015-09-07 11:40   ` Multiple potential races on vma->vm_flags Kirill A. Shutemov
2015-09-07 11:40     ` Kirill A. Shutemov
2015-09-09 15:27     ` Vlastimil Babka
2015-09-09 15:27       ` Vlastimil Babka
2015-09-09 16:01       ` Kirill A. Shutemov
2015-09-09 16:01         ` Kirill A. Shutemov
2015-09-10  0:58     ` Sasha Levin
2015-09-10  0:58       ` Sasha Levin
2015-09-10  8:36       ` Kirill A. Shutemov
2015-09-10  8:36         ` Kirill A. Shutemov
2015-09-10 13:27         ` Andrey Konovalov
2015-09-10 13:27           ` Andrey Konovalov
2015-09-11 10:39           ` Kirill A. Shutemov
2015-09-11 10:39             ` Kirill A. Shutemov
2015-09-11 15:29             ` Vlastimil Babka
2015-09-11 15:29               ` Vlastimil Babka
2015-09-11 16:08               ` Vlastimil Babka
2015-09-11 16:08                 ` Vlastimil Babka
2015-09-12  1:27             ` Hugh Dickins
2015-09-12  1:27               ` Hugh Dickins
2015-09-14 10:16               ` Kirill A. Shutemov [this message]
2015-09-14 10:16                 ` Kirill A. Shutemov
2015-09-15 17:36               ` Sasha Levin
2015-09-15 17:36                 ` Sasha Levin
2015-09-15 19:01                 ` Kirill A. Shutemov
2015-09-15 19:01                   ` Kirill A. Shutemov
2015-09-22 16:47                   ` Andrey Konovalov
2015-09-22 16:47                     ` Andrey Konovalov
2015-09-22 18:54                     ` Hugh Dickins
2015-09-22 18:54                       ` Hugh Dickins
2015-09-22 19:45                       ` Andrey Konovalov
2015-09-22 19:45                         ` Andrey Konovalov
2015-09-23  1:39                         ` Hugh Dickins
2015-09-23  1:39                           ` Hugh Dickins
2015-09-23 11:46                           ` Kirill A. Shutemov
2015-09-23 11:46                             ` Kirill A. Shutemov
2015-09-23 22:58                             ` Davidlohr Bueso
2015-09-23 22:58                               ` Davidlohr Bueso
2015-09-23 13:08                           ` Andrey Konovalov
2015-09-23 13:08                             ` Andrey Konovalov
2015-09-24  0:42                             ` Sasha Levin
2015-09-24  0:42                               ` Sasha Levin
2015-09-25 19:33                               ` Kirill A. Shutemov
2015-09-25 19:33                                 ` Kirill A. Shutemov
2015-10-13 22:38                               ` Hugh Dickins
2015-10-13 22:38                                 ` Hugh Dickins
2015-10-13 22:33                             ` Hugh Dickins
2015-10-13 22:33                               ` Hugh Dickins
2015-10-15 16:58                               ` Andrey Konovalov
2015-10-15 16:58                                 ` Andrey Konovalov
2015-09-23 21:30                 ` Sasha Levin
2015-09-23 21:30                   ` Sasha Levin
2015-09-25 14:26                   ` Oleg Nesterov
2015-09-25 14:26                     ` Oleg Nesterov
2015-09-24 13:11                 ` Oleg Nesterov
2015-09-24 13:11                   ` Oleg Nesterov
2015-09-24 16:27                   ` Sasha Levin
2015-09-24 16:27                     ` Sasha Levin
2015-09-24 17:26                     ` Oleg Nesterov
2015-09-24 17:26                       ` Oleg Nesterov
2015-09-24 18:52                       ` Andrey Ryabinin
2015-09-24 18:52                         ` Andrey Ryabinin
2015-09-24 19:01                         ` Sasha Levin
2015-09-24 19:01                           ` Sasha Levin
2015-09-25 12:41                         ` Oleg Nesterov
2015-09-25 12:41                           ` Oleg Nesterov
2015-09-23 15:34             ` Oleg Nesterov
2015-09-23 15:34               ` Oleg Nesterov
2015-09-23 15:38               ` Oleg Nesterov
2015-09-23 15:38                 ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150914101609.GA8293@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@google.com \
    --cc=dvyukov@google.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.