From: Matthew Wilcox <willy@infradead.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: mhocko@kernel.org, ldufour@linux.vnet.ibm.com, vbabka@suse.cz,
akpm@linux-foundation.org, dave.hansen@intel.com,
oleg@redhat.com, srikar@linux.vnet.ibm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC v9 PATCH 2/4] mm: mmap: zap pages with read mmap_sem in munmap
Date: Tue, 11 Sep 2018 19:29:21 -0700
Message-ID: <20180912022921.GA20056@bombadil.infradead.org>
In-Reply-To: <b69d3f7d-e9ba-b95c-45cd-44489950751b@linux.alibaba.com>
On Tue, Sep 11, 2018 at 04:35:03PM -0700, Yang Shi wrote:
> On 9/11/18 2:16 PM, Matthew Wilcox wrote:
> > On Wed, Sep 12, 2018 at 04:58:11AM +0800, Yang Shi wrote:
> > > mm/mmap.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > I really think you're going about this the wrong way by duplicating
> > vm_munmap().
>
> If we don't duplicate vm_munmap() or do_munmap(), we need to pass an extra
> parameter to them to say when it is fine to downgrade the write lock, or
> whether the lock has been acquired outside them (i.e. in mmap()/mremap()),
> right? But vm_munmap() and do_munmap() are called not only from mmap-related
> code but also from other places, such as arch-specific code, which don't need
> to downgrade the write lock or where it is not safe to do so.
>
> Actually, I did it this way in the v1 patches, but it got pushed back by tglx,
> who suggested duplicating the code so that the change could be done in mm only
> without touching other files, i.e. arch-specific stuff. I didn't have a strong
> argument to convince him.
With my patch, there is nothing to change in arch-specific code.
Here it is again ...
diff --git a/mm/mmap.c b/mm/mmap.c
index de699523c0b7..06dc31d1da8c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2798,11 +2798,11 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
* work. This now handles partial unmappings.
* Jeremy Fitzhardinge <jeremy@goop.org>
*/
-int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
- struct list_head *uf)
+static int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
+ struct list_head *uf, bool downgrade)
{
unsigned long end;
- struct vm_area_struct *vma, *prev, *last;
+ struct vm_area_struct *vma, *prev, *last, *tmp;
if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE-start)
return -EINVAL;
@@ -2816,7 +2816,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
if (!vma)
return 0;
prev = vma->vm_prev;
- /* we have start < vma->vm_end */
+ /* we have start < vma->vm_end */
/* if it doesn't overlap, we have nothing.. */
end = start + len;
@@ -2873,18 +2873,22 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
/*
* unlock any mlock()ed ranges before detaching vmas
+ * and check to see if there's any reason we might have to hold
+ * the mmap_sem write-locked while unmapping regions.
*/
- if (mm->locked_vm) {
- struct vm_area_struct *tmp = vma;
- while (tmp && tmp->vm_start < end) {
- if (tmp->vm_flags & VM_LOCKED) {
- mm->locked_vm -= vma_pages(tmp);
- munlock_vma_pages_all(tmp);
- }
- tmp = tmp->vm_next;
+ for (tmp = vma; tmp && tmp->vm_start < end; tmp = tmp->vm_next) {
+ if (tmp->vm_flags & VM_LOCKED) {
+ mm->locked_vm -= vma_pages(tmp);
+ munlock_vma_pages_all(tmp);
}
+ if (tmp->vm_file &&
+ has_uprobes(tmp, tmp->vm_start, tmp->vm_end))
+ downgrade = false;
}
+ if (downgrade)
+ downgrade_write(&mm->mmap_sem);
+
/*
* Remove the vma's, and unmap the actual pages
*/
@@ -2896,7 +2900,13 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
/* Fix up all other VM information */
remove_vma_list(mm, vma);
- return 0;
+ return downgrade ? 1 : 0;
+}
+
+int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
+ struct list_head *uf)
+{
+ return __do_munmap(mm, start, len, uf, false);
}
int vm_munmap(unsigned long start, size_t len)
@@ -2905,11 +2915,12 @@ int vm_munmap(unsigned long start, size_t len)
struct mm_struct *mm = current->mm;
LIST_HEAD(uf);
- if (down_write_killable(&mm->mmap_sem))
- return -EINTR;
-
- ret = do_munmap(mm, start, len, &uf);
- up_write(&mm->mmap_sem);
+ down_write(&mm->mmap_sem);
+ ret = __do_munmap(mm, start, len, &uf, true);
+ if (ret == 1)
+ up_read(&mm->mmap_sem);
+ else
+ up_write(&mm->mmap_sem);
userfaultfd_unmap_complete(mm, &uf);
return ret;
}
Anybody calling do_munmap() will not get the lock dropped.
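
The calling convention in the patch above can be modelled in userspace. What follows is a minimal sketch, not kernel code: the lock is a plain state variable, and every `toy_*` name, the `has_uprobes` flag and the `-1` error value are hypothetical stand-ins invented for illustration. It shows only that the internal helper reports whether it downgraded the lock, that the wrapper for existing callers never opts in, and that the top-level caller releases the lock in whichever mode it ends up holding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for mmap_sem: records only how the lock is currently held. */
enum lock_state { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct toy_mm {
	enum lock_state sem;	/* hypothetical model of mm->mmap_sem */
	bool has_uprobes;	/* hypothetical: a mapping in range vetoes the downgrade */
	size_t mapped;		/* bytes still mapped */
};

static void toy_down_write(struct toy_mm *mm)
{
	assert(mm->sem == UNLOCKED);
	mm->sem = WRITE_LOCKED;
}

static void toy_downgrade_write(struct toy_mm *mm)
{
	assert(mm->sem == WRITE_LOCKED);
	mm->sem = READ_LOCKED;
}

static void toy_up_read(struct toy_mm *mm)
{
	assert(mm->sem == READ_LOCKED);
	mm->sem = UNLOCKED;
}

static void toy_up_write(struct toy_mm *mm)
{
	assert(mm->sem == WRITE_LOCKED);
	mm->sem = UNLOCKED;
}

/*
 * Internal helper, entered with the lock write-held.
 * Returns 1 if the lock was downgraded to read mode, 0 if it is still
 * write-held, and a negative value on error (lock still write-held).
 */
static int toy_unmap_locked(struct toy_mm *mm, size_t len, bool downgrade)
{
	assert(mm->sem == WRITE_LOCKED);
	if (len > mm->mapped)
		return -1;		/* placeholder error code */
	if (mm->has_uprobes)
		downgrade = false;	/* a veto keeps the write lock */
	if (downgrade)
		toy_downgrade_write(mm);
	mm->mapped -= len;		/* stands in for the actual unmapping */
	return downgrade ? 1 : 0;
}

/* Existing callers keep this signature and never see a downgraded lock. */
int toy_do_munmap(struct toy_mm *mm, size_t len)
{
	return toy_unmap_locked(mm, len, false);
}

/* Top-level entry point: takes the lock itself and opts in to the downgrade. */
int toy_vm_munmap(struct toy_mm *mm, size_t len)
{
	int ret;

	toy_down_write(mm);
	ret = toy_unmap_locked(mm, len, true);
	if (ret == 1) {
		toy_up_read(mm);
		ret = 0;	/* sketch choice: hide the internal 1 from callers */
	} else {
		toy_up_write(mm);
	}
	return ret;
}
```

Resetting `ret` to 0 after `toy_up_read()` is a choice made in this sketch; the diff as posted returns `ret` unchanged from vm_munmap().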
> And Michal prefers to have VM_HUGETLB and VM_PFNMAP handled separately, for
> the sake of safety and bisectability, which requires calling the regular
> do_munmap().
That can be introduced and then taken out ... indeed, you can split this into
many patches, starting with this:
+ if (tmp->vm_file)
+ downgrade = false;
to only allow this optimisation for anonymous mappings at first.
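
As a rough userspace illustration of that first step, the veto can be written as a walk over the mappings in the range that clears the flag as soon as one of them is file-backed. `struct toy_vma` and `toy_may_downgrade()` are hypothetical names for this sketch only; the real loop lives inside the unmap path and also handles VM_LOCKED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for struct vm_area_struct. */
struct toy_vma {
	unsigned long start;		/* first address of the mapping */
	unsigned long end;		/* one past the last address */
	const void *file;		/* NULL for an anonymous mapping */
	struct toy_vma *next;		/* next mapping, sorted by address */
};

/*
 * Walk the mappings that begin below 'end', starting at 'vma'.
 * Returns true only if all of them are anonymous, i.e. nothing
 * vetoes downgrading the lock.
 */
bool toy_may_downgrade(const struct toy_vma *vma, unsigned long end)
{
	const struct toy_vma *tmp;
	bool downgrade = true;

	for (tmp = vma; tmp && tmp->start < end; tmp = tmp->next) {
		assert(tmp->start < tmp->end);	/* sanity check on the toy data */
		if (tmp->file)
			downgrade = false;	/* file-backed: keep the write lock */
	}
	return downgrade;
}
```

Later patches in such a series could then relax the test one mapping type at a time, each change remaining independently bisectable.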
> In addition to this, while looking into the mpx code I just found that it may
> call do_munmap() recursively.
>
> We might be able to handle these with the extra parameter, but it sounds like
> it would make the code hard to understand and error-prone.
Only if you make the extra parameter mandatory.