* [PATCH 0/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Oleg Nesterov @ 2016-05-16 15:25 UTC To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm Hello, Sorry for the delay. This is the same patch; I just added helpers to get/put mm->mm_users. I wouldn't mind changing userfaultfd_get_mm() to return mm_struct-or-NULL, or perhaps we should instead simply add a trivial helper that does atomic_inc_not_zero(mm->mm_users) to sched.h; it could have more callers (fs/proc, uprobes). Testing: I found selftests/vm/userfaultfd.c and it seems to work. Oleg. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() 2016-05-16 15:25 [PATCH 0/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() Oleg Nesterov @ 2016-05-16 15:25 ` Oleg Nesterov 2016-05-16 15:57 ` Andrea Arcangeli 2016-05-16 17:22 ` [PATCH v2 " Oleg Nesterov 0 siblings, 2 replies; 8+ messages in thread From: Oleg Nesterov @ 2016-05-16 15:25 UTC (permalink / raw) To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm userfaultfd_file_create() increments mm->mm_users; this means that the memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can populate the orphaned mm more. Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed when we are going to actually play with this memory. Except handle_userfault() path doesn't need this, the caller must already have a reference. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/userfaultfd.c | 55 ++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 42 insertions(+), 13 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 66cdb44..1a2f38a 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -70,6 +70,20 @@ struct userfaultfd_wake_range { unsigned long len; }; +/* + * mm_struct can't go away, but we need to verify that this memory is still + * alive and avoid the race with exit_mmap(). 
+ */ +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx) +{ + return atomic_inc_not_zero(&ctx->mm->mm_users); +} + +static inline void userfaultfd_put_mm(struct userfaultfd_ctx *ctx) +{ + mmput(ctx->mm); +} + static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, int wake_flags, void *key) { @@ -137,7 +151,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx) VM_BUG_ON(waitqueue_active(&ctx->fault_wqh)); VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock)); VM_BUG_ON(waitqueue_active(&ctx->fd_wqh)); - mmput(ctx->mm); + mmdrop(ctx->mm); kmem_cache_free(userfaultfd_ctx_cachep, ctx); } } @@ -434,6 +448,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file) ACCESS_ONCE(ctx->released) = true; + if (!userfaultfd_get_mm(ctx)) + goto wakeup; + /* * Flush page faults out of all CPUs. NOTE: all page faults * must be retried without returning VM_FAULT_SIGBUS if @@ -466,7 +483,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file) vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } up_write(&mm->mmap_sem); - + userfaultfd_put_mm(ctx); +wakeup: /* * After no new page faults can wait on this fault_*wqh, flush * the last page faults that may have been already waiting on @@ -760,10 +778,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, start = uffdio_register.range.start; end = start + uffdio_register.range.len; + ret = -ENOMEM; + if (!userfaultfd_get_mm(ctx)) + goto out; + down_write(&mm->mmap_sem); vma = find_vma_prev(mm, start, &prev); - - ret = -ENOMEM; if (!vma) goto out_unlock; @@ -864,6 +884,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, } while (vma && vma->vm_start < end); out_unlock: up_write(&mm->mmap_sem); + userfaultfd_put_mm(ctx); if (!ret) { /* * Now that we scanned all vmas we can already tell @@ -902,10 +923,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, start = uffdio_unregister.start; end = start + uffdio_unregister.len; + ret = 
-ENOMEM; + if (!userfaultfd_get_mm(ctx)) + goto out; + down_write(&mm->mmap_sem); vma = find_vma_prev(mm, start, &prev); - - ret = -ENOMEM; if (!vma) goto out_unlock; @@ -998,6 +1021,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, } while (vma && vma->vm_start < end); out_unlock: up_write(&mm->mmap_sem); + userfaultfd_put_mm(ctx); out: return ret; } @@ -1067,9 +1091,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, goto out; if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE) goto out; - - ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, - uffdio_copy.len); + if (userfaultfd_get_mm(ctx)) { + ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, + uffdio_copy.len); + userfaultfd_put_mm(ctx); + } if (unlikely(put_user(ret, &user_uffdio_copy->copy))) return -EFAULT; if (ret < 0) @@ -1110,8 +1136,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE) goto out; - ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, - uffdio_zeropage.range.len); + if (userfaultfd_get_mm(ctx)) { + ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, + uffdio_zeropage.range.len); + userfaultfd_put_mm(ctx); + } if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage))) return -EFAULT; if (ret < 0) @@ -1289,12 +1318,12 @@ static struct file *userfaultfd_file_create(int flags) ctx->released = false; ctx->mm = current->mm; /* prevent the mm struct to be freed */ - atomic_inc(&ctx->mm->mm_users); + atomic_inc(&ctx->mm->mm_count); file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx, O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS)); if (IS_ERR(file)) { - mmput(ctx->mm); + mmdrop(ctx->mm); kmem_cache_free(userfaultfd_ctx_cachep, ctx); } out: -- 2.5.0
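Editorial sketch, not part of the thread: the patch description relies on the difference between the two counters in mm_struct, where mm_users pins the address space and mm_count pins only the structure. The userspace model below illustrates that lifecycle with C11 atomics. Everything in it (struct model_mm, the model_* functions, the mappings_alive flag) is a hypothetical stand-in invented for illustration; it is not kernel code and does not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only: every name here is hypothetical, not a kernel API. */
struct model_mm {
	atomic_int mm_users;	/* users of the address space (mappings) */
	atomic_int mm_count;	/* references to the structure itself */
	bool mappings_alive;	/* stands in for the VMAs and page tables */
};

static int model_mm_freed;	/* number of structures released so far */

static struct model_mm *model_mm_alloc(void)
{
	struct model_mm *mm = malloc(sizeof(*mm));

	if (!mm)
		return NULL;
	atomic_init(&mm->mm_users, 1);
	atomic_init(&mm->mm_count, 1);	/* held on behalf of all mm_users */
	mm->mappings_alive = true;
	return mm;
}

/* Drop a structure reference; the last one frees the structure. */
static void model_mmdrop(struct model_mm *mm)
{
	int old = atomic_fetch_sub(&mm->mm_count, 1);

	assert(old > 0);
	if (old == 1) {
		model_mm_freed++;
		free(mm);
	}
}

/* Drop a user reference; the last user tears down the mappings. */
static void model_mmput(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1) {
		mm->mappings_alive = false;	/* models the teardown on exit */
		model_mmdrop(mm);
	}
}

/* Take a user reference only if the address space is still alive. */
static bool model_mmget_not_zero(struct model_mm *mm)
{
	int old = atomic_load(&mm->mm_users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
			return true;
	}
	return false;
}

/* Walks the lifecycle described in the patch; returns 0 if it holds. */
static int model_demo(void)
{
	int freed_before = model_mm_freed;
	struct model_mm *mm = model_mm_alloc();

	if (!mm)
		return -1;

	atomic_fetch_add(&mm->mm_count, 1);	/* file creation pins the struct only */

	if (!model_mmget_not_zero(mm))		/* owner alive: this succeeds */
		return -1;
	model_mmput(mm);

	model_mmput(mm);			/* owner exits: mappings go away */

	if (mm->mappings_alive || model_mmget_not_zero(mm))
		return -1;			/* memory must not be revivable */
	if (model_mm_freed != freed_before)
		return -1;			/* struct must still be allocated */

	model_mmdrop(mm);			/* last file reference goes away */
	return model_mm_freed == freed_before + 1 ? 0 : -1;
}
```

Calling model_demo() once returns 0 in this model: after the owner's last mm_users reference is dropped, the structure is still safe to dereference, but a later attempt to take a user reference fails instead of reviving the memory.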
* Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Andrea Arcangeli @ 2016-05-16 15:57 UTC To: Oleg Nesterov; +Cc: Andrew Morton, Linus Torvalds, linux-kernel, linux-mm On Mon, May 16, 2016 at 05:25:46PM +0200, Oleg Nesterov wrote: > userfaultfd_file_create() increments mm->mm_users; this means that the memory > won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can > populate the orphaned mm more. > > Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count > to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed > when we are going to actually play with this memory. Except handle_userfault() > path doesn't need this, the caller must already have a reference. This is a nice and desired improvement that reduces the pinning from the "mm" as a whole to just the "mm struct". The code used mm_users for simplicity, but using mm_count was definitely wanted to always keep the memory footprint as low as possible (especially to avoid some latency in the footprint reduction in the future non-cooperative usage). 
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> > +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx) > +{ > + return atomic_inc_not_zero(&ctx->mm->mm_users); > +} Nice cleanup, but wouldn't it be more generic to implement this as mmget(&ctx->mm) (or maybe mmget_not_zero) in include/linux/mm.h instead of userfaultfd.c, so then others can use it too, see: drivers/gpu/drm/i915/i915_gem_userptr.c: if (atomic_inc_not_zero(&mm->mm_users)) { drivers/iommu/intel-svm.c: if (!atomic_inc_not_zero(&svm->mm->mm_users)) fs/proc/base.c: if (!atomic_inc_not_zero(&mm->mm_users)) fs/proc/base.c: if (!atomic_inc_not_zero(&mm->mm_users)) fs/proc/task_mmu.c: if (!mm || !atomic_inc_not_zero(&mm->mm_users)) fs/proc/task_mmu.c: if (!mm || !atomic_inc_not_zero(&mm->mm_users)) fs/proc/task_nommu.c: if (!mm || !atomic_inc_not_zero(&mm->mm_users)) kernel/events/uprobes.c: if (!atomic_inc_not_zero(&vma->vm_mm->mm_users)) mm/oom_kill.c: if (!atomic_inc_not_zero(&mm->mm_users)) { mm/swapfile.c: if (!atomic_inc_not_zero(&mm->mm_users)) Anyway, this is just an idea; userfaultfd_get_mm is certainly fine with me. Thanks, Andrea
* Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Oleg Nesterov @ 2016-05-16 16:20 UTC To: Andrea Arcangeli; +Cc: Andrew Morton, Linus Torvalds, linux-kernel, linux-mm On 05/16, Andrea Arcangeli wrote: > > Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Thanks, > > +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx) > > +{ > > + return atomic_inc_not_zero(&ctx->mm->mm_users); > > +} > > Nice cleanup, but wouldn't it be more generic to implement this as > mmget(&ctx->mm) (or maybe mmget_not_zero) in include/linux/mm.h > instead of userfaultfd.c, so then others can use it too, see: Yes, agreed. userfaultfd_get_mm() doesn't look as good as I initially thought. So I guess it would be better to make V2 right now, to avoid another change in userfaultfd.c which changes the same code. Except I think mmget_not_zero() should go to linux/sched.h, until we move mmdrop/mmput/etc to linux/mm.h. I'll send V2 soon... Oleg.
* [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() 2016-05-16 15:25 ` [PATCH 1/1] " Oleg Nesterov 2016-05-16 15:57 ` Andrea Arcangeli @ 2016-05-16 17:22 ` Oleg Nesterov 2016-05-17 15:33 ` Michal Hocko 1 sibling, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2016-05-16 17:22 UTC (permalink / raw) To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm userfaultfd_file_create() increments mm->mm_users; this means that the memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can populate the orphaned mm more. Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed when we are going to actually play with this memory. Except handle_userfault() path doesn't need this, the caller must already have a reference. The patch adds the new trivial helper, mmget_not_zero(), it can have more users. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/userfaultfd.c | 41 ++++++++++++++++++++++++++++------------- include/linux/sched.h | 7 ++++++- 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 66cdb44..2d97952 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -137,7 +137,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx) VM_BUG_ON(waitqueue_active(&ctx->fault_wqh)); VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock)); VM_BUG_ON(waitqueue_active(&ctx->fd_wqh)); - mmput(ctx->mm); + mmdrop(ctx->mm); kmem_cache_free(userfaultfd_ctx_cachep, ctx); } } @@ -434,6 +434,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file) ACCESS_ONCE(ctx->released) = true; + if (!mmget_not_zero(mm)) + goto wakeup; + /* * Flush page faults out of all CPUs. 
NOTE: all page faults * must be retried without returning VM_FAULT_SIGBUS if @@ -466,7 +469,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file) vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } up_write(&mm->mmap_sem); - + mmput(mm); +wakeup: /* * After no new page faults can wait on this fault_*wqh, flush * the last page faults that may have been already waiting on @@ -760,10 +764,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, start = uffdio_register.range.start; end = start + uffdio_register.range.len; + ret = -ENOMEM; + if (!mmget_not_zero(mm)) + goto out; + down_write(&mm->mmap_sem); vma = find_vma_prev(mm, start, &prev); - - ret = -ENOMEM; if (!vma) goto out_unlock; @@ -864,6 +870,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, } while (vma && vma->vm_start < end); out_unlock: up_write(&mm->mmap_sem); + mmput(mm); if (!ret) { /* * Now that we scanned all vmas we can already tell @@ -902,10 +909,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, start = uffdio_unregister.start; end = start + uffdio_unregister.len; + ret = -ENOMEM; + if (!mmget_not_zero(mm)) + goto out; + down_write(&mm->mmap_sem); vma = find_vma_prev(mm, start, &prev); - - ret = -ENOMEM; if (!vma) goto out_unlock; @@ -998,6 +1007,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, } while (vma && vma->vm_start < end); out_unlock: up_write(&mm->mmap_sem); + mmput(mm); out: return ret; } @@ -1067,9 +1077,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, goto out; if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE) goto out; - - ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, - uffdio_copy.len); + if (mmget_not_zero(ctx->mm)) { + ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, + uffdio_copy.len); + mmput(ctx->mm); + } if (unlikely(put_user(ret, &user_uffdio_copy->copy))) return -EFAULT; if (ret < 0) @@ -1110,8 +1122,11 @@ static int userfaultfd_zeropage(struct 
userfaultfd_ctx *ctx, if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE) goto out; - ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, - uffdio_zeropage.range.len); + if (mmget_not_zero(ctx->mm)) { + ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, + uffdio_zeropage.range.len); + mmput(ctx->mm); + } if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage))) return -EFAULT; if (ret < 0) @@ -1289,12 +1304,12 @@ static struct file *userfaultfd_file_create(int flags) ctx->released = false; ctx->mm = current->mm; /* prevent the mm struct to be freed */ - atomic_inc(&ctx->mm->mm_users); + atomic_inc(&ctx->mm->mm_count); file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx, O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS)); if (IS_ERR(file)) { - mmput(ctx->mm); + mmdrop(ctx->mm); kmem_cache_free(userfaultfd_ctx_cachep, ctx); } out: diff --git a/include/linux/sched.h b/include/linux/sched.h index 52c4847..49997bf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2610,12 +2610,17 @@ extern struct mm_struct * mm_alloc(void); /* mmdrop drops the mm and the page tables */ extern void __mmdrop(struct mm_struct *); -static inline void mmdrop(struct mm_struct * mm) +static inline void mmdrop(struct mm_struct *mm) { if (unlikely(atomic_dec_and_test(&mm->mm_count))) __mmdrop(mm); } +static inline bool mmget_not_zero(struct mm_struct *mm) +{ + return atomic_inc_not_zero(&mm->mm_users); +} + /* mmput gets rid of the mappings and all user-space */ extern void mmput(struct mm_struct *); /* Grab a reference to a task's mm, if it is not already going away */ -- 2.5.0
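Editorial sketch, not part of the thread: the register and unregister hunks in the v2 patch move "ret = -ENOMEM" ahead of the new mmget_not_zero() check and add mmput() under the out_unlock label, so a failed get skips both the unlock and the put. The userspace model below mirrors only that control flow. The names sketch_mm, sketch_mmget_not_zero, sketch_mmput and sketch_register are hypothetical, and lock_depth and nr_ranges are illustrative stand-ins for mmap_sem and the VMA lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields this pattern touches. */
struct sketch_mm {
	atomic_int mm_users;	/* 0 means the address space is already gone */
	int lock_depth;		/* stands in for mmap_sem; 1 while held */
	int nr_ranges;		/* stands in for the result of the VMA lookup */
};

/* Take a user reference unless the count has already dropped to zero. */
static bool sketch_mmget_not_zero(struct sketch_mm *mm)
{
	int old = atomic_load(&mm->mm_users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
			return true;
	}
	return false;
}

/* Release the reference taken by sketch_mmget_not_zero(). */
static void sketch_mmput(struct sketch_mm *mm)
{
	int old = atomic_fetch_sub(&mm->mm_users, 1);

	assert(old > 0);
}

/* Mirrors the control flow of the patched register path, not its logic. */
static int sketch_register(struct sketch_mm *mm)
{
	int ret;

	ret = -ENOMEM;
	if (!sketch_mmget_not_zero(mm))
		goto out;		/* nothing taken, nothing to undo */

	mm->lock_depth++;		/* models down_write(&mm->mmap_sem) */
	if (mm->nr_ranges == 0)
		goto out_unlock;	/* no range found: still -ENOMEM */

	ret = 0;
out_unlock:
	mm->lock_depth--;		/* models up_write(&mm->mmap_sem) */
	sketch_mmput(mm);		/* paired with the successful get */
out:
	return ret;
}
```

In this model a structure whose mm_users is already zero yields -ENOMEM without touching the lock or the count, while a live one returns 0 and leaves both the count and the lock depth where they started.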
* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Michal Hocko @ 2016-05-17 15:33 UTC To: Oleg Nesterov Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm On Mon 16-05-16 19:22:54, Oleg Nesterov wrote: > userfaultfd_file_create() increments mm->mm_users; this means that the memory > won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can > populate the orphaned mm more. > > Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count > to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed > when we are going to actually play with this memory. Except handle_userfault() > path doesn't need this, the caller must already have a reference. We should definitely get rid of all unbound pinning via mm_users. > The patch adds the new trivial helper, mmget_not_zero(), it can have more users. Is this really helpful? > Signed-off-by: Oleg Nesterov <oleg@redhat.com> The patch seems good to me, but I am not familiar enough with the userfaultfd internals to give you a Reviewed-by or an Acked-by. I welcome the change anyway. 
> --- > fs/userfaultfd.c | 41 ++++++++++++++++++++++++++++------------- > include/linux/sched.h | 7 ++++++- > 2 files changed, 34 insertions(+), 14 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 66cdb44..2d97952 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -137,7 +137,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx) > VM_BUG_ON(waitqueue_active(&ctx->fault_wqh)); > VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock)); > VM_BUG_ON(waitqueue_active(&ctx->fd_wqh)); > - mmput(ctx->mm); > + mmdrop(ctx->mm); > kmem_cache_free(userfaultfd_ctx_cachep, ctx); > } > } > @@ -434,6 +434,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file) > > ACCESS_ONCE(ctx->released) = true; > > + if (!mmget_not_zero(mm)) > + goto wakeup; > + > /* > * Flush page faults out of all CPUs. NOTE: all page faults > * must be retried without returning VM_FAULT_SIGBUS if > @@ -466,7 +469,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file) > vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; > } > up_write(&mm->mmap_sem); > - > + mmput(mm); > +wakeup: > /* > * After no new page faults can wait on this fault_*wqh, flush > * the last page faults that may have been already waiting on > @@ -760,10 +764,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > start = uffdio_register.range.start; > end = start + uffdio_register.range.len; > > + ret = -ENOMEM; > + if (!mmget_not_zero(mm)) > + goto out; > + > down_write(&mm->mmap_sem); > vma = find_vma_prev(mm, start, &prev); > - > - ret = -ENOMEM; > if (!vma) > goto out_unlock; > > @@ -864,6 +870,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > } while (vma && vma->vm_start < end); > out_unlock: > up_write(&mm->mmap_sem); > + mmput(mm); > if (!ret) { > /* > * Now that we scanned all vmas we can already tell > @@ -902,10 +909,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, > start = uffdio_unregister.start; > end = 
start + uffdio_unregister.len; > > + ret = -ENOMEM; > + if (!mmget_not_zero(mm)) > + goto out; > + > down_write(&mm->mmap_sem); > vma = find_vma_prev(mm, start, &prev); > - > - ret = -ENOMEM; > if (!vma) > goto out_unlock; > > @@ -998,6 +1007,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, > } while (vma && vma->vm_start < end); > out_unlock: > up_write(&mm->mmap_sem); > + mmput(mm); > out: > return ret; > } > @@ -1067,9 +1077,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, > goto out; > if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE) > goto out; > - > - ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, > - uffdio_copy.len); > + if (mmget_not_zero(ctx->mm)) { > + ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, > + uffdio_copy.len); > + mmput(ctx->mm); > + } > if (unlikely(put_user(ret, &user_uffdio_copy->copy))) > return -EFAULT; > if (ret < 0) > @@ -1110,8 +1122,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, > if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE) > goto out; > > - ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, > - uffdio_zeropage.range.len); > + if (mmget_not_zero(ctx->mm)) { > + ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, > + uffdio_zeropage.range.len); > + mmput(ctx->mm); > + } > if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage))) > return -EFAULT; > if (ret < 0) > @@ -1289,12 +1304,12 @@ static struct file *userfaultfd_file_create(int flags) > ctx->released = false; > ctx->mm = current->mm; > /* prevent the mm struct to be freed */ > - atomic_inc(&ctx->mm->mm_users); > + atomic_inc(&ctx->mm->mm_count); > > file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx, > O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS)); > if (IS_ERR(file)) { > - mmput(ctx->mm); > + mmdrop(ctx->mm); > kmem_cache_free(userfaultfd_ctx_cachep, ctx); > } > out: > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 
52c4847..49997bf 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -2610,12 +2610,17 @@ extern struct mm_struct * mm_alloc(void); > > /* mmdrop drops the mm and the page tables */ > extern void __mmdrop(struct mm_struct *); > -static inline void mmdrop(struct mm_struct * mm) > +static inline void mmdrop(struct mm_struct *mm) > { > if (unlikely(atomic_dec_and_test(&mm->mm_count))) > __mmdrop(mm); > } > > +static inline bool mmget_not_zero(struct mm_struct *mm) > +{ > + return atomic_inc_not_zero(&mm->mm_users); > +} > + > /* mmput gets rid of the mappings and all user-space */ > extern void mmput(struct mm_struct *); > /* Grab a reference to a task's mm, if it is not already going away */ > -- > 2.5.0 -- Michal Hocko SUSE Labs
* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Oleg Nesterov @ 2016-05-17 16:30 UTC To: Michal Hocko Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm On 05/17, Michal Hocko wrote: > > On Mon 16-05-16 19:22:54, Oleg Nesterov wrote: > > > The patch adds the new trivial helper, mmget_not_zero(), it can have more users. > > Is this really helpful? Well, this is subjective of course, but I think the code looks a bit better this way. uprobes, fs/proc and more can use this helper too. And in fact the initial version of this patch did atomic_inc_not_zero(mm->mm_users) by hand, then it was suggested to add a helper. > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > > The patch seems good to me but I am not familiar with the userfaultfd > internals enough to give you reviewed-by nor acked-by. I welcome the > change anyway. Thanks ;) Oleg.
* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() From: Michal Hocko @ 2016-05-17 20:34 UTC To: Oleg Nesterov Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm On Tue 17-05-16 18:30:44, Oleg Nesterov wrote: > On 05/17, Michal Hocko wrote: > > > > On Mon 16-05-16 19:22:54, Oleg Nesterov wrote: > > > > > The patch adds the new trivial helper, mmget_not_zero(), it can have more users. > > > > Is this really helpful? > > Well, this is subjective of course, but I think the code looks a bit better this > way. uprobes, fs/proc and more can use this helper too. > > And in fact the initial version of this patch did atomic_inc_not_zero(mm->mm_users) by > hand, then it was suggested to add a helper. I would prefer a more descriptive name (something like mmget_alive), but as you say, this is highly subjective and nothing that should delay this fix. -- Michal Hocko SUSE Labs