From: Andrea Arcangeli
Date: Thu, 5 Mar 2015 18:18:03 +0100
Message-Id: <1425575884-2574-21-git-send-email-aarcange@redhat.com>
In-Reply-To: <1425575884-2574-1-git-send-email-aarcange@redhat.com>
References: <1425575884-2574-1-git-send-email-aarcange@redhat.com>
Subject: [Qemu-devel] [PATCH 20/21] userfaultfd: UFFDIO_REMAP
To: qemu-devel@nongnu.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-api@vger.kernel.org, Android Kernel Team
Cc: Robert Love, Dave Hansen, Jan Kara, Neil Brown, Stefan Hajnoczi,
 Andrew Jones, Sanidhya Kashyap, KOSAKI Motohiro, Michel Lespinasse,
 Taras Glek, zhang.zhanghailiang@huawei.com, Pavel Emelyanov,
 Hugh Dickins, Mel Gorman, Sasha Levin, "Dr. David Alan Gilbert",
 "Huangpeng (Peter)", Andres Lagar-Cavilla, Christopher Covington,
 Anthony Liguori, Paolo Bonzini, "Kirill A. Shutemov", Keith Packard,
 Wenchao Xia, Juan Quintela, Andy Lutomirski, Minchan Kim,
 Dmitry Adamushko, Johannes Weiner, Mike Hommey, Andrew Morton,
 Linus Torvalds, Peter Feiner

This remap ioctl allows atomically moving a page in or out of a
userfaultfd address space. It's more expensive than "copy" (and of
course more expensive than "zerofill") as it requires a TLB flush on
the source range for each ioctl, which is an expensive operation on SMP.
Especially when copying only a few pages at a time, copying without a
TLB flush is faster.

Signed-off-by: Andrea Arcangeli
---
 fs/userfaultfd.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 6230f22..b4c7f25 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -892,6 +892,54 @@ out:
 	return ret;
 }
 
+static int userfaultfd_remap(struct userfaultfd_ctx *ctx,
+			     unsigned long arg)
+{
+	__s64 ret;
+	struct uffdio_remap uffdio_remap;
+	struct uffdio_remap __user *user_uffdio_remap;
+	struct userfaultfd_wake_range range;
+
+	user_uffdio_remap = (struct uffdio_remap __user *) arg;
+
+	ret = -EFAULT;
+	if (copy_from_user(&uffdio_remap, user_uffdio_remap,
+			   /* don't copy "remap" and "wake" last field */
+			   sizeof(uffdio_remap)-sizeof(__s64)*2))
+		goto out;
+
+	ret = validate_range(ctx->mm, uffdio_remap.dst, uffdio_remap.len);
+	if (ret)
+		goto out;
+	ret = validate_range(current->mm, uffdio_remap.src, uffdio_remap.len);
+	if (ret)
+		goto out;
+	ret = -EINVAL;
+	if (uffdio_remap.mode & ~(UFFDIO_REMAP_MODE_ALLOW_SRC_HOLES|
+				  UFFDIO_REMAP_MODE_DONTWAKE))
+		goto out;
+
+	ret = remap_pages(ctx->mm, current->mm,
+			  uffdio_remap.dst, uffdio_remap.src,
+			  uffdio_remap.len, uffdio_remap.mode);
+	if (unlikely(put_user(ret, &user_uffdio_remap->remap)))
+		return -EFAULT;
+	if (ret < 0)
+		goto out;
+	/* len == 0 would wake all */
+	BUG_ON(!ret);
+	range.len = ret;
+	if (!(uffdio_remap.mode & UFFDIO_REMAP_MODE_DONTWAKE)) {
+		range.start = uffdio_remap.dst;
+		ret = wake_userfault(ctx, &range);
+		if (unlikely(put_user(ret, &user_uffdio_remap->wake)))
+			return -EFAULT;
+	}
+	ret = range.len == uffdio_remap.len ? 0 : -EAGAIN;
+out:
+	return ret;
+}
+
 /*
  * userland asks for a certain API version and we return which bits
  * and ioctl commands are implemented in this kernel for such API
@@ -955,6 +1003,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_ZEROPAGE:
 		ret = userfaultfd_zeropage(ctx, arg);
 		break;
+	case UFFDIO_REMAP:
+		ret = userfaultfd_remap(ctx, arg);
+		break;
 	}
 	return ret;
 }
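
For illustration only, here is a rough userland-side sketch of how a caller might cope with the return convention above: 0 when the whole range was moved, -EAGAIN when only part of it was, with the number of bytes moved reported in the "remap" field. Everything in it is an assumption made for the example and not part of this patch: struct remap_req only mirrors the field names used above (its layout is guessed), remap_all() and fake_issue() are hypothetical helpers, and the callback returns a negative errno directly, whereas a real ioctl() would return -1 and set errno.

```c
#include <errno.h>
#include <stdint.h>

#define FAKE_PAGE 4096ULL

/* Hypothetical mirror of the fields used by the patch; layout is assumed. */
struct remap_req {
	uint64_t dst, src, len, mode;
	int64_t remap;	/* bytes moved (or negative errno), written back */
	int64_t wake;	/* wake result, unused in this sketch */
};

/* Callback standing in for the ioctl; returns 0 or a negative errno. */
typedef int (*remap_fn)(struct remap_req *req);

static int fake_calls;

/*
 * Fake "kernel" that moves at most one page per call, then applies the
 * same final check as the patch: 0 if everything moved, else -EAGAIN.
 */
static int fake_issue(struct remap_req *req)
{
	uint64_t moved = req->len < FAKE_PAGE ? req->len : FAKE_PAGE;

	fake_calls++;
	req->remap = (int64_t)moved;
	return moved == req->len ? 0 : -EAGAIN;
}

/* Retry on partial progress until the whole range has been moved. */
static int remap_all(remap_fn issue, uint64_t dst, uint64_t src, uint64_t len)
{
	while (len > 0) {
		struct remap_req req = { .dst = dst, .src = src, .len = len };
		int ret = issue(&req);

		if (ret == 0)
			return 0;
		/* Anything other than partial progress is a hard error. */
		if (ret != -EAGAIN || req.remap <= 0)
			return ret;
		if ((uint64_t)req.remap > len)
			return -EIO;	/* implausible byte count */
		dst += (uint64_t)req.remap;
		src += (uint64_t)req.remap;
		len -= (uint64_t)req.remap;
	}
	return 0;
}
```

With the fake backend, remap_all(fake_issue, dst, src, 3 * FAKE_PAGE) issues three calls: the first two report -EAGAIN with one page moved each, and the third returns 0.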