From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Hansen Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization Date: Tue, 23 Jun 2015 12:00:19 -0700 Message-ID: <5589ACC3.3060401@intel.com> References: <1431624680-20153-1-git-send-email-aarcange@redhat.com> <1431624680-20153-11-git-send-email-aarcange@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1431624680-20153-11-git-send-email-aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Andrea Arcangeli , Andrew Morton , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, qemu-devel-qX2TKyscuCcdnm+yROfE0A@public.gmane.org, kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: Pavel Emelyanov , Sanidhya Kashyap , zhang.zhanghailiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org, Linus Torvalds , "Kirill A. Shutemov" , Andres Lagar-Cavilla , Paolo Bonzini , Rik van Riel , Mel Gorman , Andy Lutomirski , Hugh Dickins , Peter Feiner , "Dr. David Alan Gilbert" , Johannes Weiner , "Huangpeng (Peter)" List-Id: linux-api@vger.kernel.org On 05/14/2015 10:31 AM, Andrea Arcangeli wrote: > +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, > + int wake_flags, void *key) > +{ > + struct userfaultfd_wake_range *range = key; > + int ret; > + struct userfaultfd_wait_queue *uwq; > + unsigned long start, len; > + > + uwq = container_of(wq, struct userfaultfd_wait_queue, wq); > + ret = 0; > + /* don't wake the pending ones to avoid reads to block */ > + if (uwq->pending && !ACCESS_ONCE(uwq->ctx->released)) > + goto out; > + /* len == 0 means wake all */ > + start = range->start; > + len = range->len; > + if (len && (start > uwq->address || start + len <= uwq->address)) > + goto out; > + ret = wake_up_state(wq->private, mode); > + if (ret) > + /* wake only once, autoremove behavior */ > + list_del_init(&wq->task_list); > +out: > + return ret; > +} ... > +static __always_inline int validate_range(struct mm_struct *mm, > + __u64 start, __u64 len) > +{ > + __u64 task_size = mm->task_size; > + > + if (start & ~PAGE_MASK) > + return -EINVAL; > + if (len & ~PAGE_MASK) > + return -EINVAL; > + if (!len) > + return -EINVAL; > + if (start < mmap_min_addr) > + return -EINVAL; > + if (start >= task_size) > + return -EINVAL; > + if (len > task_size - start) > + return -EINVAL; > + return 0; > +} Hey Andrea, Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean "wake all". But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that "wake all" for some internal use, or is the check too strict? I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy). From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id A49D96B0032 for ; Tue, 23 Jun 2015 15:00:21 -0400 (EDT) Received: by pactm7 with SMTP id tm7so12476879pac.2 for ; Tue, 23 Jun 2015 12:00:21 -0700 (PDT) Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTP id a5si35950000pat.91.2015.06.23.12.00.20 for ; Tue, 23 Jun 2015 12:00:20 -0700 (PDT) Message-ID: <5589ACC3.3060401@intel.com> Date: Tue, 23 Jun 2015 12:00:19 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization References: <1431624680-20153-1-git-send-email-aarcange@redhat.com> <1431624680-20153-11-git-send-email-aarcange@redhat.com> In-Reply-To: <1431624680-20153-11-git-send-email-aarcange@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-api@vger.kernel.org Cc: Pavel Emelyanov , Sanidhya Kashyap , zhang.zhanghailiang@huawei.com, Linus Torvalds , "Kirill A. Shutemov" , Andres Lagar-Cavilla , Paolo Bonzini , Rik van Riel , Mel Gorman , Andy Lutomirski , Hugh Dickins , Peter Feiner , "Dr. David Alan Gilbert" , Johannes Weiner , "Huangpeng (Peter)" On 05/14/2015 10:31 AM, Andrea Arcangeli wrote: > +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, > + int wake_flags, void *key) > +{ > + struct userfaultfd_wake_range *range = key; > + int ret; > + struct userfaultfd_wait_queue *uwq; > + unsigned long start, len; > + > + uwq = container_of(wq, struct userfaultfd_wait_queue, wq); > + ret = 0; > + /* don't wake the pending ones to avoid reads to block */ > + if (uwq->pending && !ACCESS_ONCE(uwq->ctx->released)) > + goto out; > + /* len == 0 means wake all */ > + start = range->start; > + len = range->len; > + if (len && (start > uwq->address || start + len <= uwq->address)) > + goto out; > + ret = wake_up_state(wq->private, mode); > + if (ret) > + /* wake only once, autoremove behavior */ > + list_del_init(&wq->task_list); > +out: > + return ret; > +} ... > +static __always_inline int validate_range(struct mm_struct *mm, > + __u64 start, __u64 len) > +{ > + __u64 task_size = mm->task_size; > + > + if (start & ~PAGE_MASK) > + return -EINVAL; > + if (len & ~PAGE_MASK) > + return -EINVAL; > + if (!len) > + return -EINVAL; > + if (start < mmap_min_addr) > + return -EINVAL; > + if (start >= task_size) > + return -EINVAL; > + if (len > task_size - start) > + return -EINVAL; > + return 0; > +} Hey Andrea, Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean "wake all". But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that "wake all" for some internal use, or is the check too strict? I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933574AbbFWTAc (ORCPT ); Tue, 23 Jun 2015 15:00:32 -0400 Received: from mga03.intel.com ([134.134.136.65]:30809 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933547AbbFWTAV (ORCPT ); Tue, 23 Jun 2015 15:00:21 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,667,1427785200"; d="scan'208";a="593413145" Message-ID: <5589ACC3.3060401@intel.com> Date: Tue, 23 Jun 2015 12:00:19 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Andrea Arcangeli , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-api@vger.kernel.org CC: Pavel Emelyanov , Sanidhya Kashyap , zhang.zhanghailiang@huawei.com, Linus Torvalds , "Kirill A. Shutemov" , Andres Lagar-Cavilla , Paolo Bonzini , Rik van Riel , Mel Gorman , Andy Lutomirski , Hugh Dickins , Peter Feiner , "Dr. David Alan Gilbert" , Johannes Weiner , "Huangpeng (Peter)" Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization References: <1431624680-20153-1-git-send-email-aarcange@redhat.com> <1431624680-20153-11-git-send-email-aarcange@redhat.com> In-Reply-To: <1431624680-20153-11-git-send-email-aarcange@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/14/2015 10:31 AM, Andrea Arcangeli wrote: > +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, > + int wake_flags, void *key) > +{ > + struct userfaultfd_wake_range *range = key; > + int ret; > + struct userfaultfd_wait_queue *uwq; > + unsigned long start, len; > + > + uwq = container_of(wq, struct userfaultfd_wait_queue, wq); > + ret = 0; > + /* don't wake the pending ones to avoid reads to block */ > + if (uwq->pending && !ACCESS_ONCE(uwq->ctx->released)) > + goto out; > + /* len == 0 means wake all */ > + start = range->start; > + len = range->len; > + if (len && (start > uwq->address || start + len <= uwq->address)) > + goto out; > + ret = wake_up_state(wq->private, mode); > + if (ret) > + /* wake only once, autoremove behavior */ > + list_del_init(&wq->task_list); > +out: > + return ret; > +} ... > +static __always_inline int validate_range(struct mm_struct *mm, > + __u64 start, __u64 len) > +{ > + __u64 task_size = mm->task_size; > + > + if (start & ~PAGE_MASK) > + return -EINVAL; > + if (len & ~PAGE_MASK) > + return -EINVAL; > + if (!len) > + return -EINVAL; > + if (start < mmap_min_addr) > + return -EINVAL; > + if (start >= task_size) > + return -EINVAL; > + if (len > task_size - start) > + return -EINVAL; > + return 0; > +} Hey Andrea, Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean "wake all". But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that "wake all" for some internal use, or is the check too strict? I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy). From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50237) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z7TQt-0002wg-Ft for qemu-devel@nongnu.org; Tue, 23 Jun 2015 15:00:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z7TQo-000758-D4 for qemu-devel@nongnu.org; Tue, 23 Jun 2015 15:00:31 -0400 Received: from mga14.intel.com ([192.55.52.115]:27733) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z7TQo-0006yb-86 for qemu-devel@nongnu.org; Tue, 23 Jun 2015 15:00:26 -0400 Message-ID: <5589ACC3.3060401@intel.com> Date: Tue, 23 Jun 2015 12:00:19 -0700 From: Dave Hansen MIME-Version: 1.0 References: <1431624680-20153-1-git-send-email-aarcange@redhat.com> <1431624680-20153-11-git-send-email-aarcange@redhat.com> In-Reply-To: <1431624680-20153-11-git-send-email-aarcange@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Andrea Arcangeli , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-api@vger.kernel.org Cc: zhang.zhanghailiang@huawei.com, Pavel Emelyanov , Johannes Weiner , Hugh Dickins , "Dr. David Alan Gilbert" , Sanidhya Kashyap , Andres Lagar-Cavilla , Mel Gorman , Paolo Bonzini , "Kirill A. Shutemov" , "Huangpeng (Peter)" , Andy Lutomirski , Linus Torvalds , Peter Feiner On 05/14/2015 10:31 AM, Andrea Arcangeli wrote: > +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode, > + int wake_flags, void *key) > +{ > + struct userfaultfd_wake_range *range = key; > + int ret; > + struct userfaultfd_wait_queue *uwq; > + unsigned long start, len; > + > + uwq = container_of(wq, struct userfaultfd_wait_queue, wq); > + ret = 0; > + /* don't wake the pending ones to avoid reads to block */ > + if (uwq->pending && !ACCESS_ONCE(uwq->ctx->released)) > + goto out; > + /* len == 0 means wake all */ > + start = range->start; > + len = range->len; > + if (len && (start > uwq->address || start + len <= uwq->address)) > + goto out; > + ret = wake_up_state(wq->private, mode); > + if (ret) > + /* wake only once, autoremove behavior */ > + list_del_init(&wq->task_list); > +out: > + return ret; > +} ... > +static __always_inline int validate_range(struct mm_struct *mm, > + __u64 start, __u64 len) > +{ > + __u64 task_size = mm->task_size; > + > + if (start & ~PAGE_MASK) > + return -EINVAL; > + if (len & ~PAGE_MASK) > + return -EINVAL; > + if (!len) > + return -EINVAL; > + if (start < mmap_min_addr) > + return -EINVAL; > + if (start >= task_size) > + return -EINVAL; > + if (len > task_size - start) > + return -EINVAL; > + return 0; > +} Hey Andrea, Down in userfaultfd_wake_function(), it looks like you intended for a len=0 to mean "wake all". But the validate_range() that we do from userspace has a !len check in it, which keeps us from passing a len=0 in from userspace. Was that "wake all" for some internal use, or is the check too strict? I was trying to use the wake ioctl after an madvise() (as opposed to filling things in using a userfd copy).