From: Andrea Arcangeli
Date: Mon, 15 Jun 2015 19:22:08 +0200
Message-Id: <1434388931-24487-5-git-send-email-aarcange@redhat.com>
In-Reply-To: <1434388931-24487-1-git-send-email-aarcange@redhat.com>
References: <1434388931-24487-1-git-send-email-aarcange@redhat.com>
Subject: [Qemu-devel] [PATCH 4/7] userfaultfd: avoid missing wakeups during refile in userfaultfd_read
To: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: zhang.zhanghailiang@huawei.com, Pavel Emelyanov, Johannes Weiner, Hugh Dickins, "Dr. David Alan Gilbert", Sanidhya Kashyap, Dave Hansen, Andres Lagar-Cavilla, Mel Gorman, Paolo Bonzini, "Kirill A. Shutemov", "Huangpeng (Peter)", Andy Lutomirski, Linus Torvalds, Peter Feiner

During the refile in userfaultfd_read() both waitqueues could look
empty to the lockless wake_userfault(). Use a seqcount to prevent this
false negative, which could leave a userfault blocked.

Signed-off-by: Andrea Arcangeli
---
 fs/userfaultfd.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8286ec8..f9e11ec 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -45,6 +45,8 @@ struct userfaultfd_ctx {
 	wait_queue_head_t fault_wqh;
 	/* waitqueue head for the pseudo fd to wakeup poll/read */
 	wait_queue_head_t fd_wqh;
+	/* a refile sequence protected by fault_pending_wqh lock */
+	struct seqcount refile_seq;
 	/* pseudo fd refcounting */
 	atomic_t refcount;
 	/* userfaultfd syscall flags */
@@ -547,6 +549,15 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 		uwq = find_userfault(ctx);
 		if (uwq) {
 			/*
+			 * Use a seqcount to repeat the lockless check
+			 * in wake_userfault() to avoid missing
+			 * wakeups because during the refile both
+			 * waitqueue could become empty if this is the
+			 * only userfault.
+			 */
+			write_seqcount_begin(&ctx->refile_seq);
+
+			/*
 			 * The fault_pending_wqh.lock prevents the uwq
 			 * to disappear from under us.
 			 *
@@ -570,6 +581,8 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 			list_del(&uwq->wq.task_list);
 			__add_wait_queue(&ctx->fault_wqh, &uwq->wq);
 
+			write_seqcount_end(&ctx->refile_seq);
+
 			/* careful to always initialize msg if ret == 0 */
 			*msg = uwq->msg;
 			spin_unlock(&ctx->fault_pending_wqh.lock);
@@ -648,6 +661,9 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	unsigned seq;
+	bool need_wakeup;
+
 	/*
 	 * To be sure waitqueue_active() is not reordered by the CPU
 	 * before the pagetable update, use an explicit SMP memory
@@ -663,8 +679,13 @@ static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 	 * userfaults yet. So we take the spinlock only when we're
 	 * sure we've userfaults to wake.
 	 */
-	if (waitqueue_active(&ctx->fault_pending_wqh) ||
-	    waitqueue_active(&ctx->fault_wqh))
+	do {
+		seq = read_seqcount_begin(&ctx->refile_seq);
+		need_wakeup = waitqueue_active(&ctx->fault_pending_wqh) ||
+			waitqueue_active(&ctx->fault_wqh);
+		cond_resched();
+	} while (read_seqcount_retry(&ctx->refile_seq, seq));
+	if (need_wakeup)
 		__wake_userfault(ctx, range);
 }
 
@@ -1223,6 +1244,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 	init_waitqueue_head(&ctx->fault_pending_wqh);
 	init_waitqueue_head(&ctx->fault_wqh);
 	init_waitqueue_head(&ctx->fd_wqh);
+	seqcount_init(&ctx->refile_seq);
 }
 
 /**
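
For readers who want to see the race in isolation, here is a minimal userspace sketch of the idea, not the kernel code. It assumes C11 atomics in place of the kernel seqcount API, models each waitqueue as a plain waiter counter, and uses hypothetical names throughout (struct demo_ctx and the demo_*() helpers exist only for this illustration). It shows the window in which both queues look empty and how a sequence check lets the lockless reader notice that its snapshot is unreliable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct userfaultfd_ctx. */
struct demo_ctx {
	atomic_uint refile_seq;	/* even: stable, odd: refile in progress */
	atomic_int pending;	/* models waiters on fault_pending_wqh */
	atomic_int refiled;	/* models waiters on fault_wqh */
};

static void demo_init(struct demo_ctx *ctx, int pending)
{
	atomic_init(&ctx->refile_seq, 0);
	atomic_init(&ctx->pending, pending);
	atomic_init(&ctx->refiled, 0);
}

/* First half of the refile: bump the sequence, then unlink the waiter. */
static void demo_refile_begin(struct demo_ctx *ctx)
{
	assert(atomic_load(&ctx->pending) > 0);
	atomic_fetch_add(&ctx->refile_seq, 1);	/* sequence is now odd */
	atomic_fetch_sub(&ctx->pending, 1);	/* list_del() analogue */
}

/* Second half: queue the waiter on the other list, then publish. */
static void demo_refile_end(struct demo_ctx *ctx)
{
	atomic_fetch_add(&ctx->refiled, 1);	/* __add_wait_queue() analogue */
	atomic_fetch_add(&ctx->refile_seq, 1);	/* sequence is even again */
}

/* The unprotected check: true if either queue has a waiter. */
static bool demo_queues_active(struct demo_ctx *ctx)
{
	return atomic_load(&ctx->pending) > 0 ||
	       atomic_load(&ctx->refiled) > 0;
}

/*
 * One lockless attempt. Returns false when the snapshot cannot be
 * trusted, either because a refile was in progress or because one
 * completed while the queues were being sampled.
 */
static bool demo_try_check(struct demo_ctx *ctx, bool *active)
{
	unsigned seq = atomic_load(&ctx->refile_seq);

	if (seq & 1)
		return false;
	*active = demo_queues_active(ctx);
	return atomic_load(&ctx->refile_seq) == seq;
}

/* Retry until a stable snapshot is obtained, like the do/while loop. */
static bool demo_need_wakeup(struct demo_ctx *ctx)
{
	bool active = false;

	while (!demo_try_check(ctx, &active))
		;
	return active;
}
```

Calling demo_refile_begin() on a context with a single pending waiter leaves both counters at zero, so demo_queues_active() reports no waiters even though one exists; demo_try_check() rejects that snapshot because the sequence is odd. After demo_refile_end() the sequence is even again and demo_need_wakeup() returns true. The sketch uses sequentially consistent atomics for simplicity, whereas the kernel primitives rely on explicit barriers and on the fault_pending_wqh lock to serialize writers.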