Date: Wed, 19 Jul 2023 14:16:31 -0700
In-Reply-To: <79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr>
References: <79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr>
Message-ID: <20230719211631.890995-1-axelrasmussen@google.com>
Subject: Re: Using userfaultfd with KVM's async page fault handling causes processes to hung waiting for mmap_lock to be released
From: Axel Rasmussen
To: Dimitris Siakavaras
Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Xu, linux-mm@kvack.org, Axel Rasmussen

Thanks for the detailed report, Dimitris! I've CCed the MM mailing list and some
folks who work on userfaultfd.

I took a look at this today, but I haven't quite come up with a solution.

I thought it might be as easy as changing userfaultfd_release() to set released
*after* taking the lock.
But no such luck; the ordering is what it is in order to deal with another
subtle case:

	WRITE_ONCE(ctx->released, true);

	if (!mmget_not_zero(mm))
		goto wakeup;

	/*
	 * Flush page faults out of all CPUs. NOTE: all page faults
	 * must be retried without returning VM_FAULT_SIGBUS if
	 * userfaultfd_ctx_get() succeeds but vma->vma_userfault_ctx
	 * changes while handle_userfault released the mmap_lock. So
	 * it's critical that released is set to true (above), before
	 * taking the mmap_lock for writing.
	 */
	mmap_write_lock(mm);

I think perhaps the right thing to do is to have handle_userfault() release
mmap_lock when it returns VM_FAULT_NOPAGE, and to have GUP deal with that
appropriately? But some investigation is required to be sure that's okay to do
in the other non-GUP ways we can end up in handle_userfault().
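
To make the proposal above concrete, here is a toy userspace model of it. This is not kernel code and not a patch. Every name in it (toy_mm, toy_try_writer, toy_gup) is hypothetical, and the lock is reduced to a reader count. It assumes the hang looks like this: ctx->released is already set, a GUP-style caller keeps retrying a fault that returns VM_FAULT_NOPAGE while holding mmap_lock for read, and the release path waits indefinitely for the write lock. The sketch only shows why dropping the lock on the NOPAGE path would let the writer make progress under that assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an mm; none of these fields are real kernel fields. */
struct toy_mm {
	int readers;          /* read holders of the toy "mmap_lock" */
	bool writer_waiting;  /* models the release path wanting the write lock */
	bool vma_registered;  /* models the VMA still pointing at the released ctx */
};

/* The writer can only run while no reader holds the lock. */
static bool toy_try_writer(struct toy_mm *mm)
{
	if (!mm->writer_waiting || mm->readers > 0)
		return false;
	mm->vma_registered = false;  /* writer detaches the VMA from the ctx */
	mm->writer_waiting = false;
	return true;
}

/*
 * Models a GUP-style retry loop. Returns the attempt number on which the
 * fault resolved, or -1 if it never resolved within max_tries.
 * drop_lock_on_nopage selects the behaviour proposed in the mail above.
 */
static int toy_gup(struct toy_mm *mm, bool drop_lock_on_nopage, int max_tries)
{
	mm->readers++;  /* caller takes the lock for read */
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (!mm->vma_registered) {
			/* No longer a userfault VMA: the fault resolves normally. */
			mm->readers--;
			assert(mm->readers >= 0);
			return attempt;
		}
		/* Released ctx: the fault "returns NOPAGE" and the caller retries. */
		if (drop_lock_on_nopage) {
			mm->readers--;       /* handler drops the lock before returning */
			toy_try_writer(mm);  /* the writer can now get in */
			mm->readers++;       /* caller retakes the lock for the retry */
		} else {
			toy_try_writer(mm);  /* always fails: the reader still holds the lock */
		}
	}
	mm->readers--;
	assert(mm->readers >= 0);
	return -1;
}
```

With the lock held across retries, toy_gup() gives up and the writer is still waiting. With the lock dropped on the NOPAGE path, the writer runs between attempts and the second attempt resolves. The model says nothing about the non-GUP callers of handle_userfault(), which is the part that still needs investigation.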