Date: Wed, 1 Apr 2026 00:24:01 +0900
From: "Harry Yoo (Oracle)"
To: Mike Rapoport
Cc: Andrew Morton, Andrea Arcangeli, Andrei Vagin, Axel Rasmussen,
	Baolin Wang, David Hildenbrand, Hugh Dickins, James Houghton,
	"Liam R. Howlett", "Lorenzo Stoakes (Oracle)",
	"Matthew Wilcox (Oracle)", Michal Hocko, Muchun Song,
	Nikita Kalyazin, Oscar Salvador, Paolo Bonzini, Peter Xu,
	Sean Christopherson, Shuah Khan, Suren Baghdasaryan,
	Vlastimil Babka, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v3 02/15] userfaultfd: introduce struct mfill_state
X-Mailing-List: kvm@vger.kernel.org
References: <20260330101116.1117699-1-rppt@kernel.org>
 <20260330101116.1117699-3-rppt@kernel.org>

On Tue, Mar 31, 2026 at 05:32:28PM +0300, Mike Rapoport wrote:
> Hi Harry,
> 
> On Tue, Mar 31, 2026 at 04:03:13PM +0900, Harry Yoo (Oracle) wrote:
> > On Mon, Mar 30, 2026 at 01:11:03PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)"
> > > 
> > > mfill_atomic() passes a lot of parameters down to its callees.
> > > 
> > > Aggregate them all into mfill_state structure and pass this structure to
> > > functions that implement various UFFDIO_ commands.
> > > 
> > > Tracking the state in a structure will allow moving the code that retries
> > > copying of data for UFFDIO_COPY into mfill_atomic_pte_copy() and make the
> > > loop in mfill_atomic() identical for all UFFDIO operations on PTE-mapped
> > > memory.
> > > 
> > > The mfill_state definition is deliberately local to mm/userfaultfd.c,
> > > hence shmem_mfill_atomic_pte() is not updated.
> > > 
> > > [harry.yoo@oracle.com: properly initialize mfill_state.len to fix
> > >  folio_add_new_anon_rmap() WARN]
> > > Link: https://lkml.kernel.org/r/abehBY7QakYF9bK4@hyeyoo
> > > Signed-off-by: Mike Rapoport (Microsoft)
> > > Signed-off-by: Harry Yoo
> > > Acked-by: David Hildenbrand (Arm)
> > > ---
> > >  mm/userfaultfd.c | 148 ++++++++++++++++++++++++++---------------------
> > >  1 file changed, 82 insertions(+), 66 deletions(-)
> > > 
> > > @@ -790,12 +804,14 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
> > >  		   uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> > >  		goto out_unlock;
> > >  
> > > -	while (src_addr < src_start + len) {
> > > -		pmd_t dst_pmdval;
> > > +	state.vma = dst_vma;
> > 
> > Oh wait, the lock leak was introduced in patch 2.
> 
> Lock leak was introduced in patch 4 that moved getting the vma.

Still not sure what I could possibly be missing, so let me try again.

When I check out this commit ("userfaultfd: introduce struct mfill_state")
I see:

| static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
|				    unsigned long dst_start,
|				    unsigned long src_start,
|				    unsigned long len,
|				    uffd_flags_t flags)
| {
|	struct mfill_state state = (struct mfill_state){
|		.ctx = ctx,
|		.dst_start = dst_start,
|		.src_start = src_start,
|		.flags = flags,
|		.len = len,
|		.src_addr = src_start,
|		.dst_addr = dst_start,
|	};
|
| [ ...snip... ]
|
| retry:
|	/*
|	 * Make sure the vma is not shared, that the dst range is
|	 * both valid and fully within a single existing vma.
|	 */
|	dst_vma = uffd_mfill_lock(dst_mm, dst_start, len);

It acquires the vma lock (or mmap_lock) here, but doesn't set state.vma.

|	if (IS_ERR(dst_vma)) {
|		err = PTR_ERR(dst_vma);
|		goto out;
|	}
|
|	/*
|	 * If memory mappings are changing because of non-cooperative
|	 * operation (e.g.
mremap) running in parallel, bail out and
|	 * request the user to retry later
|	 */
|	down_read(&ctx->map_changing_lock);
|	err = -EAGAIN;
|	if (atomic_read(&ctx->mmap_changing))
|		goto out_unlock;
|
|	err = -EINVAL;
|	/*
|	 * shmem_zero_setup is invoked in mmap for MAP_ANONYMOUS|MAP_SHARED but
|	 * it will overwrite vm_ops, so vma_is_anonymous must return false.
|	 */
|	if (WARN_ON_ONCE(vma_is_anonymous(dst_vma) &&
|			 dst_vma->vm_flags & VM_SHARED))
|		goto out_unlock;
|
|	/*
|	 * validate 'mode' now that we know the dst_vma: don't allow
|	 * a wrprotect copy if the userfaultfd didn't register as WP.
|	 */
|	if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
|		goto out_unlock;
|
|	/*
|	 * If this is a HUGETLB vma, pass off to appropriate routine
|	 */
|	if (is_vm_hugetlb_page(dst_vma))
|		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
|					    src_start, len, flags);
|
|	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
|		goto out_unlock;
|	if (!vma_is_shmem(dst_vma) &&
|	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
|		goto out_unlock;
|
|	state.vma = dst_vma;

It is set only here. So if anything before this point jumps to the
`out_unlock` label due to a failed sanity check,

[...]

|	while (state.src_addr < src_start + len) {
|		VM_WARN_ON_ONCE(state.dst_addr >= dst_start + len);
|
|		pmd_t dst_pmdval;

[...]

| out_unlock:
|	up_read(&ctx->map_changing_lock);
|	uffd_mfill_unlock(state.vma);

the `vma` parameter will be NULL?

If I'm not missing something, this is introduced in patch 2 and fixed in
patch 4.

| out:
|	if (state.folio)
|		folio_put(state.folio);
|	VM_WARN_ON_ONCE(copied < 0);
|	VM_WARN_ON_ONCE(err > 0);
|	VM_WARN_ON_ONCE(!copied && !err);
|	return copied ?
copied : err;
| }

> > > @@ -866,10 +882,10 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
> > >  
> > >  out_unlock:
> > >  	up_read(&ctx->map_changing_lock);
> > > -	uffd_mfill_unlock(dst_vma);
> > > +	uffd_mfill_unlock(state.vma);
> > >  out:
> > > -	if (folio)
> > > -		folio_put(folio);
> > > +	if (state.folio)
> > > +		folio_put(state.folio);
> > 
> > Sashiko raised a concern [2] that the VMA might be unmapped and a new
> > mapping created as a uffd hugetlb vma, leaking the folio by going
> > through
> > 
> > 	if (is_vm_hugetlb_page(dst_vma))
> > 		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
> > 					    src_start, len, flags);
> > 
> > but it appears to be a false positive (to me) because the
> > `if (atomic_read(&ctx->mmap_changing))` check should have detected the
> > unmapping and freed the folio?
> 
> I think it's real, and it's there more or less from the beginning, although
> nobody hit it yet :)
> 
> Before retrying the copy we drop all the locks, so if the copy is really
> long the old mapping can be wiped and a new mapping can be created instead.

Oops, perhaps I should have imagined harder :)

> There's already a v4 of a patch that attempts to solve this:
> 
> https://lore.kernel.org/all/20260331134158.622084-1-devnexen@gmail.com

Thanks for the pointer!

> > [2] https://sashiko.dev/#/patchset/20260330101116.1117699-1-rppt%40kernel.org?patch=13671
> > 
> > >  	VM_WARN_ON_ONCE(copied < 0);
> > >  	VM_WARN_ON_ONCE(err > 0);
> > >  	VM_WARN_ON_ONCE(!copied && !err);

-- 
Cheers,
Harry / Hyeonggon