From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Jun 2025 14:46:31 -0400
From: Peter Xu
To: Nikita Kalyazin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hugh Dickins,
 Oscar Salvador, Michal Hocko, David Hildenbrand, Muchun Song,
 Andrea Arcangeli, Ujwal Kundur, Suren Baghdasaryan, Andrew Morton,
 Vlastimil Babka, "Liam R. Howlett", James Houghton, Mike Rapoport,
 Lorenzo Stoakes, Axel Rasmussen
Subject: Re: [PATCH 0/4] mm/userfaultfd: modulize memory types
Message-ID:
References: <20250620190342.1780170-1-peterx@redhat.com>
 <114133f5-0282-463d-9d65-3143aa658806@amazon.com>
 <7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com>
 <7455220c-e35b-4509-b7c3-a78fde5b12d5@amazon.com>
MIME-Version: 1.0
In-Reply-To: <7455220c-e35b-4509-b7c3-a78fde5b12d5@amazon.com>
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Fri, Jun 27, 2025 at 05:59:49PM +0100, Nikita Kalyazin wrote:
>
>
> On 27/06/2025 14:51, Peter Xu wrote:
> > On Thu, Jun 26, 2025 at 05:09:47PM +0100, Nikita Kalyazin wrote:
> > >
> > >
> > > On 25/06/2025 21:17, Peter Xu wrote:
> > > > On Wed, Jun 25, 2025 at 05:56:23PM +0100, Nikita Kalyazin wrote:
> > > > >
> > > > >
> > > > > On 20/06/2025 20:03, Peter Xu wrote:
> > > > > > [based on akpm/mm-new]
> > > > > >
> > > > > > This series is an alternative proposal of what Nikita proposed here on the
> > > > > > initial three patches:
> > > > > >
> > > > > > https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> > > > > >
> > > > > > This is not yet relevant to any guest-memfd support, but paving the way for it.
> > > > >
> > > > > Hi Peter,
> > > >
> > > > Hi, Nikita,
> > > >
> > > > > Thanks for posting this.  I confirmed that minor fault handling was working
> > > > > for guest_memfd based on this series and looked simple (a draft based on
> > > > > mmap support in guest_memfd v7 [1]):
> > > >
> > > > Thanks for the quick spin, glad to know it works.  Some trivial things to
> > > > mention below..
> > >
> > > Following up, I drafted UFFDIO_COPY support for guest_memfd to confirm it
> > > works as well:
> >
> > Appreciated.
> >
> > While at it, I'll comment quickly below.
> >
> > >
> > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > > index 8c44e4b9f5f8..b5458a22fff4 100644
> > > --- a/virt/kvm/guest_memfd.c
> > > +++ b/virt/kvm/guest_memfd.c
> > > @@ -349,12 +349,19 @@ static bool kvm_gmem_offset_is_shared(struct file
> > > *file, pgoff_t index)
> > >
> > >  static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
> > >  {
> > > +	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
> > >  	struct inode *inode = file_inode(vmf->vma->vm_file);
> > >  	struct folio *folio;
> > >  	vm_fault_t ret = VM_FAULT_LOCKED;
> > >
> > >  	filemap_invalidate_lock_shared(inode->i_mapping);
> > >
> > > +	folio = filemap_get_entry(inode->i_mapping, vmf->pgoff);
> > > +	if (!folio && vma && userfaultfd_missing(vma)) {
> > > +		filemap_invalidate_unlock_shared(inode->i_mapping);
> > > +		return handle_userfault(vmf, VM_UFFD_MISSING);
> > > +	}
> >
> > Likely a possible refcount leak when folio != NULL here.
>
> Thank you.  I was only aiming to cover the happy case for now.  I will keep
> it in mind for the future.

Yep that's good enough, thanks.  It was really only a passing comment; it's
definitely reassuring to know the happy case works.

> > > +
> > >  	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> > >  	if (IS_ERR(folio)) {
> > >  		int err = PTR_ERR(folio);
> > > @@ -438,10 +445,57 @@ static int kvm_gmem_uffd_get_folio(struct inode
> > > *inode, pgoff_t pgoff,
> > >  	return 0;
> > >  }
> > >
> > > +static int kvm_gmem_mfill_atomic_pte(pmd_t *dst_pmd,
> > > +				     struct vm_area_struct *dst_vma,
> > > +				     unsigned long dst_addr,
> > > +				     unsigned long src_addr,
> > > +				     uffd_flags_t flags,
> > > +				     struct folio **foliop)
> > > +{
> > > +	struct inode *inode = file_inode(dst_vma->vm_file);
> > > +	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
> > > +	struct folio *folio;
> > > +	int ret;
> > > +
> > > +	folio = kvm_gmem_get_folio(inode, pgoff);
> > > +	if (IS_ERR(folio)) {
> > > +		ret = PTR_ERR(folio);
> > > +		goto out;
> > > +	}
> > > +
> > > +	folio_unlock(folio);
> > > +
> > > +	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_COPY)) {
> > > +		void *vaddr = kmap_local_folio(folio, 0);
> > > +		ret = copy_from_user(vaddr, (const void __user *)src_addr, PAGE_SIZE);
> > > +		kunmap_local(vaddr);
> > > +		if (unlikely(ret)) {
> > > +			*foliop = folio;
> > > +			ret = -ENOENT;
> > > +			goto out;
> > > +		}
> > > +	} else { /* ZEROPAGE */
> > > +		clear_user_highpage(&folio->page, dst_addr);
> > > +	}
> > > +
> > > +	kvm_gmem_mark_prepared(folio);
> >
> > Since Fuad's series hasn't yet landed, I'm mostly looking at the current
> > code base and imagining what might happen.
> >
> > In general, missing-fault trapping for guest-memfd could be slightly
> > trickier.  So far, IIUC, the guest-memfd cache pool needs to be populated
> > only by a prior fallocate() syscall, not during a fault.  So I suppose we
> > will need to use the uptodate bit to mark a folio ready, like what's done
> > here.
>
> I don't think I'm familiar with the fallocate() requirement in guest_memfd.
> Fuad's v12 [1] (although I think it has been like that from the beginning)
> calls kvm_gmem_get_folio(), which populates the page cache in the fault
> handler (kvm_gmem_fault_shared()).  SEV [2] and TDX [3] seem to use
> kvm_gmem_populate() for both allocation and preparation.

I actually didn't notice that fault() uses kvm_gmem_get_folio(), which has
FGP_CREAT indeed.

I checked Ackerley's latest 1G patchset, which does the same:
kvm_gmem_get_folio() will invoke the custom allocator to allocate 1G pages
even during a fault().

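If allocation can indeed happen at fault time, then a future MISSING mode
would probably have to treat both "no folio in the page cache yet" and
"folio allocated but not yet prepared/uptodate" as missing, and also drop
the reference that filemap_get_entry() takes when a folio is present (the
leak mentioned above).  Very roughly, as an untested sketch only (the
helper name kvm_gmem_fault_missing_check() is made up here; it is not in
the tree nor in Nikita's draft), something like:

static vm_fault_t kvm_gmem_fault_missing_check(struct vm_fault *vmf,
					       struct inode *inode)
{
	struct folio *folio;
	bool missing = true;

	/* Returns NULL, a shadow entry, or a folio with a reference held */
	folio = filemap_get_entry(inode->i_mapping, vmf->pgoff);
	if (folio && !xa_is_value(folio)) {
		missing = !folio_test_uptodate(folio);
		/* Drop the reference taken above so it isn't leaked */
		folio_put(folio);
	}

	if (missing && userfaultfd_missing(vmf->vma)) {
		filemap_invalidate_unlock_shared(inode->i_mapping);
		return handle_userfault(vmf, VM_UFFD_MISSING);
	}

	/* Nothing to report; the caller proceeds to kvm_gmem_get_folio() */
	return 0;
}
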
Not sure whether the allocate-on-fault behaviour is intentional, though.
For example, if the tests in userspace always do fallocate() first, the
code will run the same and FGP_CREAT will simply never be used.

Thanks for pointing this out.  I definitely didn't notice this detail
before.  It doesn't look like a major issue: if the folio can be
dynamically allocated, then MISSING mode (if/when it'll be supported) can
capture both the "!folio" and the "folio && !uptodate" cases here as
missing, along the lines of the sketch above.

>
> [1] https://lore.kernel.org/kvm/20250611133330.1514028-1-tabba@google.com/T/#m15b53a741e4f328e61f995a01afb9c4682ffe611
> [2] https://elixir.bootlin.com/linux/v6.16-rc3/source/arch/x86/kvm/svm/sev.c#L2331
> [3] https://elixir.bootlin.com/linux/v6.16-rc3/source/arch/x86/kvm/vmx/tdx.c#L3236

-- 
Peter Xu