From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58E6AC3600C for ; Thu, 3 Apr 2025 17:01:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2ACC5280007; Thu, 3 Apr 2025 13:01:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 23575280005; Thu, 3 Apr 2025 13:01:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0D680280007; Thu, 3 Apr 2025 13:01:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id E01DA280005 for ; Thu, 3 Apr 2025 13:01:39 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 34088140DA7 for ; Thu, 3 Apr 2025 17:01:41 +0000 (UTC) X-FDA: 83293349202.23.488B914 Received: from smtp-fw-6001.amazon.com (smtp-fw-6001.amazon.com [52.95.48.154]) by imf29.hostedemail.com (Postfix) with ESMTP id 37D4D12001A for ; Thu, 3 Apr 2025 17:01:39 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=KmdaZ8rx; spf=pass (imf29.hostedemail.com: domain of "prvs=18185d7dd=kalyazin@amazon.co.uk" designates 52.95.48.154 as permitted sender) smtp.mailfrom="prvs=18185d7dd=kalyazin@amazon.co.uk"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743699699; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fgBFSqRBMvJIhoEL8t5/weTt711thPXAbfYiTxlEOwQ=; b=W/knanJfS4WL5CJcJgKj2Ot7koBioLRPaHnpH6nMi4tDopGJB3ATbizL5esn6bh6J7tBKj gxyCWV/657F6Hkj80es1Kp9l4p/OMB8smSlYP2+8b29eYZr7eGlat3H7ahnPzV+nfv8RNp KIMrBqTLrZN255yL4MD9RQER91lGg7U= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=KmdaZ8rx; spf=pass (imf29.hostedemail.com: domain of "prvs=18185d7dd=kalyazin@amazon.co.uk" designates 52.95.48.154 as permitted sender) smtp.mailfrom="prvs=18185d7dd=kalyazin@amazon.co.uk"; dmarc=pass (policy=quarantine) header.from=amazon.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743699699; a=rsa-sha256; cv=none; b=Uw83MlH3rYB3XL40/7gnmpCYX4R74EZYbhzjORvZHRXfyjeBVVsNMx/x6wDnl1jefRfGXD YZvQTR+AGGd9V5N8Yvvr97jh++vunTNVC+G6mJF1WJL2rmvWfE438mrqfJNPMQIjtVv4W2 kb/82yMpUxapDexHcnYS6c1CqJ5AKxI= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1743699699; x=1775235699; h=message-id:date:mime-version:reply-to:subject:to:cc: references:from:in-reply-to:content-transfer-encoding; bh=fgBFSqRBMvJIhoEL8t5/weTt711thPXAbfYiTxlEOwQ=; b=KmdaZ8rxkk1YaYO1DJKNVN2fkft9aTY2FCE1xlV+YkV2sLPQu5k8NZgK JIr0S3er6jp9ZOGcFCivJ69F1LOudUf3ks83DNFQlUAP4PbRWHqGYzdJb pFZOxgsRAQcUdKLDYnExenp4mnZMmwPRpXjbBbCwYA7U2JSvFzyytlD+3 A=; X-IronPort-AV: E=Sophos;i="6.15,184,1739836800"; d="scan'208";a="477267433" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-6001.iad6.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Apr 2025 17:01:35 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.43.254:22038] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.38.20:2525] with esmtp (Farcaster) id 1e7deb8f-4698-4cb9-98e4-58e352c395be; Thu, 3 Apr 2025 17:01:34 +0000 (UTC) X-Farcaster-Flow-ID: 1e7deb8f-4698-4cb9-98e4-58e352c395be Received: from EX19D022EUC002.ant.amazon.com (10.252.51.137) by EX19MTAEUC002.ant.amazon.com (10.252.51.245) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Thu, 3 Apr 2025 17:01:34 +0000 Received: from [192.168.28.77] (10.106.82.23) by EX19D022EUC002.ant.amazon.com (10.252.51.137) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Thu, 3 Apr 2025 17:01:32 +0000 Message-ID: <16763ece-fa3b-4a68-9b3c-5331954d96e7@amazon.com> Date: Thu, 3 Apr 2025 18:01:28 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Reply-To: Subject: Re: [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs To: James Houghton CC: , , , , , , , , , , , , , , , , , References: <20250402160721.97596-1-kalyazin@amazon.com> <20250402160721.97596-2-kalyazin@amazon.com> Content-Language: en-US From: Nikita Kalyazin Autocrypt: addr=kalyazin@amazon.com; keydata= xjMEY+ZIvRYJKwYBBAHaRw8BAQdA9FwYskD/5BFmiiTgktstviS9svHeszG2JfIkUqjxf+/N JU5pa2l0YSBLYWx5YXppbiA8a2FseWF6aW5AYW1hem9uLmNvbT7CjwQTFggANxYhBGhhGDEy BjLQwD9FsK+SyiCpmmTzBQJnrNfABQkFps9DAhsDBAsJCAcFFQgJCgsFFgIDAQAACgkQr5LK IKmaZPOpfgD/exazh4C2Z8fNEz54YLJ6tuFEgQrVQPX6nQ/PfQi2+dwBAMGTpZcj9Z9NvSe1 CmmKYnYjhzGxzjBs8itSUvWIcMsFzjgEY+ZIvRIKKwYBBAGXVQEFAQEHQCqd7/nb2tb36vZt ubg1iBLCSDctMlKHsQTp7wCnEc4RAwEIB8J+BBgWCAAmFiEEaGEYMTIGMtDAP0Wwr5LKIKma ZPMFAmes18AFCQWmz0MCGwwACgkQr5LKIKmaZPNTlQEA+q+rGFn7273rOAg+rxPty0M8lJbT i2kGo8RmPPLu650A/1kWgz1AnenQUYzTAFnZrKSsXAw5WoHaDLBz9kiO5pAK In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.106.82.23] X-ClientProxiedBy: EX19D009EUA004.ant.amazon.com (10.252.50.97) To EX19D022EUC002.ant.amazon.com (10.252.51.137) X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 37D4D12001A X-Stat-Signature: t5asdju6h1t55rfikoasui9cbtrjuzs4 X-HE-Tag: 1743699698-235537 X-HE-Meta: U2FsdGVkX18XIn767jCeqCih9BPpckTGNM1EYJcpVGMTew1nazX5/oThIwIvKWshNRDeD191gt16SBaEimjCRhg2Ohq7iPPfdZSRhDk6w4Gkyj2SrkBGHaEsMSfIQAQI6MSw0GiRTketfS5nZg/pqWam/t/A4P14ggjvZO3EIOrtnjq4u9bywvwjc/S7Rc9YDxEvqmLKzxl2lx3ef38nYvyJOu3d5GN9LZ3MuCPTmj644+Ik6rBOQ6W1Y6CvvQP3W7ppibqsTL6yXTb0N18/PaTz+0ngUeycrGA9IhwtY+Q5WLyk72DB5XMPpZYDnq5TV9GLfxPU+65mVOvDHbhuZsdsq34U1vXaEDXBbdposQziqvkj6QNLe0hViEnsJYP01XKJHuqtDPLyIfyfhlXiLHMoPiLgYXBDbJqCukuBQ/FtO+XbLTyFtA1747SPVeVpDx3ebl3+ChmphW6TvrcayES2J7wXlk+ezKF+2LlyWcnXssMg11O+9nM4FiTcAS1PVWIvTDgmXMwabuLTMTOXZD8ci6nbelXsX8zHHPT13i2f2tchwSX973XfD0XD54XNBkkOzzDPhE5AygBrT60hfVGkFgBRHmZmTv9g9StmTHBbfSBL+DSIFQEvfVv/IMahWBt9mlkakAU+1dOZGY5QXevBwAbKm043u0rGO1Q6OiDqKuCpnZIzHE0u7k0U1lAQe5gYNJN6FqVSGLV2USACuNo5Y7xO814WAd1Ni49R8HRolqtSyDlrFwmuNCUDcEeXo3cUC4ONP3mvI/IPersiq29eKI+Ha1Ojc/eQp9PHfFOYN1JxE4l1EUIuYYdRE16fyvx6aYzCNTgRWOX6Gf2lIv+m5gavf7186hrCuZejOXMUYTZFPJ33E3UNUDC00dMgomS1TZf36hmKn+ii6kiTrGv/aSt6VG/wIm4e2GO/bGvscXx5zchsy2ZBRYnXOI8zhoy0qGGW8OZYelyzxQ9 5/V22i/a a7xwNYyC9yID+tRn42A6gdWquyClBWjx8eSdBGpNGeEd7jlAWBFva07kmqBzQKlrBmx40+IGZ8iAFhK7pw22GYInmR/eEVLniUpqZcMgy18/q04uUSenhEgAL/+VV4rdY++KBQwhhYi+ypCPFzrGOASTN4XRjp0wKzV0yFLfPDSxhUhryRCFoRx5+DqzBQj8w9Wj+n5nYTZF13OST23nR1ojm7rMskz02Yc0PNTpGv3KR6oR60bqWb2PGDGAe9nbLRfBKEfrJSIlmx4Ew4VatZthGj1pd7EXo9hcS6mUeHEQWlmGlmhvLmYglfh+4R8XlebFL8GTZyulSquNQ1F8y/7foRh/GRwy20epZNR5281hRoNBleLPRfbYkOZNL6D54ty3+7CGnLl7v/M4c9VN5X9xiaHsTopIq+HbPPy6lAEfjXwI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 02/04/2025 20:04, James Houghton wrote: > On Wed, Apr 2, 2025 at 9:07 AM Nikita Kalyazin wrote: >> >> Remove shmem-specific code from UFFDIO_CONTINUE implementation for >> non-huge pages by calling vm_ops->fault(). A new VMF flag, >> FAULT_FLAG_NO_USERFAULT_MINOR, is introduced to avoid recursive call to >> handle_userfault(). >> >> Signed-off-by: Nikita Kalyazin >> --- >> include/linux/mm_types.h | 3 +++ >> mm/hugetlb.c | 2 +- >> mm/shmem.c | 3 ++- >> mm/userfaultfd.c | 25 ++++++++++++++++++------- >> 4 files changed, 24 insertions(+), 9 deletions(-) >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index 0234f14f2aa6..91a00f2cd565 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -1429,6 +1429,8 @@ enum tlb_flush_reason { >> * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached. >> * We should only access orig_pte if this flag set. >> * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock. >> + * @FAULT_FLAG_NO_USERFAULT_MINOR: The fault handler must not call userfaultfd >> + * minor handler. > > Perhaps instead a flag that says to avoid the userfaultfd minor fault > handler, maybe we should have a flag to indicate that vm_ops->fault() > has been called by UFFDIO_CONTINUE. See below. > >> * >> * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify >> * whether we would allow page faults to retry by specifying these two >> @@ -1467,6 +1469,7 @@ enum fault_flag { >> FAULT_FLAG_UNSHARE = 1 << 10, >> FAULT_FLAG_ORIG_PTE_VALID = 1 << 11, >> FAULT_FLAG_VMA_LOCK = 1 << 12, >> + FAULT_FLAG_NO_USERFAULT_MINOR = 1 << 13, >> }; >> >> typedef unsigned int __bitwise zap_flags_t; >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 97930d44d460..ba90d48144fc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping, >> } >> >> /* Check for page in userfault range. */ >> - if (userfaultfd_minor(vma)) { >> + if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) { >> folio_unlock(folio); >> folio_put(folio); >> /* See comment in userfaultfd_missing() block above */ >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 1ede0800e846..5e1911e39dec 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, >> fault_mm = vma ? vma->vm_mm : NULL; >> >> folio = filemap_get_entry(inode->i_mapping, index); >> - if (folio && vma && userfaultfd_minor(vma)) { >> + if (folio && vma && userfaultfd_minor(vma) && >> + !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) { >> if (!xa_is_value(folio)) >> folio_put(folio); >> *fault_type = handle_userfault(vmf, VM_UFFD_MINOR); >> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c >> index d06453fa8aba..68a995216789 100644 >> --- a/mm/userfaultfd.c >> +++ b/mm/userfaultfd.c >> @@ -386,24 +386,35 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd, >> unsigned long dst_addr, >> uffd_flags_t flags) >> { >> - struct inode *inode = file_inode(dst_vma->vm_file); >> - pgoff_t pgoff = linear_page_index(dst_vma, dst_addr); >> struct folio *folio; >> struct page *page; >> int ret; >> + struct vm_fault vmf = { >> + .vma = dst_vma, >> + .address = dst_addr, >> + .flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE | >> + FAULT_FLAG_NO_USERFAULT_MINOR, >> + .pte = NULL, >> + .page = NULL, >> + .pgoff = linear_page_index(dst_vma, dst_addr), >> + }; >> + >> + if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault) >> + return -EINVAL; >> >> - ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC); >> - /* Our caller expects us to return -EFAULT if we failed to find folio */ >> - if (ret == -ENOENT) >> + ret = dst_vma->vm_ops->fault(&vmf); > > shmem_get_folio() was being called with SGP_NOALLOC, and now it is > being called with SGP_CACHE (by shmem_fault()). This will result in a > UAPI change: UFFDIO_CONTINUE for a VA without a page in the page cache > should result in EFAULT, but now the page will be allocated. > SGP_NOALLOC was carefully chosen[1], so I think a better way to do > this will be to: > > 1. Have a FAULT_FLAG_USERFAULT_CONTINUE (or something) > 2. In shmem_fault(), if FAULT_FLAG_USERFAULT_CONTINUE, use SGP_NOALLOC > instead of SGP_CACHE (and make sure not to drop into > handle_userfault(), of course) Yes, that is very true. Thanks for pointing out. Will apply in the next version. > > [1]: https://lore.kernel.org/linux-mm/20220610173812.1768919-1-axelrasmussen@google.com/ > >> + if (ret & VM_FAULT_ERROR) { >> ret = -EFAULT; >> - if (ret) >> goto out; >> + } >> + >> + page = vmf.page; >> + folio = page_folio(page); >> if (!folio) { > > What if ret == VM_FAULT_RETRY? I think we should retry instead instead > of returning -EFAULT. True. I'm going to retry the vm_ops->fault() call in this case. > > And I'm not sure how VM_FAULT_NOPAGE should be handled, like if we > need special logic for it or not. As far as I see, the only case it can come up in shmem is during a fault when a hole is being punched [1]. I'm thinking of returning EAGAIN to the userspace if this happens. [1] https://elixir.bootlin.com/linux/v6.14-rc6/source/mm/shmem.c#L2670 > >> ret = -EFAULT; >> goto out; >> } >> >> - page = folio_file_page(folio, pgoff); >> if (PageHWPoison(page)) { >> ret = -EIO; >> goto out_release; >> -- >> 2.47.1 >>