From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C198C61DA4 for ; Fri, 27 Jan 2023 15:32:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A4ADC6B0072; Fri, 27 Jan 2023 10:32:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A21BC6B0073; Fri, 27 Jan 2023 10:32:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8C21C6B0074; Fri, 27 Jan 2023 10:32:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 7EA2F6B0072 for ; Fri, 27 Jan 2023 10:32:43 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 35BC9405C8 for ; Fri, 27 Jan 2023 15:32:43 +0000 (UTC) X-FDA: 80400971406.06.7EE162D Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf11.hostedemail.com (Postfix) with ESMTP id B3B2C40005 for ; Fri, 27 Jan 2023 15:32:39 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=EVWu+hrB; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf11.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674833560; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=sJQRlyLqASk+uXdbQGFsiJ2d1fNp4UuhBEQ/arPq7Lc=; b=VThKciD0T5Q1GCJ10nOXWr3irZWmrVFz0++VTmZwMDOib7bGXYcfJRLB8bZArfGEFvJ2rq ouMXoIAmvzm5/R8oH0rvyjYNFT4Bdx+S8Ygf4SqDw7FE/EN3Ka7hB1O0L5l9tn+TzvHI59 OduiO24gSY2/+ULtx06AAfevlZcQLR8= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=EVWu+hrB; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf11.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674833560; a=rsa-sha256; cv=none; b=QIhTxx1h31ekqMiTzTNTK2AuROO1r4WiLVBgVKZgNK8A3pBa5DZ35adH7icUDOynlezNjE JRVbmF+4REojC76tE+aP7gvnnUMYkNJg3SWFITVljfmXO/HrGvvrkLvtqScoM2BorvZKEw K55LSL+SQhbWhKP5PvMSzJe/YARuRGw= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1674833559; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=sJQRlyLqASk+uXdbQGFsiJ2d1fNp4UuhBEQ/arPq7Lc=; b=EVWu+hrB7xGpheVe6I4Gv//6MEER6r/3pEfZiD4oVdkGWiAdIetr/FBAaxE/tFCd6/mglw u0fQ7CLY23nA0SoL6E1p2WOAnZGhnnZp39e5HZD6vwC7bM6JVnvQhSX8aBeTcGGEoC4ck6 /hYsN3MDFas3Tjdf7J8cl5y2tZyysjY= Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-611-hdwbB2D-POyPeXfVBJ2Big-1; Fri, 27 Jan 2023 10:32:37 -0500 X-MC-Unique: hdwbB2D-POyPeXfVBJ2Big-1 Received: by mail-qk1-f197.google.com with SMTP id u10-20020a05620a0c4a00b00705e77d6207so3154757qki.5 for ; Fri, 27 Jan 2023 07:32:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=sJQRlyLqASk+uXdbQGFsiJ2d1fNp4UuhBEQ/arPq7Lc=; b=M91beHntEEm/DbwrrD/r09duhIXJ977x6rn3ntYUGMiaOrFV0cOiG76+TBRpJNFGKj hxgfRtW9CtITjB0YWzeQxSMByj/X5kZQJ/Lf5H4mKFDQ2e7KNoG++7U8sevqk52auBB7 21P+hz3IoWEksTSqc9L9JnCwxhbEST4JUXIspOnXgKDXtDKP43SMC89CCdklXkXmcL6d 0teM4XP8vg1q2xQ0Z57mywrctlAPc/Ehm1SywtndZkIHmlNsEo7c1JEGwYgoVW4ZsJoK 42KZ/a/V2hAhGekcDB8fRXFjvkDFMxv9sYotwLBKABIEN2UtPHUX7xQrYMXKCJijLxMN Vw6A== X-Gm-Message-State: AFqh2kpaG9FI4K5eKt0YSpHgywCqyzIBQGeg15bncsuNDcx16yE0CTbm MIIVAra6p6m+pnmfYBI1llgmT6f45L8T8lE7+RO6Q4a4sGuJJ+KPfgbqxBdaYl/bCEFEMl7JUUg DgHO8GpCWWAM= X-Received: by 2002:ac8:45c4:0:b0:3b6:34b0:fc9c with SMTP id e4-20020ac845c4000000b003b634b0fc9cmr54937827qto.42.1674833557017; Fri, 27 Jan 2023 07:32:37 -0800 (PST) X-Google-Smtp-Source: AMrXdXt0k4/DU6oVKnRtGvExXLnbSo8VJTo48yXrLbOJjyyP3PKeDnkAkB1/MMIbIrafuBgbyw9qpA== X-Received: by 2002:ac8:45c4:0:b0:3b6:34b0:fc9c with SMTP id e4-20020ac845c4000000b003b634b0fc9cmr54937805qto.42.1674833556730; Fri, 27 Jan 2023 07:32:36 -0800 (PST) Received: from x1n (bras-base-aurron9127w-grc-56-70-30-145-63.dsl.bell.ca. [70.30.145.63]) by smtp.gmail.com with ESMTPSA id j18-20020ac84052000000b003b635a5d56csm2821434qtl.30.2023.01.27.07.32.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Jan 2023 07:32:35 -0800 (PST) Date: Fri, 27 Jan 2023 10:32:33 -0500 From: Peter Xu To: Muhammad Usama Anjum Cc: David Hildenbrand , Andrew Morton , =?utf-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Mike Rapoport , Nadav Amit , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Subject: Re: [PATCH v8 1/4] userfaultfd: Add UFFD WP Async support Message-ID: References: <20230124084323.1363825-1-usama.anjum@collabora.com> <20230124084323.1363825-2-usama.anjum@collabora.com> <1968dff9-f48a-3290-a15b-a8b739f31ed2@collabora.com> MIME-Version: 1.0 In-Reply-To: <1968dff9-f48a-3290-a15b-a8b739f31ed2@collabora.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B3B2C40005 X-Stat-Signature: n9tu1ot4z1mbnhu8u943okpykza4isyg X-HE-Tag: 1674833559-49636 X-HE-Meta: U2FsdGVkX1///QcrF9vgWeurqEKS9/KAXlZ116bDlFsrRaxlyiY9So9OIwTgOpxSg0ZQmr94VD8P0ejghP4k/PHvtndxiBArPWLjnUcQWdXgtpXAbRYhF9/iDCCNLJ0csw6/iwnP6YgiI+R1hNPU1V2YE8ALv2MZw7TpGjKRDRfU/oVNabbT1Ut0wUxtybESM4xPL3Mgm41b3+PZjeEFwUztZmJ0GN2Uf4VD2Ynxk0F36RWamDA+EreHMmIcYd3HCDP6qBFRAgxhZeKJIqJUmhKQFi976yP1cPaZvlrD/CqO6x3IuEPAPHOFZrTbKzY3+/5VHZcSHc6JWXDx+HT5hTrj5Mt4/+Eh5qSA6KX+bVX9bUJgZmMrwFOtz7SYPBWYwNoEKtWPFty1fh/tRTKFoAVkydxj/AepNl+8y3WCC3tDKG2KeuR9AmVq3p0vQt4yZ2kk0LTSIGfZhalZHwqmuN6l1AMoIjewlJ6BwI/alnrlWklJvsuZr51yZPrli9fSmRMAp2BVk1ldhAHnv8V9GDP/cRIG46qyNOOQzpAfgdGmQqBE91JvOW1e5bcqxeug2uXIbu5PT1b0Q1Fzih8pR8zbep5Tzeho82cSN8tBwPV3c110lldbS8nVcpZYOKqVtUTnTLPrkq8jfGlLN+hzocYPyfBzJxIos4zl/5OnvYjXOS18MIcPM5bNFMLw1I5wScApPVamaeojZgJQaq6TGFu6ttYUzKiRkKOdEzX0TeTGwRMx/493pyWMHu2JD36mIxwwsVoUG68pDwWUIYwtTzO3nkDDL/sf8LbQTLXslGS0D83RjSbgvOUwyIe2QwB6nMhlpPPNw/8hKO33bzMM0TvrKgUWRkAENnClyOi7MhrKWPu2lVQEGgK3SoxN04sbCWQE9MjsO4vKpKfjrlnfCTeCqMQPhbOC8WknQ+uUtq11aDxG78DD4YPwXx81t6TN6oXKdeLFkJ8kgJTPEIU r29CFFBw /TS7zz/R5iq2TX33Ssw1BVBsR4m9Jh5lC1MIMgs3ygVCUuTqUXcrnXSZB5Kg35vKYNnb7CW3r4AJmHlmBPZdfTNHy3IGiDExhxWZHJUhMsepEl1986v9gB6O21wedgz1LeNKEyH0FrV0lIs4Z+mjoFH6wlIo1+GfspXw4RFaHA9rra3L9DM1cF4BcNN+QgKekwUa8MEXqmGvHHSOFxthCz/seEq3n6F+xL9wMt4t7vprW8BCSq0pIhV+p1+ENd+yZtr8GeHgIdCXhuNfFkqC/Hefuwy41Lomm/RqioEKG79HrDogNVHRVxII18y4xJ5p00HJyOk9SgZ3/44FhDCfo18iflcDEYgceKzkyCaCKg74S7SGwXwM+Giqd/q5nn3xhC162m94TOOUXDyDwjI9NwwqtUffC7Iu0XcKvhSBMVIc53fsO1wvhzB2uJA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Jan 27, 2023 at 11:47:14AM +0500, Muhammad Usama Anjum wrote: > >> diff --git a/mm/memory.c b/mm/memory.c > >> index 4000e9f017e0..8c03b133d483 100644 > >> --- a/mm/memory.c > >> +++ b/mm/memory.c > >> @@ -3351,6 +3351,18 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) > >> > >> if (likely(!unshare)) { > >> if (userfaultfd_pte_wp(vma, *vmf->pte)) { > >> + if (userfaultfd_wp_async(vma)) { > >> + /* > >> + * Nothing needed (cache flush, TLB invalidations, > >> + * etc.) because we're only removing the uffd-wp bit, > >> + * which is completely invisible to the user. This > >> + * falls through to possible CoW. > > > > Here it says it falls through to CoW, but.. > > > >> + */ > >> + pte_unmap_unlock(vmf->pte, vmf->ptl); > >> + set_pte_at(vma->vm_mm, vmf->address, vmf->pte, > >> + pte_clear_uffd_wp(*vmf->pte)); > >> + return 0; > > > > ... it's not doing so. The original lines should do: > > > > https://lore.kernel.org/all/Y8qq0dKIJBshua+X@x1n/ [1] > > > > Side note: you cannot modify pgtable after releasing the pgtable lock. > > It's racy. > If I don't unlock and return after removing the UFFD_WP flag in case of > async wp, the target just gets stuck. Maybe the pte lock is not unlocked in > some path. > > If I unlock and don't return, the crash happens. > > So I'd put unlock and return from here. Please comment on the below patch > and what do you think should be done. I've missed something. Have you tried to just use exactly what I suggested in [1]? I'll paste again: ---8<--- diff --git a/mm/memory.c b/mm/memory.c index 4000e9f017e0..09aab434654c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3351,8 +3351,20 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) if (likely(!unshare)) { if (userfaultfd_pte_wp(vma, *vmf->pte)) { - pte_unmap_unlock(vmf->pte, vmf->ptl); - return handle_userfault(vmf, VM_UFFD_WP); + if (userfaultfd_uffd_wp_async(vma)) { + /* + * Nothing needed (cache flush, TLB + * invalidations, etc.) because we're only + * removing the uffd-wp bit, which is + * completely invisible to the user. + * This falls through to possible CoW. + */ + set_pte_at(vma->vm_mm, vmf->address, vmf->pte, + pte_clear_uffd_wp(*vmf->pte)); + } else { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return handle_userfault(vmf, VM_UFFD_WP); + } } ---8<--- Note that there's no "return", neither the unlock. The lock is used in the follow up write fault resolution and it's released later. Meanwhile please fully digest how pgtable lock is used in this path before moving forward on any of such changes. > > > > >> + } > >> pte_unmap_unlock(vmf->pte, vmf->ptl); > >> return handle_userfault(vmf, VM_UFFD_WP); > >> } > >> @@ -4812,8 +4824,21 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf) > >> > >> if (vma_is_anonymous(vmf->vma)) { > >> if (likely(!unshare) && > >> - userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) > >> - return handle_userfault(vmf, VM_UFFD_WP); > >> + userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd)) { > >> + if (userfaultfd_wp_async(vmf->vma)) { > >> + /* > >> + * Nothing needed (cache flush, TLB invalidations, > >> + * etc.) because we're only removing the uffd-wp bit, > >> + * which is completely invisible to the user. This > >> + * falls through to possible CoW. > >> + */ > >> + set_pmd_at(vmf->vma->vm_mm, vmf->address, vmf->pmd, > >> + pmd_clear_uffd_wp(*vmf->pmd)); > > > > This is for THP, not hugetlb. > > > > Clearing uffd-wp bit here for the whole pmd is wrong to me, because we > > track writes in small page sizes only. We should just split. > By detecting if the fault is async wp, just splitting the PMD doesn't work. > The below given snippit is working right now. But definately, the fault of > the whole PMD is being resolved which if we can bypass by correctly > splitting would be highly desirable. Can you please take a look on UFFD > side and suggest the changes? It would be much appreciated. I'm attaching > WIP v9 patches for you to apply on next(next-20230105) and pagemap_ioctl > selftest can be ran to test things after making changes. Can you elaborate why thp split didn't work? Or if you want, I can look into this and provide the patch to enable uffd async mode. Thanks, -- Peter Xu