From: David Hildenbrand
Organization: Red Hat
To: Jason Gunthorpe
Cc: Qi Zheng, akpm@linux-foundation.org, tglx@linutronix.de,
 hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
 kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, songmuchun@bytedance.com
Subject: Re: [PATCH v2 6/9] mm: free user PTE page table pages
Date: Wed, 1 Sep 2021 18:13:07 +0200
Message-ID: <7789261d-6a64-c47b-be6c-c9be680e5d33@redhat.com>
In-Reply-To: <20210901153247.GJ1721383@nvidia.com>
References: <20210819031858.98043-1-zhengqi.arch@bytedance.com>
 <20210819031858.98043-7-zhengqi.arch@bytedance.com>
 <20210901135314.GA1859446@nvidia.com>
 <0c9766c9-6e8b-5445-83dc-9f2b71a76b4f@redhat.com>
 <20210901153247.GJ1721383@nvidia.com>

On 01.09.21 17:32, Jason Gunthorpe wrote:
> On Wed, Sep 01, 2021 at 03:57:09PM +0200, David Hildenbrand wrote:
>> On 01.09.21 15:53, Jason Gunthorpe wrote:
>>> On Thu, Aug 19, 2021 at 11:18:55AM +0800, Qi Zheng wrote:
>>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 2630ed1bb4f4..30757f3b176c 100644
>>>> +++ b/mm/gup.c
>>>> @@ -500,6 +500,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>>  	if (unlikely(pmd_bad(*pmd)))
>>>>  		return no_page_table(vma, flags);
>>>> +	if (!pte_try_get(mm, pmd))
>>>> +		return no_page_table(vma, flags);
>>>> +
>>>>  	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>>>
>>> This is not good on a performance path, the pte_try_get() is
>>> locking/unlocking the same lock that pte_offset_map_lock() is getting.
>>
>> Yes, and we really need patch #8, anything else is just confusing reviewers.
>
> It is a bit better with patch 8, but it is still not optimal, we don't
> need to do the atomic work at all if the entire ptep is accessed while
> locked. So the above is still not what I would expect here, even with
> RCU.
>
> eg I would expect that this kind of change would work first with the
> existing paired accessors, ie
>
>    pte = pte_offset_map(pmd, address);
>    pte_unmap(pte);
>
> Should handle the refcount under the covers, and same kind of idea for
> the _locked/_unlocked variant. See my other mail.
>
> Only places that don't already use that pairing should get modified.
>
> To do this we have to extend the API so that pte_offset_map() can
> fail, or very cleverly return some kind of global non-present pte page
> (I wonder if the zero page would work?)

I explored both ideas (returning NULL, returning a specially prepared page)
and it didn't work in some cases where we unmap+remap etc.

>
>>> Also, I don't really understand how this scheme works with
>>> get_user_pages_fast.
>>
>> With the RCU change in #8 it should work just fine, because RCU
>> synchronize has to wait either until all other CPUs have left the RCU read
>> section, or re-enabled interrupts.
>
> So at this point in the series fast gup is broken, that does mean the
> series presentation really needs to be reworked. The better
> presentation is to add the API changes, with a
> no-functional-difference implementation, push the new API in well
> split patches to all the consumption sites, then change the API to
> have the new semantics.

Exactly my thoughts.

>
> RCU and refcount to free the page levels seems like a reasonable
> approach, but I have to say I haven't thought it through fully - are
> all the contexts that have the pte deref safe to do call_rcu?

Very good question. I'd assume so.

-- 
Thanks,

David / dhildenb
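
To make the paired-accessor idea from the thread concrete, here is a
minimal userspace sketch, assuming a toy model rather than real kernel
code. Every toy_* name and type is hypothetical and exists only for
illustration; this is not the kernel API and not what the patch series
implements. It only shows the shape being discussed: the map call takes
the reference and is allowed to fail, and the unmap call drops it, so a
caller that already pairs the two would only need a NULL check. Unlike
the kernel's pte_unmap(), the toy unmap also takes the pmd, purely to
keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_PTE 512

/* Hypothetical stand-in for a PTE page table page; not a kernel type. */
struct toy_pte_table {
	atomic_int refcount;	/* 0 means the table is being freed */
	unsigned long entries[TOY_PTRS_PER_PTE];
};

/* Hypothetical stand-in for a PMD entry that points at a PTE table. */
struct toy_pmd {
	struct toy_pte_table *table;	/* NULL when no table is installed */
};

/* Take a reference unless the count has already dropped to zero. */
static bool toy_table_try_get(struct toy_pte_table *t)
{
	int old = atomic_load(&t->refcount);

	while (old != 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&t->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Map: pins the table "under the covers"; NULL means it could not. */
static unsigned long *toy_pte_offset_map(struct toy_pmd *pmd, size_t index)
{
	struct toy_pte_table *t = pmd->table;

	if (t == NULL || !toy_table_try_get(t))
		return NULL;
	return &t->entries[index % TOY_PTRS_PER_PTE];
}

/*
 * Unmap: drops the reference taken by the map call. Returns true when
 * this was the last reference, i.e. when a real implementation would
 * go on to free the table.
 */
static bool toy_pte_unmap(struct toy_pmd *pmd, unsigned long *pte)
{
	struct toy_pte_table *t = pmd->table;
	int old;

	assert(pte >= t->entries && pte < t->entries + TOY_PTRS_PER_PTE);
	old = atomic_fetch_sub(&t->refcount, 1);
	assert(old > 0);
	return old == 1;
}
```

With this shape, a call site that already does map, access, unmap keeps
its structure and only gains the failure branch. How the last
reference is released safely against lockless walkers (the RCU question
raised above) is deliberately left out of the sketch.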