From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8F00C433EF for ; Mon, 18 Oct 2021 18:36:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3F42761279 for ; Mon, 18 Oct 2021 18:36:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3F42761279 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7BC3E900003; Mon, 18 Oct 2021 14:36:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 76C48900002; Mon, 18 Oct 2021 14:36:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E5E5900003; Mon, 18 Oct 2021 14:36:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id 509F7900002 for ; Mon, 18 Oct 2021 14:36:30 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 0D4B4181B048E for ; Mon, 18 Oct 2021 18:36:30 +0000 (UTC) X-FDA: 78710413740.01.DBE56F8 Received: from mail-qk1-f174.google.com (mail-qk1-f174.google.com [209.85.222.174]) by imf10.hostedemail.com (Postfix) with ESMTP id 6D46C6001988 for ; Mon, 18 Oct 2021 18:36:26 +0000 (UTC) Received: by mail-qk1-f174.google.com with SMTP id r15so16225417qkp.8 for ; Mon, 18 Oct 2021 11:36:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=sTUgwvzk5rBLdhIf8DaS7J5sGbd2IBWKO4C29AYu/xc=; b=eOxhZyJgnih5XmJTiUkqPchjr0GxPdMGTZWIOrKKBnbd5oIycmMn5R+gGA2sn3LkLK 8B5YnTnSH3n9NgDYusHw9PUwNsH//Okzwm4qTKGVc2c7MvJHV07z9MsRaF+X9nNusxjD Yb0grXjEMIpSTl4VGTSaQ3iI8fXwYmRb57jc4vhGUbIbYT72PrcbqfY3R8MNDShnqyYr ilFrKr09XfgBn481Ei+z6rbmJ8MirnyiEJDkYvopmpMs3+VzGVgPAR90YeYZjvrqPmsF DGN+7sqiBax8sZV/x9u99m04iC7V65j6dSsg+qoYNVK/JvhIJiSTlZsa1sZ0D3JDmRF6 pxEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=sTUgwvzk5rBLdhIf8DaS7J5sGbd2IBWKO4C29AYu/xc=; b=lVGt356wf05NPnBACenVoRzPZlPONr63+sc7FnPr6UYil/sQFbo1ToyXC1jzboXpL/ CT2pNrRSjbI9LupM47imOjoKHf2jSJ9nB+9JqsXPKUNc7XPiBwBIAgWFCBjRUdR8Go6q vycTsXutE8tTXceJEM3n3GLc/WaailJ4kUdW34E3JXupM5TdiBDPG+YC5W4mYv0FBZ5n Fb5MI75yba5zpUgpJd27FGWbyAss0UuLIU/2hAdP/BnSjlR9MjwowotmPMcinf53D0c8 HoJDBGyO/VeQGx6+B8fKBdY//6ybsCHG4F8v8IYkr6hk44B3MinDZl99K7rK1vNS/SqO oWww== X-Gm-Message-State: AOAM531wLwC5Ch8UY2UDu+vo4nVre3bT7CSOiNoHNrawgon6e0Kt9Ayf CfmakaJQi7IYQhOXCM5EhzlCPQ== X-Google-Smtp-Source: ABdhPJxOGqr4Dm+0wgcW0Qag4P+r6KIDtarTX9sViv/hQASfXUhZKkzCDtmh5gP+QfOaJGfe1TWiVQ== X-Received: by 2002:a05:620a:288f:: with SMTP id j15mr23613311qkp.280.1634582189080; Mon, 18 Oct 2021 11:36:29 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-162-113-129.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.162.113.129]) by smtp.gmail.com with ESMTPSA id m68sm6924782qkb.105.2021.10.18.11.36.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Oct 2021 11:36:28 -0700 (PDT) Received: from jgg by mlx with local (Exim 4.94) (envelope-from ) id 1mcXUp-00GLtu-LR; Mon, 18 Oct 2021 15:36:27 -0300 Date: Mon, 18 Oct 2021 15:36:27 -0300 From: Jason Gunthorpe To: Alistair Popple Cc: Joao Martins , Dan Williams , Felix Kuehling , linux-mm@kvack.org, Vishal Verma , Dave Jiang , Naoya Horiguchi , Matthew Wilcox , John Hubbard , Jane Chu , Muchun Song , Mike Kravetz , Andrew Morton , Jonathan Corbet , Christoph Hellwig , nvdimm@lists.linux.dev, linux-doc@vger.kernel.org Subject: Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages Message-ID: <20211018183627.GD3686969@ziepe.ca> References: <20210827145819.16471-1-joao.m.martins@oracle.com> <3f35cc33-7012-5230-a771-432275e6a21e@oracle.com> <20210929193405.GZ3544071@ziepe.ca> <31536278.clc0Zd3cv0@nvdebian> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <31536278.clc0Zd3cv0@nvdebian> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 6D46C6001988 X-Stat-Signature: yn54h5fdfy3rrms85g683pq6tkry9pa8 Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=ziepe.ca header.s=google header.b=eOxhZyJg; spf=pass (imf10.hostedemail.com: domain of jgg@ziepe.ca designates 209.85.222.174 as permitted sender) smtp.mailfrom=jgg@ziepe.ca; dmarc=none X-HE-Tag: 1634582186-844710 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Sep 30, 2021 at 01:01:14PM +1000, Alistair Popple wrote: > On Thursday, 30 September 2021 5:34:05 AM AEST Jason Gunthorpe wrote: > > On Wed, Sep 29, 2021 at 12:50:15PM +0100, Joao Martins wrote: > > > > > > If the get_dev_pagemap has to remain then it just means we have to > > > > flush before changing pagemap pointers > > > Right -- I don't think we should need it as that discussion on the other > > > thread goes. > > > > > > OTOH, using @pgmap might be useful to unblock gup-fast FOLL_LONGTERM > > > for certain devmap types[0] (like MEMORY_DEVICE_GENERIC [device-dax] > > > can support it but not MEMORY_DEVICE_FSDAX [fsdax]). > > > > When looking at Logan's patches I think it is pretty clear to me that > > page->pgmap must never be a dangling pointer if the caller has a > > legitimate refcount on the page. > > > > For instance the migrate and stuff all blindly calls > > is_device_private_page() on the struct page expecting a valid > > page->pgmap. > > > > This also looks like it is happening, ie > > > > void __put_page(struct page *page) > > { > > if (is_zone_device_page(page)) { > > put_dev_pagemap(page->pgmap); > > > > Is indeed putting the pgmap ref back when the page becomes ungettable. > > > > This properly happens when the page refcount goes to zero and so it > > should fully interlock with __page_cache_add_speculative(): > > > > if (unlikely(!page_ref_add_unless(page, count, 0))) { > > > > Thus, in gup.c, if we succeed at try_grab_compound_head() then > > page->pgmap is a stable pointer with a valid refcount. > > > > So, all the external pgmap stuff in gup.c is completely pointless. > > try_grab_compound_head() provides us with an equivalent protection at > > lower cost. Remember gup.c doesn't deref the pgmap at all. > > > > Dan/Alistair/Felix do you see any hole in that argument?? > > As background note that device pages are currently considered free when > refcount == 1 but the pgmap reference is dropped when the refcount transitions > 1->0. The final pgmap reference is typically dropped when a driver calls > memunmap_pages() and put_page() drops the last page reference: > > void memunmap_pages(struct dev_pagemap *pgmap) > { > unsigned long pfn; > int i; > > dev_pagemap_kill(pgmap); > for (i = 0; i < pgmap->nr_range; i++) > for_each_device_pfn(pfn, pgmap, i) > put_page(pfn_to_page(pfn)); > dev_pagemap_cleanup(pgmap); > > If there are still pgmap references dev_pagemap_cleanup(pgmap) will block until > the final reference is dropped. So I think your argument holds at least for > DEVICE_PRIVATE and DEVICE_GENERIC. DEVICE_FS_DAX defines it's own pagemap > cleanup but I can't see why the same argument wouldn't hold there - if a page > has a valid refcount it must have a reference on the pagemap too. To close this circle - the issue is use after free on the struct page * entry while it has a zero ref. memunmap_pages() does wait for the refcount to go to zero, but it then goes on to free the memory under the struct pages. However there are possibly still untracked references to this memory in the page tables. This is the bug Dan has been working on - to shootdown page table mappings before getting to memunmap_pages() Getting the page map ref will make the use-after-free never crash, just be a silent security problem. :( Jason