From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D4106C52D7C for ; Thu, 15 Aug 2024 15:42:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:In-Reply-To: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=JsmHuSZ/h+8byKlDVdTBZriKwwSluzVcisJkUaNyJWc=; b=ejzRzCflET1WjHZLNU3gc4YVfy ysZkBVjMI5dDfm6/8bNiVKyXTR3w3FEyEQ6/4kQPo5MHSNTpeoC5I/wXjkfmMdSIYFSGyHswyoLjV qikWuxr+tDuwMmAVf9Obv/WKTdF/aGzw90diZRQLizSUb8PnRTWjXJMifsz7d7WMwo5DPLS9OCCTB mQvYHcnZxHVy0jKNTz/IaCdP0dl1UdhSPkmmm9FppXStlwJ6ZBjXy68PQicK25T64ecZM8ev0tewY IAEwAbjVbKF7o5O+qQOyUHnAupiqYlirviMvlF6YHtWIAOsbmX8aJKWv0PGttvqmpGaVGFgZi7Dlt oPOLfvng==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1seccX-0000000AQ0x-3ZE0; Thu, 15 Aug 2024 15:42:37 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1secbu-0000000APwm-410e for linux-arm-kernel@bombadil.infradead.org; Thu, 15 Aug 2024 15:41:59 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:In-Reply-To:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=JsmHuSZ/h+8byKlDVdTBZriKwwSluzVcisJkUaNyJWc=; b=lpBQOy2SVwOezHD8U0yGb2plpH Ty83wFQ3y/yZxlbJaml5xdK3qWclISNxpEh0unRDIUFf5d4rb2IFBOHkT3iqugU85xvapkXF7iPzU aEvVuG54BIUdaI6gj8dr29pnUsQhG/BhwhdtfMLBhfYHWX6YAxQnkHBcRiB7+TAJbN0vHBtZL6Zf7 N9IqDmkRMmm1VS8bsDPQ41SBDR+fyJZugIFaODSJbdg2RNQbidr9F+3J5UUU6pq1TU/U8UkoaR7A2 uGjhAPu3Ao641c7IK4BphUmxx07wPcVqX3QGEUEeGeKrHk+Pe0L3Yxmnz1TTgJ6Nx1fx5ePCoa/eQ aEl/B/lQ==; Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by desiato.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1secbm-00000008QMZ-3rdV for linux-arm-kernel@lists.infradead.org; Thu, 15 Aug 2024 15:41:54 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1723736506; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JsmHuSZ/h+8byKlDVdTBZriKwwSluzVcisJkUaNyJWc=; b=LsTDcnVIK6xXuqu1vO3PgSkT/N+6qpSmOCXLnwYU4KqcNRL/8nLET2EzMe2mAnLmlGEa20 ZDsgTjZ/nMsu22rjIPJVGm1jzwkklgHnwkaQ4LPu9Gd4a+72dEwWr+GKCoP6EvMOT1v1fC 34TEV4zwe/FLpBZ1PS7vlQVe+o78X44= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1723736506; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JsmHuSZ/h+8byKlDVdTBZriKwwSluzVcisJkUaNyJWc=; b=LsTDcnVIK6xXuqu1vO3PgSkT/N+6qpSmOCXLnwYU4KqcNRL/8nLET2EzMe2mAnLmlGEa20 ZDsgTjZ/nMsu22rjIPJVGm1jzwkklgHnwkaQ4LPu9Gd4a+72dEwWr+GKCoP6EvMOT1v1fC 34TEV4zwe/FLpBZ1PS7vlQVe+o78X44= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-686-cGwwo8KPNaeLzITwNx1vvQ-1; Thu, 15 Aug 2024 11:41:44 -0400 X-MC-Unique: cGwwo8KPNaeLzITwNx1vvQ-1 Received: by mail-qv1-f72.google.com with SMTP id 6a1803df08f44-6bf6c118511so2085746d6.2 for ; Thu, 15 Aug 2024 08:41:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723736503; x=1724341303; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=JsmHuSZ/h+8byKlDVdTBZriKwwSluzVcisJkUaNyJWc=; b=FruEaJM+VrU0X1B5e+jkqiwmFxJNQvj72CbdTfm1tBX92aAHPCxIttHNAsYN6DSout okpGTo8BUyGPlSthZzMw5ELB+1Df5h3AX37Q0Hv7zLri6GRXqu1U9kX0a0W5JaH5+kZC Y/hiz97wRGCBw6xzaciVr9ifekSleu2Y5pZWRAIKsaAKxEpD9q6B2KEEgH8pszCpjoez 0nAlUXtBkQHTPaHUA1rjA9HIygYLHuxd0B9SIe8gSZp7JdnqjvLcHirAVfvLC2GlOy2d RrkcXkhAQNT3ByvEznAHn7WBwqH0fpVhx1+2nmZ0zMn8r1VQWtX5e7wdorIsJavZrb0h Q2+g== X-Forwarded-Encrypted: i=1; AJvYcCUl/4ba/qQADj8HeB6A0phyW7KzRnksJUCHakIPm3B1lggzbK6pTUAcWn/Fi6hX2tIPs2r4DCbVMvWz0e3+Rs30@lists.infradead.org X-Gm-Message-State: AOJu0YyIsRwkSDyMwDLyApnfTHtz2ik83HOP471wOblF3ItegHRxgrGG I7TdKYY5LpHPiYOYY9xNyB2vnCqduFOh97o+JtHZDsr2ok3bs/BtxN8gOJaYAG5ppDDAd5iGcpr nBRuE5zvxC5ETrMz5rm6MFYWkBDtfFkHBoQJEvLJ1GwJQoxqldMKDmbADn8rdY4Jxmfllhdhn X-Received: by 2002:a05:6214:c2a:b0:6b9:9417:c103 with SMTP id 6a1803df08f44-6bf5d15e1c2mr43262856d6.1.1723736503626; Thu, 15 Aug 2024 08:41:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFzOO1MXixhWkJJiUpcPb8vsKcxQhlblFzoLUtxY92DYm0+isa557XEa9BsKNxw5lcKbfHR8g== X-Received: by 2002:a05:6214:c2a:b0:6b9:9417:c103 with SMTP id 6a1803df08f44-6bf5d15e1c2mr43262466d6.1.1723736503043; Thu, 15 Aug 2024 08:41:43 -0700 (PDT) Received: from x1n (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6bf6ff0c354sm7266716d6.135.2024.08.15.08.41.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Aug 2024 08:41:42 -0700 (PDT) Date: Thu, 15 Aug 2024 11:41:39 -0400 From: Peter Xu To: Jason Gunthorpe Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sean Christopherson , Oscar Salvador , Axel Rasmussen , linux-arm-kernel@lists.infradead.org, x86@kernel.org, Will Deacon , Gavin Shan , Paolo Bonzini , Zi Yan , Andrew Morton , Catalin Marinas , Ingo Molnar , Alistair Popple , Borislav Petkov , David Hildenbrand , Thomas Gleixner , kvm@vger.kernel.org, Dave Hansen , Alex Williamson , Yan Zhao Subject: Re: [PATCH 09/19] mm: New follow_pfnmap API Message-ID: References: <20240809160909.1023470-1-peterx@redhat.com> <20240809160909.1023470-10-peterx@redhat.com> <20240814131954.GK2032816@nvidia.com> <20240814221441.GB2032816@nvidia.com> MIME-Version: 1.0 In-Reply-To: <20240814221441.GB2032816@nvidia.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240815_164151_173359_0B1A2822 X-CRM114-Status: GOOD ( 45.52 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, Aug 14, 2024 at 07:14:41PM -0300, Jason Gunthorpe wrote: > On Wed, Aug 14, 2024 at 02:24:47PM -0400, Peter Xu wrote: > > On Wed, Aug 14, 2024 at 10:19:54AM -0300, Jason Gunthorpe wrote: > > > On Fri, Aug 09, 2024 at 12:08:59PM -0400, Peter Xu wrote: > > > > > > > +/** > > > > + * follow_pfnmap_start() - Look up a pfn mapping at a user virtual address > > > > + * @args: Pointer to struct @follow_pfnmap_args > > > > + * > > > > + * The caller needs to setup args->vma and args->address to point to the > > > > + * virtual address as the target of such lookup. On a successful return, > > > > + * the results will be put into other output fields. > > > > + * > > > > + * After the caller finished using the fields, the caller must invoke > > > > + * another follow_pfnmap_end() to proper releases the locks and resources > > > > + * of such look up request. > > > > + * > > > > + * During the start() and end() calls, the results in @args will be valid > > > > + * as proper locks will be held. After the end() is called, all the fields > > > > + * in @follow_pfnmap_args will be invalid to be further accessed. > > > > + * > > > > + * If the PTE maps a refcounted page, callers are responsible to protect > > > > + * against invalidation with MMU notifiers; otherwise access to the PFN at > > > > + * a later point in time can trigger use-after-free. > > > > + * > > > > + * Only IO mappings and raw PFN mappings are allowed. > > > > > > What does this mean? The paragraph before said this can return a > > > refcounted page? > > > > This came from the old follow_pte(), I kept that as I suppose we should > > allow VM_IO | VM_PFNMAP just like before, even if in this case I suppose > > only the pfnmap matters where huge mappings can start to appear. > > If that is the intention it should actively block returning anything > that is vm_normal_page() not check the VM flags, see the other > discussion.. The restriction should only be applied to the vma attributes, not a specific pte mapping, IMHO. I mean, the comment was describing "which VMA is allowed to use this function", reflecting that we'll fail at anything !PFNMAP && !IO. It seems legal to have private mappings of them, where vm_normal_page() can return true here for some of the mappings under PFNMAP|IO. IIUC either the old follow_pte() or follow_pfnmap*() API cared much on this part yet so far. > > It makes sense as a restriction if you call the API follow pfnmap. I'm open to any better suggestion to names. Again, I think here it's more about the vma attribute, not "every mapping under the memory range". > > > > > + * The mmap semaphore > > > > + * should be taken for read, and the mmap semaphore cannot be released > > > > + * before the end() is invoked. > > > > > > This function is not safe for IO mappings and PFNs either, VFIO has a > > > known security issue to call it. That should be emphasised in the > > > comment. > > > > Any elaboration on this? I could have missed that.. > > Just because the memory is a PFN or IO doesn't mean it is safe to > access it without a refcount. There are many driver scenarios where > revoking a PFN from mmap needs to be a hard fence that nothing else > has access to that PFN. Otherwise it is a security problem for that > driver. Oh ok, I suppose you meant the VFIO whole thing on "zapping mapping when MMIO disabled"? If so I get it. More below. > > > I suppose so? As the pgtable is stable, I thought it means it's safe, but > > I'm not sure now when you mentioned there's a VFIO known issue, so I could > > have overlooked something. There's no address returned, but pfn, pgprot, > > write, etc. > > zap/etc will wait on the PTL, I think, so it should be safe for at > least the issues I am thinking of. > > > The user needs to do proper mapping if they need an usable address, > > e.g. generic_access_phys() does ioremap_prot() and recheck the pfn didn't > > change. > > No, you can't take the phys_addr_t outside the start/end region that > explicitly holds the lock protecting it. This is what the comment must > warn against doing. I think the comment has that part covered more or less: * During the start() and end() calls, the results in @args will be valid * as proper locks will be held. After the end() is called, all the fields * in @follow_pfnmap_args will be invalid to be further accessed. Feel free to suggest anything that will make it better. For generic_access_phys() as a specific example: I think it is safe to map the pfn even after end(). I meant here the "map" operation is benign with ioremap_prot(), afaiu: it doesn't include an access on top of the mapping yet. After the map, it rewalks the pgtable, making sure PFN is still there and valid, and it'll only access it this time before end(): if (write) memcpy_toio(maddr + offset, buf, len); else memcpy_fromio(buf, maddr + offset, len); ret = len; follow_pfnmap_end(&args); If PFN changed, it properly releases the mapping: if ((prot != pgprot_val(args.pgprot)) || (phys_addr != (args.pfn << PAGE_SHIFT)) || (writable != args.writable)) { follow_pfnmap_end(&args); iounmap(maddr); goto retry; } Then taking the example of VFIO: there's no risk of racing with a concurrent zapping as far as I can see, because otherwise it'll see pfn changed. Thanks, -- Peter Xu