Date: Tue, 11 Feb 2020 12:50:37 -0400
From: Jason Gunthorpe
To: Joao Martins
Cc: linux-nvdimm@lists.01.org, Dan Williams, Vishal Verma, Dave Jiang,
	Ira Weiny, Alex Williamson, Cornelia Huck, kvm@vger.kernel.org,
	Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H . Peter Anvin",
	x86@kernel.org, Liran Alon, Nikita Leshenko, Barret Rhoden,
	Boris Ostrovsky, Matthew Wilcox, Konrad Rzeszutek Wilk
Subject: Re: [PATCH RFC 09/10] vfio/type1: Use follow_pfn for VM_FPNMAP VMAs
Message-ID: <20200211165037.GA22564@ziepe.ca>
References: <20200110190313.17144-1-joao.m.martins@oracle.com>
	<20200110190313.17144-10-joao.m.martins@oracle.com>
	<20200207210831.GA31015@ziepe.ca>
	<98351044-a710-1d52-f030-022eec89d1d5@oracle.com>
In-Reply-To: <98351044-a710-1d52-f030-022eec89d1d5@oracle.com>

On Tue, Feb 11, 2020 at 04:23:49PM +0000, Joao Martins wrote:
> On 2/7/20 9:08 PM, Jason Gunthorpe wrote:
> > On Fri, Jan 10, 2020 at 07:03:12PM +0000, Joao Martins wrote:
> >> From: Nikita Leshenko
> >>
> >> Unconditionally interpreting vm_pgoff as a PFN is incorrect.
> >>
> >> VMAs created by /dev/mem do this, but in general VM_PFNMAP just means
> >> that the VMA doesn't have an associated struct page and is being managed
> >> directly by something other than the core mmu.
> >>
> >> Use follow_pfn like KVM does to find the PFN.
> >>
> >> Signed-off-by: Nikita Leshenko
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 2ada8e6cdb88..1e43581f95ea 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -362,9 +362,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >>  	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> >>
> >>  	if (vma && vma->vm_flags & VM_PFNMAP) {
> >> -		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >> -		if (is_invalid_reserved_pfn(*pfn))
> >> -			ret = 0;
> >> +		ret = follow_pfn(vma, vaddr, pfn);
> >> +		if (!ret && !is_invalid_reserved_pfn(*pfn))
> >> +			ret = -EOPNOTSUPP;
> >>  	}
> >
> > FWIW this existing code is a huge hack and a security problem.
> >
> > I'm not sure how you could be successfully using this path on actual
> > memory without hitting bad bugs?
> >
> ATM I think this codepath is largely hit for MMIO (GPU passthrough, or
> mdev). In the context of this patch, guest memory would be treated
> similarly, meaning the device-dax backing memory wouldn't have a 'struct
> page' (as introduced in this series).

I think it is being used specifically to allow two VFIO devices to be
inserted into a VM and have the IOMMU set up to allow MMIO access.

> > Fundamentally VFIO can't retain a reference to a page from within a VMA
> > without some kind of refcount/locking/etc to allow the thing that put
> > the page there to know it is still being used (ie programmed in an
> > IOMMU) by VFIO.
> >
> > Otherwise it creates use-after-free style security problems on the
> > page.
>
> I take it you're referring to the past problems with long term page pinning +
> fsdax? Or you had something else in mind, perhaps related to your LSFMM topic?

No.
I'm referring to retaining access to memory backed by a VMA without
holding any kind of locking on it. This is an access-after-free scenario.
It *should* be like a long term page pin so that the VMA owner knows
something is happening.

> Here the memory can't be used by the kernel (and there's no struct page) except
> from device-dax managing/tearing/driving the pfn region (which is static and the
> underlying PFNs won't change throughout device lifetime), and vfio
> pinning/unpinning the pfns (which are refcounted against multiple map/unmaps);

For instance if you tear down the device-dax then VFIO will happily
continue to reference the memory. This is a bug.

There are other cases that escalate to security bugs.

> > This code needs to be deleted, not extended :(
>
> To some extent it isn't really an extension: the patch was just removing the
> assumption of @vm_pgoff being the 'start pfn' on PFNMAP vmas. This is also
> similarly done by get_vaddr_frames().

You are extending it in the sense that you plan to use it for more
cases than VMAs created by some other VFIO. That should not be done as
it will only complicate fixing this code.

KVM is allowed to use follow_pfn because it uses MMU notifiers and
does not allow the result of follow_pfn to outlive the VMA (AFAIK at
least). So it should be safe.

Jason
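
The lifetime rule in the last paragraph can be illustrated with a small
userspace sketch. This is only a toy model under stated assumptions: the
types and functions below (toy_owner, toy_user, toy_lookup, toy_teardown,
toy_may_access) are hypothetical names invented for illustration and are
not kernel APIs. It loosely mirrors the idea of an invalidation callback:
a looked-up PFN is only usable while the owner of the mapping can still
revoke it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumer that caches a PFN (stand-in for a KVM-like user). */
struct toy_user {
	unsigned long cached_pfn;
	bool valid;			/* cleared when the owner invalidates us */
};

/* Hypothetical owner of a PFN range (whatever populated the VMA). */
struct toy_owner {
	unsigned long pfn;
	bool alive;
	struct toy_user *notify;	/* at most one registered user, for brevity */
};

/* Look up the PFN and register for invalidation in one step. */
static int toy_lookup(struct toy_owner *o, struct toy_user *u)
{
	assert(o && u);
	if (!o->alive)
		return -1;
	u->cached_pfn = o->pfn;
	u->valid = true;
	o->notify = u;
	return 0;
}

/* Owner teardown: revoke the registered user before the range goes away. */
static void toy_teardown(struct toy_owner *o)
{
	if (o->notify) {
		o->notify->valid = false;
		o->notify = NULL;
	}
	o->alive = false;
}

/* A user may only touch the PFN while its cached translation is valid. */
static bool toy_may_access(const struct toy_user *u)
{
	return u->valid;
}
```

A lookup that skips the registration step in toy_lookup() would leave
cached_pfn usable after toy_teardown(), which is the access-after-free
pattern described above for a PFN held without any pin or notifier.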