From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 29 Jan 2019 21:48:52 -0500
From: Jerome Glisse <jglisse@redhat.com>
To: Logan Gunthorpe
Cc: Jason Gunthorpe, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman, "Rafael J. Wysocki", Bjorn Helgaas,
	Christian Koenig, Felix Kuehling, linux-pci@vger.kernel.org,
	dri-devel@lists.freedesktop.org, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, Joerg Roedel,
	iommu@lists.linux-foundation.org
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Message-ID: <20190130024851.GB10462@redhat.com>
References: <20190129191120.GE3176@redhat.com>
	<20190129193250.GK10108@mellanox.com>
	<99c228c6-ef96-7594-cb43-78931966c75d@deltatee.com>
	<20190129205749.GN3176@redhat.com>
	<2b704e96-9c7c-3024-b87f-364b9ba22208@deltatee.com>
	<20190129215028.GQ3176@redhat.com>
	<20190129234752.GR3176@redhat.com>
	<655a335c-ab91-d1fc-1ed3-b5f0d37c6226@deltatee.com>
In-Reply-To: <655a335c-ab91-d1fc-1ed3-b5f0d37c6226@deltatee.com>
Wysocki" , Bjorn Helgaas , Christian Koenig , Felix Kuehling , "linux-pci@vger.kernel.org" , "dri-devel@lists.freedesktop.org" , Christoph Hellwig , Marek Szyprowski , Robin Murphy , Joerg Roedel , "iommu@lists.linux-foundation.org" Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma Message-ID: <20190130024851.GB10462@redhat.com> References: <20190129191120.GE3176@redhat.com> <20190129193250.GK10108@mellanox.com> <99c228c6-ef96-7594-cb43-78931966c75d@deltatee.com> <20190129205749.GN3176@redhat.com> <2b704e96-9c7c-3024-b87f-364b9ba22208@deltatee.com> <20190129215028.GQ3176@redhat.com> <20190129234752.GR3176@redhat.com> <655a335c-ab91-d1fc-1ed3-b5f0d37c6226@deltatee.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <655a335c-ab91-d1fc-1ed3-b5f0d37c6226@deltatee.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 30 Jan 2019 02:48:56 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jan 29, 2019 at 06:17:43PM -0700, Logan Gunthorpe wrote: > > > On 2019-01-29 4:47 p.m., Jerome Glisse wrote: > > The whole point is to allow to use device memory for range of virtual > > address of a process when it does make sense to use device memory for > > that range. So they are multiple cases where it does make sense: > > [1] - Only the device is accessing the range and they are no CPU access > > For instance the program is executing/running a big function on > > the GPU and they are not concurrent CPU access, this is very > > common in all the existing GPGPU code. In fact AFAICT It is the > > most common pattern. So here you can use HMM private or public > > memory. > > [2] - Both device and CPU access a common range of virtul address > > concurrently. In that case if you are on a platform with cache > > coherent inter-connect like OpenCAPI or CCIX then you can use > > HMM public device memory and have both access the same memory. > > You can not use HMM private memory. > > > > So far on x86 we only have PCIE and thus so far on x86 we only have > > private HMM device memory that is not accessible by the CPU in any > > way. > > I feel like you're just moving the rug out from under us... Before you > said ignore HMM and I was asking about the use case that wasn't using > HMM and how it works without HMM. In response, you just give me *way* > too much information describing HMM. And still, as best as I can see, > managing DMA mappings (which is different from the userspace mappings) > for GPU P2P should be handled by HMM and the userspace mappings should > *just* link VMAs to HMM pages using the standard infrastructure we > already have. For HMM P2P mapping we need to call into the driver to know if driver wants to fallback to main memory (running out of BAR addresses) or if it can allow a peer device to directly access its memory. We also need the call to exporting device driver as only the exporting device driver can map the HMM page pfn to some physical BAR address (which would be allocated by driver for GPU). I wanted to make sure the HMM case was understood too, sorry if it caused confusion with the non HMM case which i describe below. 
> >> And what struct pages are actually going to be backing these VMAs
> >> if it's not using HMM?
> >
> > When you have some range of virtual addresses migrated to HMM
> > private memory, the CPU ptes are special swap entries and they
> > behave just as if the memory had been swapped to disk. So a CPU
> > access to those will fault and trigger a migration back to main
> > memory.
> 
> This isn't answering my question at all... I specifically asked what
> is backing the VMA when we are *not* using HMM.

So when you are not using HMM, i.e. an existing GPU object without
HMM, then as I said you do not have any valid pte inside the CPU page
table most of the time: the GPU driver only populates a pte with a
valid entry when there is a CPU page fault, and it clears it as soon
as the corresponding object is used by the GPU. In fact some drivers
also aggressively unmap the object from the BAR, making the memory
totally inaccessible to anything but the GPU. GPU drivers do not like
CPU mappings and are quite aggressive about clearing them.

Everything I said about userspace deciding which objects can be
shared, and with whom, applies here too. So for a GPU you do want to
give control to the GPU driver, and you do not want to require valid
CPU ptes for the vma, so that the exporting driver can return a valid
address to the importing peer device only. The exporting device driver
might also decide to fall back to main memory (running out of BAR
addresses, for instance). So again here we want to go through the
exporting device driver so that it can take the right action.

So the expected pattern (for a GPU driver) is:
  - no valid pte for the special vma (mmap of a device file)
  - the importing device calls p2p_map() for the vma; if it succeeds
    the first time then we expect it to succeed for the same vma and
    range the next time we call it
  - the exporting driver either returns physical addresses pointing
    into its BAR space at the correct device memory, or falls back to
    main memory

Then at any point in time:
  - if the GPU driver wants to move the object around (for whatever
    reason) it calls zap_vma_ptes(); the fact that there are no valid
    CPU ptes does not matter, it still fires the mmu notifiers, and
    thus any importing device driver will invalidate its mapping
  - an importing device driver that lost its mapping due to the mmu
    notification can re-map by calling p2p_map() again (it should
    check that the vma is still valid ...), and the guideline is for
    the exporting device driver to succeed and return a valid address
    to the new memory used for the object

This allows a device driver like a GPU driver to keep control. The
expected pattern is still for the p2p mapping to stay undisrupted for
its whole lifetime; invalidation should only be triggered if the GPU
driver does need to move things around. A rough sketch of both sides
of this pattern is at the end of this mail.

All of the above is for the non-HMM case, i.e. mmap of a device file,
so for any existing open source GPU device driver that does not
support HMM.
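As promised, here is a minimal sketch of both sides of that pattern.
Again this is purely illustrative and rests on assumptions: a
p2p_map() hook on vm_operations_struct is what this thread is
discussing, but the signature used here and the helper names are made
up; zap_vma_ptes() and the mmu notifier behavior it triggers are the
only existing kernel mechanics shown.

#include <linux/device.h>
#include <linux/mm.h>

/* Importer side: map a peer vma. No valid CPU ptes are required;
 * everything goes through the exporting driver's hook. The p2p_map
 * member and its signature below are hypothetical. */
static int importer_map_peer_vma(struct vm_area_struct *vma,
				 struct device *importer,
				 phys_addr_t *addrs,
				 unsigned long npages)
{
	if (!vma->vm_ops || !vma->vm_ops->p2p_map)
		return -EINVAL;
	/* The exporting driver returns BAR addresses, or an error that
	 * means "fall back to main memory". After an mmu notifier
	 * invalidation the importer simply calls this again. */
	return vma->vm_ops->p2p_map(vma, importer, addrs, npages);
}

/* Exporter side: moving the object around. Even though the vma has
 * no valid CPU ptes, zap_vma_ptes() still fires the mmu notifiers,
 * so every importer drops its peer mapping and will re-map. */
static void gpu_move_object(struct vm_area_struct *vma)
{
	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
	/* ... migrate the object, update the BAR translation ... */
}

Cheers,
Jérôme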