From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 26E99CD13CF for ; Mon, 2 Sep 2024 09:45:46 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E617210E277; Mon, 2 Sep 2024 09:45:45 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="m4RAxLK9"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) by gabe.freedesktop.org (Postfix) with ESMTPS id 23E7210E26A; Mon, 2 Sep 2024 09:45:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1725270345; x=1756806345; h=message-id:subject:from:to:cc:date:in-reply-to: references:content-transfer-encoding:mime-version; bh=3FqHNCqQ2PxEPD6c0gpV6LohA9+foAPZMqgF35tUvr0=; b=m4RAxLK90PjnJwUbNtc+tB3KZE5h45VxT5LAJ1aRW4FsU7oM88Eyj6cJ sEzNUwUS9HKnejUT3XjJNhgQQDUQkAiw+Hp72qvVGVQXz+hDVo2hswZbA aP/82L7WXo7Rp1V8oinMkS97RoKsMuwZwJ8Rut2HbTo/gU0R+UFWLnl6i 8Z33SVvajITbv8PeJuwVI20oopUlN90Hwth73K1nRlLevqeDh8pFcDnxY 9hP3t1Y3q2tVt8GUkFRX6W7OKQav6sJG/U3+sGEWMlXmuJlj3q+Do1STH 3kSh/lWJ/Ka5iuR9UdDyR3ZOv1oGQGrQ7fTEmfH1kVI8XAOgdY/2SRYrL g==; X-CSE-ConnectionGUID: /KzDokRdTpatrgCtaak/7A== X-CSE-MsgGUID: EeCTSi0IQJ2tAfzTkBiEkQ== X-IronPort-AV: E=McAfee;i="6700,10204,11182"; a="13328363" X-IronPort-AV: E=Sophos;i="6.10,195,1719903600"; d="scan'208";a="13328363" Received: from fmviesa010.fm.intel.com ([10.60.135.150]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Sep 2024 02:45:45 -0700 X-CSE-ConnectionGUID: cLYY9qHoQs6Tbwzc07kq1Q== X-CSE-MsgGUID: UqE6OWo5SruaOCrFW7ebAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,195,1719903600"; d="scan'208";a="64758534" Received: from mlehtone-mobl.ger.corp.intel.com (HELO [10.245.244.77]) ([10.245.244.77]) by fmviesa010-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Sep 2024 02:45:43 -0700 Message-ID: <7d4e0ccccd321a1cb66a9081d9a9f0089fb79e7e.camel@linux.intel.com> Subject: Re: [RFC PATCH 05/28] drm/gpusvm: Add support for GPU Shared Virtual Memory From: Thomas =?ISO-8859-1?Q?Hellstr=F6m?= To: Matthew Brost Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, airlied@gmail.com, christian.koenig@amd.com, matthew.auld@intel.com, daniel@ffwll.ch Date: Mon, 02 Sep 2024 11:45:40 +0200 In-Reply-To: References: <20240828024901.2582335-1-matthew.brost@intel.com> <20240828024901.2582335-6-matthew.brost@intel.com> <666768946f0d7dbf9373ab6369a4b0c7eefd8e2d.camel@linux.intel.com> Autocrypt: addr=thomas.hellstrom@linux.intel.com; prefer-encrypt=mutual; keydata=mDMEZaWU6xYJKwYBBAHaRw8BAQdAj/We1UBCIrAm9H5t5Z7+elYJowdlhiYE8zUXgxcFz360SFRob21hcyBIZWxsc3Ryw7ZtIChJbnRlbCBMaW51eCBlbWFpbCkgPHRob21hcy5oZWxsc3Ryb21AbGludXguaW50ZWwuY29tPoiTBBMWCgA7FiEEbJFDO8NaBua8diGTuBaTVQrGBr8FAmWllOsCGwMFCwkIBwICIgIGFQoJCAsCBBYCAwECHgcCF4AACgkQuBaTVQrGBr/yQAD/Z1B+Kzy2JTuIy9LsKfC9FJmt1K/4qgaVeZMIKCAxf2UBAJhmZ5jmkDIf6YghfINZlYq6ixyWnOkWMuSLmELwOsgPuDgEZaWU6xIKKwYBBAGXVQEFAQEHQF9v/LNGegctctMWGHvmV/6oKOWWf/vd4MeqoSYTxVBTAwEIB4h4BBgWCgAgFiEEbJFDO8NaBua8diGTuBaTVQrGBr8FAmWllOsCGwwACgkQuBaTVQrGBr/P2QD9Gts6Ee91w3SzOelNjsus/DcCTBb3fRugJoqcfxjKU0gBAKIFVMvVUGbhlEi6EFTZmBZ0QIZEIzOOVfkaIgWelFEH Organization: Intel Sweden AB, Registration Number: 556189-6027 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.50.4 (3.50.4-1.fc39) MIME-Version: 1.0 X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" Hi, Matt. On Fri, 2024-08-30 at 13:47 +0000, Matthew Brost wrote: > On Fri, Aug 30, 2024 at 11:57:33AM +0200, Thomas Hellstr=C3=B6m wrote: > > Hi, Matthew, > >=20 > > Agreed the below might not be important just now, but some ideas: > >=20 > > On Thu, 2024-08-29 at 20:56 +0000, Matthew Brost wrote: > > > Issues with removing a SVM range: > > >=20 > > > - Xe bind code stores invalidation / present state in VMA, this > > > would > > > =C2=A0 need to be moved to the radix tree. I have Jira open for that > > > work > > > =C2=A0 which I believe other developers are going to own. > >=20 > > Yeah, although we shouldn't *design* around xe bind-code and page- > > table > > code shortcomings. > >=20 >=20 > I'm thinking this one certainly should be fixed sooner rather than > later which would be helpful. >=20 > But let's also consider the case where we get a bunch of individual > page > invalidates serially for an entire range (I can't remember when this > happens but I have seen it in my testing, will look into this more to > figure exactly when). If we invalidate 1 page at a time in radix > tree, > each invalidation could potentially results in TLB invalidation > interaction with the hardware in cases where a larger GPU pages are > not > being used. The TLB invalidation is going to vastly slower than any > CPU > operation (e.g. RB search, radix tree walk). If we key on a range > invalidate the entire once on the first invalidation this may end up > being significantly faster. >=20 > Above is pure speculation though, a lot of what both of us is saying > is... So another reason I'd like to get apps running to do profiling. > It > would be nice to make design decisions based on data not speculation. Well nothing would stop you from adding a configurable invalidation granularity, even with a radix-tree based approach. You'd just pad the invalidation range to match the granularity. >=20 > >=20 > > > - Where would the dma mapping / device pages be stored? > > > - In the radix tree? What if ATS is enabled? We don't > > > have a > > > =C2=A0 driver owned radix tree. How do we reasonably connect > > > a > > > driver > > > =C2=A0 owned radix to a common GPUSVM layer? > >=20 > > With ATS you mean IOMMU SVA, right? I think we could assume that > > any > > user of this code also has a gpu page-table since otherwise they > > couldn't be using VRAM and a simpler solution would be in place.=C2=A0 > >=20 >=20 > Fair point. >=20 > > But to that specific question, drm_gpusvm state would live in a > > drm_gpusvm radix tree and driver-specific stuff in the driver tree. > > A > > helper based approach would then call drm_gpusvm_unmap_dma(range), > > whereas a middle layer would just traverse the tree and unmap. > >=20 >=20 > Let me consider this. Open to all options. >=20 > > > - In the notifier? What is the notifier is sparsely > > > populated? > > > =C2=A0 We would be wasting huge amounts of memory. What is > > > the > > > =C2=A0 notifier is configured to span the entire virtual > > > address > > > =C2=A0 space? > >=20 > > Let's assume you use a fake page-table like in xe_pt_walk.c as your > > "radix tree", adapted to relevant page-sizes, sparsity is not a > > problem. > >=20 >=20 > Ok, makes sense I think. >=20 > > > - How does the garbage collector work? We can't allocate memory > > > in > > > the > > > =C2=A0 notifier so we don't anything to add to the garbage collector. > > > We > > > =C2=A0 can't directly modify page tables given you need lock in the > > > path > > > of > > > =C2=A0 reclaim. > >=20 > > The garbage collector would operate on the whole invalidated range. > > In > > the case of xe, upon zapping under reclaim you mark individual > > page- > > table bos that are to be removed as "invalid", the garbage > > collector > > walks the range removing the "invalid" entries. Subsequent (re- > > binding) > > avoids the "invalid" entries, (perhaps even helps removing them) > > and > > can thus race with the garbage collector. Hence, any ranges implied > > by > > the page-table code are elimitated. > >=20 >=20 > This is pretty much with what I came up with too if we didn't have a > SVM > range. >=20 > > > - How do we deal with fault storms (e.g. tons of faults hitting > > > the > > > same > > > =C2=A0 SVM range in a row)? Without a SVM range no every to know if > > > mapping > > > =C2=A0 is valid and GPU page handler can be short circuited. > >=20 > > Perhaps look at page-table tree and check whether the gpu_pte > > causing > > the fault is valid. > >=20 >=20 > Came up with the same thing. >=20 > > > - Do we have notifier seqno for every PTE? > >=20 > > I'd say no. With this approach it makes sense to have a wide > > notifier. > > The seqno now only affects binding of new gpu_ptes, so the problem > > with > > a wide notifier becomes that if invalidation occurs to *any* part > > of > > the notifier while we're in the read section during binding, we > > need to >=20 > I have avoided this by the drm_gpusvm_range_pages_valid. This isn't > just > an optimization is actually required for the 2 tile case to be able > to > safely know when dma pages can be unmapped (i.e. you can't dma unmap > pages if either tile has a valid mapping). OK, I still need to read up on that.. Thanks, Thomas >=20 > Matt >=20 > > rerun the binding. Adding more notifiers to mitigate that would be > > to > > optimize faulting performance over core invalidation performance > > which > > Jason asked us to avoid. > >=20 > > /Thomas > >=20 > >=20 > >=20