From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9445ECD6E60 for ; Wed, 3 Jun 2026 14:54:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F083010FFA9; Wed, 3 Jun 2026 14:54:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="N6EZqR2P"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3168D10FFA8; Wed, 3 Jun 2026 14:54:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1780498452; x=1812034452; h=date:from:to:cc:subject:message-id:references: mime-version:content-transfer-encoding:in-reply-to; bh=pbl34bRwWQ9VbmO3Ss6Vz1WWFidxt4tsoZQ/3x74Eu4=; b=N6EZqR2PwH/DMIXjoo0WwZGvUyqrUAeD20BNShLYzHYea7ISkNcO/wLO b8LJn7eeVnII8oK59753MR1BRRnRdpf6WWg/fehX76odFlUuLRE/ZtK6A lI7NQ5Bi8EijByQ0QROxrsR1rIQ8LNNHbULlQscSwM2lDLt1GSGJ0mqlg eBmrFzPmi3vMaOP22DrtmD1ikBELsx5/qFhZrzTQsHG4ZrQqeF3CyiEvM B8HqznrRu/oNEd9QrY+7ZwAztT/KUI3DUEBSCbgAh/VXZLaEzfipWt9kR hbXT5dlp5joSuucUsw94D2YW5HS6o5zRVCUuIf+lnjGEXRu4cQPImA7gG w==; X-CSE-ConnectionGUID: sidtlel/RpaX+8WEtRTBGQ== X-CSE-MsgGUID: 9tScHauLR+K6rj+LyglqTA== X-IronPort-AV: E=McAfee;i="6800,10657,11806"; a="81372300" X-IronPort-AV: E=Sophos;i="6.24,185,1774335600"; d="scan'208";a="81372300" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Jun 2026 07:54:11 -0700 X-CSE-ConnectionGUID: gVfrbf8WSPaQvw+C1gySgw== X-CSE-MsgGUID: PgR24P98Qx6aHGz3Gmr/1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,185,1774335600"; d="scan'208";a="241278362" Received: from fpallare-mobl4.ger.corp.intel.com (HELO localhost) ([10.245.244.220]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Jun 2026 07:54:09 -0700 Date: Wed, 3 Jun 2026 17:54:05 +0300 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= To: Rodrigo Vivi Cc: Michal Wajdeczko , intel-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, Maarten Lankhorst Subject: Re: [PATCH] drm/xe/ggtt: use full-range drm_mm with reserved nodes on PF Message-ID: References: <20260527144527.3701844-2-rodrigo.vivi@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Patchwork-Hint: comment Organization: Intel Finland Oy - BIC 0357606-4 - c/o Alberga Business Park, 6 krs Bertel Jungin Aukio 5, 02600 Espoo, Finland X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Wed, Jun 03, 2026 at 09:44:17AM -0400, Rodrigo Vivi wrote: > On Wed, Jun 03, 2026 at 04:19:41PM +0300, Ville Syrjälä wrote: > > On Fri, May 29, 2026 at 05:03:55PM -0400, Rodrigo Vivi wrote: > > > On Fri, May 29, 2026 at 01:50:34PM +0300, Ville Syrjälä wrote: > > > > On Thu, May 28, 2026 at 06:28:47PM -0400, Rodrigo Vivi wrote: > > > > > On Fri, May 29, 2026 at 12:11:22AM +0200, Michal Wajdeczko wrote: > > > > > > > > > > > > > > > > > > On 5/27/2026 4:45 PM, Rodrigo Vivi wrote: > > > > > > > The PF GGTT allocator was initialised over a relative [0, usable_size) > > > > > > > range, with ggtt->start added on every address conversion to get the > > > > > > > actual hardware address. Two consequences of that model were considered > > > > > > > "horrible hacks": > > > > > > > > > > > > > > - ggtt->start (the WOPCM offset) had to be carried around and added > > > > > > > to every drm_mm result. > > > > > > > > > > > > hmm, but this an internal detail of the xe_ggtt implementation, so why > > > > > > would someone else complain about it? > > > > > > > > > > > > > - The GUC_GGTT_TOP ceiling silently truncated the GGTT range instead > > > > > > > > > > > > hmm, for the record, this GGTT cap on the top was added back in 2023 > > > > > > > > > > > > commit ab10e976fbda8349163ceee2ce99b2bfc97031b8 > > > > > > Author: Daniele Ceraolo Spurio > > > > > > Date: Wed Jun 14 10:47:54 2023 -0700 > > > > > > > > > > > > drm/xe: limit GGTT size to GUC_GGTT_TOP > > > > > > > > > > > > + * The GuC address space is limited on both ends of the GGTT, because > > > > > > + * the GuC shim HW redirects accesses to those addresses to other HW > > > > > > + * areas instead of going through the GGTT. On the bottom end, the GuC > > > > > > + * can't access offsets below the WOPCM size, while on the top side the > > > > > > + * limit is fixed at GUC_GGTT_TOP. To keep things simple, instead of > > > > > > + * checking each object to see if they are accessed by GuC or not, we > > > > > > + * just exclude those areas from the allocator. Additionally, to > > > > > > + * simplify the driver load, we use the maximum WOPCM size in this logic > > > > > > > > > > > > > of being made explicit, leaving PTEs in [GUC_GGTT_TOP, total_size) > > > > > > > untouched during the initial clear. > > > > > > > > > > > > and that likely will not be changed by this patch as after allocating 'two > > > > > > permanent zones', the drm_mm_for_each_hole will not iterate over them > > > > > > > > > > right... > > > > > > > > > > > > > > > > > > > > > > > > > Fix this for the PF case by initialising drm_mm over the full hardware > > > > > > > GGTT range [0, total_size) and permanently reserving the two forbidden > > > > > > > zones: > > > > > > > > > > > > > > - [0, wopcm) — inaccessible below WOPCM > > > > > > > - [GUC_GGTT_TOP, total_size) — inaccessible above GUC_GGTT_TOP > > > > > > > > > > > > that looks odds: why pretend to claim manageability of full [0, 4GB) > > > > > > of the GGTT and then immediately permanently reserve two end zones to > > > > > > end up with real [wopcm, GUC_TOP) which is what we already have? > > > > > > > > > > yes... > > > > > > > > > > > > > > > > > > > > > > > > > A new mm_offset field (zero for PF) carries the base offset used in > > > > > > > address conversions, unifying the existing VF relative model (where > > > > > > > mm_offset == vf_base) with the new PF absolute model. > > > > > > > > > > > > but public xe_ggtt API already uses absolute addressing in PF and VF > > > > > > > > > > I know... > > > > > > > > > > > > > > > > > > The public > > > > > > > xe_ggtt_start() / xe_ggtt_size() API continues to return the usable > > > > > > > [wopcm, GUC_GGTT_TOP) boundaries, so callers such as the SR-IOV PF > > > > > > > config code are unaffected. > > > > > > > > > > > > > > xe_ggtt_shift_nodes() now updates both ggtt->start and ggtt->mm_offset > > > > > > > so the VF recovery path remains a single O(1) WRITE_ONCE pair. > > > > > > > > > > > > maybe it's just me - but I can't figure out the real rationale for this > > > > > > patch - what did I miss? > > > > > > > > > > This series: > > > > > https://lore.kernel.org/intel-xe/20260511214122.8468-1-ville.syrjala@linux.intel.com/ > > > > > > > > > > And more specifically the discussion in this patch: > > > > > https://lore.kernel.org/intel-xe/20260511214122.8468-13-ville.syrjala@linux.intel.com/ > > > > > > > > > > > > > > > > > > > > > > > > > Suggested-by: Ville Syrjälä > > > > > > > Cc: Michal Wajdeczko > > > > > > > Cc: Maarten Lankhorst > > > > > > > Assisted-by: GitHub-Copilot:claude-sonnet-4.6 > > > > > > > Signed-off-by: Rodrigo Vivi > > > > > > > --- > > > > > > > drivers/gpu/drm/xe/xe_ggtt.c | 123 ++++++++++++++++++++++++++++------- > > > > > > > 1 file changed, 101 insertions(+), 22 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c > > > > > > > index a351c578b170..00a6cd2b8a51 100644 > > > > > > > --- a/drivers/gpu/drm/xe/xe_ggtt.c > > > > > > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c > > > > > > > @@ -137,6 +137,17 @@ struct xe_ggtt { > > > > > > > const struct xe_ggtt_pt_ops *pt_ops; > > > > > > > /** @mm: The memory manager used to manage individual GGTT allocations */ > > > > > > > struct drm_mm mm; > > > > > > > + /** > > > > > > > + * @mm_offset: base offset added to drm_mm node addresses to obtain hardware > > > > > > > + * GGTT addresses. For PF this is 0 (drm_mm uses absolute hardware addresses). > > > > > > > + * For VF this equals @start (drm_mm uses relative addresses from VF base). > > > > > > > + * Updated atomically by xe_ggtt_shift_nodes() during VF recovery. > > > > > > > + */ > > > > > > > + u64 mm_offset; > > > > > > > + /** @reserved_bottom: permanently reserved [0, WOPCM) drm_mm node for PF */ > > > > > > > + struct drm_mm_node reserved_bottom; > > > > > > > + /** @reserved_top: permanently reserved [GUC_GGTT_TOP, total) drm_mm node for PF */ > > > > > > > + struct drm_mm_node reserved_top; > > > > > > > > > > > > maybe all we need is to separate concepts of: > > > > > > > > > > > > * raw GGTT - fixed range [0, 4GB) > > > > > > > > > > > > from > > > > > > > > > > > > * allocable GGTT - configurable sub-range [start, end) > > > > > > * [wopcm, GUC_TOP) on PF > > > > > > * [base, base+size) on VF > > > > > > > > > > > > and then we can continue to use drm_mm.init(0, end-start) to manage > > > > > > that [start, end) range in a common way on both PF and VF? > > > > > > > > > > we need to be able to use a ggtt buffer that comes out of this range, > > > > > so I'm afraid it doesn't solve all the cases. > > > > > > > > Basically what the display needs is: > > > > 1. specify where in ggtt the buffer was originally placed by the GOP, > > > > this may be partially or fully inside these GuC reserved ranges > > > > 2. bind the buffer to some acceptable location (assuming the original > > > > location wasn't acceptable) without overwriting the PTEs for the > > > > original location > > > > > > > > I suppose this could be achieved even with this "mm doesn't cover the > > > > ends" hack, but step 1 there becomes a bit dodgy because we can't > > > > insert the mm node if it's fully outside the mm. I suppose it could > > > > still work if you hide it in a function that only validates the real > > > > ggtt offsets, but then ignores the fact that the node can't be > > > > inserted due to being fully inside those reserved ranges. And then > > > > whatever cleans up that original mm node must also ignore the fact > > > > that the node maybe wasn't even allocated. And also > > > > xe_ggtt_initial_clear() will need special code to clear the > > > > reserved ranges. > > > > > > right, so basically we could keep the xe_ggtt as is and provide > > > 2 hooks: > > > > > > 1. one to reserve the portion of the BIOS FB that goes > > > inside our managed ggtt area > > > 2. a special clear for this area > > > > > > And in between you do the rebind with existing infrastructure > > > to an empty region?! Is this what you are thinking now? > > > > > > > > > > > My original idea was that we'd just include the reserved regions > > > > in the mm, and then the display could just keep the buffer at its > > > > original location, and later the guc code can reserve what is > > > > left over. So we could skip step 2 above completely. But after > > > > a second thought we probably don't want to skip that step because > > > > we might free the display bo later, at which point we might free > > > > up some of the reserved ranges. So I guess we'd still want to keep > > > > step 2. But I think it'd still result in less special cases in the > > > > code. We'd just need the guc code to reserve what it needs, after > > > > the display code has rebound the bo to an acceptable location. > > > > > > > > So we'd end up with: > > > > 1. insert node for the bo's original ggtt location > > > > 2. rebind the display bo to an acceptable ggtt location > > > > 3. undo step 1 > > > > 4. xe_ggtt_initial_clear() (now also clears the reserved ranges > > > > without any special code) > > > > 5. guc steals the reserved ranges explicitly > > > > > > > > So only two special cases left really, and all the rest > > > > of the code is blissfully unaware of any of it. > > > > > > > > Hmm, although hibernation might still be a slight issue for > > > > xe_ggtt_initial_clear(). As in how would the reserved regions > > > > get cleared during resume from hibernation? I have no idea > > > > how the current xe ggtt code handles resume at all... > > > > > > The resume should only restore the pinned bo's one by one, nothing > > > special. > > > > Looks like currently xe_ggtt_initial_clear() is never even called > > during resume from hibernation, so in that case parts of the GGTT > > will be left with whatever garbage the GOP put there. So that's > > one thing that needs fixing. > > > > > So I guess if we keep the original code we are okay, > > > but if we start to managing the full range with the reserved areas > > > we might have some difficulties here on the way... > > > > If we had a full range mm I suppose we'd need a bit of special code > > to remove the reserved nodes before xe_ggtt_initial_clear() gets > > called, at least for the resume from hibernation case. > > it looks like Maarten suggestion fix the clear portion. > But for the steps 1 and 3 above we would need a special reservation > with a node area only within our managed area?! > > something like (for step 1): > > mm_start = max(start, ggtt->start); > mm_end = min(start + size, ggtt->start + ggtt->size); > > node->base.start = mm_start - ggtt->start; > node->base.size = mm_end - mm_start; > > drm_mm_reserve_node(&ggtt->mm, &node->base); > > and another special function to delete this special node > if needed to be created? Something like that could work. > Or what do you have on mind for the full area? I believe the full > area is this patch, but with some additions anyway since we need > to handle this buffer plus ensure it gets reserved when we don't > have it... >From the display side the full mm approach shouldn't really look any different. > > I'd like to avoid complications like the phys offset addition What phys_offset addition are we talking about here? > or > the full range if possible. > > Could you please incorporate something simple in a v2 of your series? If you mean me and my initial fb series then I have no plan to post a v2 of that. What I posted is IMO just fine as is. The remaining broken corner cases can be fixed afterwards once we get that new thing to properly protect the current PTEs. -- Ville Syrjälä Intel