From: Thomas Hellström
Date: Sat, 16 Dec 2023 22:50:18 +0100
Subject: Re: [PATCH] drm/xe/migrate: Fix CCS copy for small VRAM copy chunks
To: Matt Roper
Cc: intel-xe@lists.freedesktop.org
List-Id: Intel Xe graphics driver
References: <20231215125436.41135-1-thomas.hellstrom@linux.intel.com> <20231215234049.GC1327160@mdroper-desk1.amr.corp.intel.com>
In-Reply-To: <20231215234049.GC1327160@mdroper-desk1.amr.corp.intel.com>

Hi, Matt

On 12/16/23 00:40, Matt Roper wrote:
> On Fri, Dec 15, 2023 at 01:54:36PM +0100, Thomas Hellström wrote:
>> Since the migrate code is using the identity map for addressing VRAM,
>> copy chunks may become as small as 64K if the VRAM resource is fragmented.
>>
>> However, a chunk size smaller than 1MiB may cause the *next* chunk's
>> offset into the CCS metadata backup memory to not be page-aligned, which
>> the XY_CTRL_SURF_COPY_BLT command can't handle, and even if it could,
>> the current code doesn't handle the offset calculation correctly.
>>
>> To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If
> Does this need to be device-specific (derived from
> NUM_BYTES_PER_CCS_BYTE)?  On DG2 the main:ccs ratio is 256:1, but on LNL
> (and presumably future platforms) it's 512:1.

Yes, we need to update this once we have a DGFX card with a ratio other
than 256:1 (LNL is not affected since we don't use the identity map), but
we don't know yet whether there might be other changes as well.
Perhaps I should look at adjusting the 1MiB size alignment based on
NUM_BYTES_PER_CCS_BYTE.

/Thomas

> Matt
>
>> the remaining data to copy is smaller than that, that's not a problem,
>> so use the remaining size. If the VRAM copy chunk becomes fragmented due
>> to the size alignment restriction, don't use the identity map, but instead
>> emit PTEs into the page-table like we do for system memory.
>>
>> Signed-off-by: Thomas Hellström
>> ---
>>  drivers/gpu/drm/xe/tests/xe_migrate.c |  2 +-
>>  drivers/gpu/drm/xe/xe_migrate.c       | 67 ++++++++++++++++-----------
>>  2 files changed, 40 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> index 47fcd6e6b777..5f5b416dc88c 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> @@ -331,7 +331,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>>  	xe_res_first_sg(xe_bo_sg(pt), 0, pt->size, &src_it);
>>
>>  	emit_pte(m, bb, NUM_KERNEL_PDE - 1, xe_bo_is_vram(pt),
>> -		 &src_it, XE_PAGE_SIZE, pt);
>> +		 &src_it, XE_PAGE_SIZE, pt->ttm.resource);
>>
>>  	run_sanity_job(m, xe, bb, bb->len, "Writing PTE for our fake PT", test);
>>
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index 2ca927f3fb2a..0b8a33116322 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -411,14 +411,31 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
>>
>>  static u64 xe_migrate_res_sizes(struct xe_res_cursor *cur)
>>  {
>> -	/*
>> -	 * For VRAM we use identity mapped pages so we are limited to current
>> -	 * cursor size. For system we program the pages ourselves so we have no
>> -	 * such limitation.
>> -	 */
>> -	return min_t(u64, MAX_PREEMPTDISABLE_TRANSFER,
>> -		     mem_type_is_vram(cur->mem_type) ? cur->size :
>> -		     cur->remaining);
>> +	u64 size = min_t(u64, MAX_PREEMPTDISABLE_TRANSFER, cur->remaining);
>> +
>> +	if (mem_type_is_vram(cur->mem_type)) {
>> +		/*
>> +		 * VRAM we want to blit in chunks with sizes aligned to
>> +		 * 1MiB in order for the offset to CCS metadata to be
>> +		 * page-aligned. If it's the last chunk it may be smaller.
>> +		 *
>> +		 * Another constraint is that we need to limit the blit to
>> +		 * the VRAM block size, unless size is smaller than 1MiB.
>> +		 */
>> +		u64 chunk = max_t(u64, cur->size, SZ_1M);
>> +
>> +		size = min_t(u64, size, chunk);
>> +		if (size > SZ_1M)
>> +			size = round_down(size, SZ_1M);
>> +	}
>> +
>> +	return size;
>> +}
>> +
>> +static bool xe_migrate_avoid_identity(u64 size, const struct xe_res_cursor *cur)
>> +{
>> +	/* The chunk is fragmented. Hence can't use identity map. */
>> +	return cur->size < size;
>>  }
>>
>>  static u32 pte_update_size(struct xe_migrate *m,
>> @@ -431,7 +448,7 @@ static u32 pte_update_size(struct xe_migrate *m,
>>  	u32 cmds = 0;
>>
>>  	*L0_pt = pt_ofs;
>> -	if (!is_vram) {
>> +	if (!is_vram || xe_migrate_avoid_identity(*L0, cur)) {
>>  		/* Clip L0 to available size */
>>  		u64 size = min(*L0, (u64)avail_pts * SZ_2M);
>>  		u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>> @@ -461,20 +478,13 @@ static void emit_pte(struct xe_migrate *m,
>>  		     struct xe_bb *bb, u32 at_pt,
>>  		     bool is_vram,
>>  		     struct xe_res_cursor *cur,
>> -		     u32 size, struct xe_bo *bo)
>> +		     u32 size, struct ttm_resource *res)
>>  {
>>  	u16 pat_index = tile_to_xe(m->tile)->pat.idx[XE_CACHE_WB];
>>  	u32 ptes;
>>  	u64 ofs = at_pt * XE_PAGE_SIZE;
>>  	u64 cur_ofs;
>>
>> -	/*
>> -	 * FIXME: Emitting VRAM PTEs to L0 PTs is forbidden. Currently
>> -	 * we're only emitting VRAM PTEs during sanity tests, so when
>> -	 * that's moved to a Kunit test, we should condition VRAM PTEs
>> -	 * on running tests.
>> -	 */
>> -
>>  	ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>>
>>  	while (ptes) {
>> @@ -498,10 +508,10 @@ static void emit_pte(struct xe_migrate *m,
>>  			if ((m->q->vm->flags & XE_VM_FLAG_64K) &&
>>  			    !(cur_ofs & (16 * 8 - 1))) {
>>  				xe_tile_assert(m->tile, IS_ALIGNED(addr, SZ_64K));
>> -				flags |= XE_PTE_PS64;
>>  			}
>>
>> -			addr += vram_region_gpu_offset(bo->ttm.resource);
>> +			addr += vram_region_gpu_offset(res);
>> +			flags |= XE_PTE_PS64;
>>  			devmem = true;
>>  		}
>>
>> @@ -730,6 +740,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
>>  					   &ccs_ofs, &ccs_pt, 0,
>>  					   2 * NUM_PT_PER_BLIT,
>>  					   NUM_PT_PER_BLIT);
>> +		xe_assert(xe, IS_ALIGNED(ccs_it.start, PAGE_SIZE));
>>  	}
>>
>>  	/* Add copy commands size here */
>> @@ -742,20 +753,20 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
>>  			goto err_sync;
>>  		}
>>
>> -		if (!src_is_vram)
>> +		if (!src_is_vram || xe_migrate_avoid_identity(src_L0, &src_it))
>>  			emit_pte(m, bb, src_L0_pt, src_is_vram, &src_it, src_L0,
>> -				 src_bo);
>> +				 src);
>>  		else
>>  			xe_res_next(&src_it, src_L0);
>>
>> -		if (!dst_is_vram)
>> +		if (!dst_is_vram || xe_migrate_avoid_identity(src_L0, &dst_it))
>>  			emit_pte(m, bb, dst_L0_pt, dst_is_vram, &dst_it, src_L0,
>> -				 dst_bo);
>> +				 dst);
>>  		else
>>  			xe_res_next(&dst_it, src_L0);
>>
>>  		if (copy_system_ccs)
>> -			emit_pte(m, bb, ccs_pt, false, &ccs_it, ccs_size, src_bo);
>> +			emit_pte(m, bb, ccs_pt, false, &ccs_it, ccs_size, src);
>>
>>  		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
>>  		update_idx = bb->len;
>> @@ -984,12 +995,12 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
>>  		size -= clear_L0;
>>
>>  		/* Preemption is enabled again by the ring ops. */
>> -		if (!clear_vram) {
>> +		if (!clear_vram || xe_migrate_avoid_identity(clear_L0, &src_it))
>>  			emit_pte(m, bb, clear_L0_pt, clear_vram, &src_it, clear_L0,
>> -				 bo);
>> -		} else {
>> +				 dst);
>> +		else
>>  			xe_res_next(&src_it, clear_L0);
>> -		}
>> +
>>  		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
>>  		update_idx = bb->len;
>>
>> --
>> 2.42.0
>>