From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <72264df8-d920-4d9a-8631-d020c6d2c93a@intel.com>
Date: Tue, 16 Jul 2024 09:51:50 +0100
Subject: Re: [PATCH 2/6] drm/xe/migrate: Add kunit to test clear functionality
To: "Jahagirdar, Akshata" , intel-xe@lists.freedesktop.org
Cc: matthew.d.roper@intel.com, himal.prasad.ghimiray@intel.com,
 lucas.demarchi@intel.com
References: <5ab88aebf91c7f7f35fd3a0d4601df80fde9f03d.1720689220.git.akshata.jahagirdar@intel.com>
 <0f4f3aef-d998-4bad-95ad-e495b8a9e7bf@intel.com>
From: Matthew Auld
In-Reply-To: <0f4f3aef-d998-4bad-95ad-e495b8a9e7bf@intel.com>

On 12/07/2024 05:15, Jahagirdar, Akshata wrote:
> 
> On 7/11/2024 7:27 AM, Matthew Auld wrote:
>> On 11/07/2024 10:18, Akshata Jahagirdar wrote:
>>> This test verifies if the main and ccs data are cleared during bo
>>> creation.
>>> The motivation to use Kunit instead of IGT is that, although we can
>>> verify whether the data is zero following bo creation, we cannot
>>> confirm whether the zero value after bo creation is the result of our
>>> clear function or simply because the initial data present there was
>>> zero.
>>
>> One idea might be to pre-fill and release some amount of vram at the
>> start of the test. Then maybe spawn a number of threads, each
>> allocating a bit of vram (also various sizes) in a loop, each time
>> dirtying the pages and ccs before releasing it. There should be some
>> amount of page re-use to catch issues, plus it is a lot more nasty with
>> multiple threads and varying sizes. Such a test can also be useful for
>> lnl, just that we use system memory instead of vram. Or maybe we
>> already have something like that?
>>
> If I am not wrong, I think there is an igt testcase xe_evict_ccs which
> exercises similar testcases.
>>>
>>> Signed-off-by: Akshata Jahagirdar
>>> ---
>>>   drivers/gpu/drm/xe/tests/xe_migrate.c      | 271 +++++++++++++++++++++
>>>   drivers/gpu/drm/xe/tests/xe_migrate_test.c |   1 +
>>>   drivers/gpu/drm/xe/tests/xe_migrate_test.h |   1 +
>>>   3 files changed, 273 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> index 962f6438e219..6538c0e6b57e 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>>> @@ -359,3 +359,274 @@ void xe_migrate_sanity_kunit(struct kunit *test)
>>>       xe_call_for_each_device(migrate_test_run_device);
>>>   }
>>>   EXPORT_SYMBOL_IF_KUNIT(xe_migrate_sanity_kunit);
>>> +
>>> +static struct dma_fence *blt_copy(struct xe_tile *tile,
>>> +                struct xe_bo *src_bo, struct xe_bo *dst_bo,
>>> +                bool copy_only_ccs, const char *str, struct kunit *test)
>>> +{
>>> +    struct xe_gt *gt = tile->primary_gt;
>>> +    struct xe_migrate *m = tile->migrate;
>>> +    struct xe_device *xe = gt_to_xe(gt);
>>> +    struct dma_fence *fence = NULL;
>>> +    u64 size = src_bo->size;
>>> +    struct xe_res_cursor src_it, dst_it;
>>> +    struct ttm_resource *src = src_bo->ttm.resource, *dst = dst_bo->ttm.resource;
>>> +    u64 src_L0_ofs, dst_L0_ofs;
>>> +    u32 src_L0_pt, dst_L0_pt;
>>> +    u64 src_L0, dst_L0;
>>> +    int err;
>>> +    bool src_is_vram = mem_type_is_vram(src->mem_type);
>>> +    bool dst_is_vram = mem_type_is_vram(dst->mem_type);
>>> +
>>> +    if (!src_is_vram)
>>> +        xe_res_first_sg(xe_bo_sg(src_bo), 0, size, &src_it);
>>> +    else
>>> +        xe_res_first(src, 0, size, &src_it);
>>> +
>>> +    if (!dst_is_vram)
>>> +        xe_res_first_sg(xe_bo_sg(dst_bo), 0, size, &dst_it);
>>> +    else
>>> +        xe_res_first(dst, 0, size, &dst_it);
>>> +
>>> +    while (size) {
>>> +        u32 batch_size = 2; /* arb_clear() + MI_BATCH_BUFFER_END */
>>> +        struct xe_sched_job *job;
>>> +        struct xe_bb *bb;
>>> +        u32 flush_flags = 0;
>>> +        u32 update_idx;
>>> +        u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE;
>>> +
>>> +        src_L0 = xe_migrate_res_sizes(m, &src_it);
>>> +        dst_L0 = xe_migrate_res_sizes(m, &dst_it);
>>> +
>>> +        src_L0 = min(src_L0, dst_L0);
>>> +
>>> +        batch_size += pte_update_size(m, src_is_vram, src_is_vram, src, &src_it, &src_L0,
>>> +                          &src_L0_ofs, &src_L0_pt, 0, 0,
>>> +                          avail_pts);
>>> +
>>> +        batch_size += pte_update_size(m, dst_is_vram, dst_is_vram, dst, &dst_it, &src_L0,
>>> +                          &dst_L0_ofs, &dst_L0_pt, 0,
>>> +                          avail_pts, avail_pts);
>>> +
>>> +        /* Add copy commands size here */
>>> +        batch_size += ((copy_only_ccs) ? 0 : EMIT_COPY_DW) +
>>> +            ((xe_device_has_flat_ccs(xe) && copy_only_ccs) ? EMIT_COPY_CCS_DW : 0);
>>> +
>>> +        bb = xe_bb_new(gt, batch_size, xe->info.has_usm);
>>> +        if (IS_ERR(bb)) {
>>> +            err = PTR_ERR(bb);
>>> +            goto err_sync;
>>> +        }
>>> +
>>> +        if (src_is_vram)
>>> +            xe_res_next(&src_it, src_L0);
>>> +        else
>>> +            emit_pte(m, bb, src_L0_pt, src_is_vram, false,
>>> +                 &src_it, src_L0, src);
>>> +
>>> +        if (dst_is_vram)
>>> +            xe_res_next(&dst_it, src_L0);
>>> +        else
>>> +            emit_pte(m, bb, dst_L0_pt, dst_is_vram, false,
>>> +                 &dst_it, src_L0, dst);
>>> +
>>> +        bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
>>> +        update_idx = bb->len;
>>> +        if (!copy_only_ccs)
>>> +            emit_copy(gt, bb, src_L0_ofs, dst_L0_ofs, src_L0, XE_PAGE_SIZE);
>>> +
>>> +        if (copy_only_ccs)
>>> +            flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs,
>>> +                    src_is_vram, dst_L0_ofs,
>>> +                    dst_is_vram, src_L0, dst_L0_ofs, copy_only_ccs);
>>> +
>>> +        mutex_lock(&m->job_mutex);
>>
>> This is not allowed anymore it seems. CI is also complaining about this.
>> See: 50e52592fbe791d96ec2cb431d158cc6bc495be5
>>
>> Not sure what else has changed.
>>
> I see. I missed this. Thank you! Will fix this!
>>> +
>>> +        job = xe_bb_create_migration_job(m->q, bb,
>>> +                         xe_migrate_batch_base(m, xe->info.has_usm),
>>> +                         update_idx);
>>> +        if (IS_ERR(job)) {
>>> +            err = PTR_ERR(job);
>>> +            goto err;
>>> +        }
>>> +
>>> +        xe_sched_job_add_migrate_flush(job, flush_flags);
>>> +
>>> +        xe_sched_job_arm(job);
>>> +        dma_fence_put(fence);
>>> +        fence = dma_fence_get(&job->drm.s_fence->finished);
>>> +        xe_sched_job_push(job);
>>> +
>>> +        dma_fence_put(m->fence);
>>> +        m->fence = dma_fence_get(fence);
>>> +
>>> +        mutex_unlock(&m->job_mutex);
>>> +
>>> +        xe_bb_free(bb, fence);
>>> +        size -= src_L0;
>>> +        continue;
>>> +
>>> +err:
>>> +        mutex_unlock(&m->job_mutex);
>>> +        xe_bb_free(bb, NULL);
>>> +
>>> +err_sync:
>>> +        if (fence) {
>>> +            dma_fence_wait(fence, false);
>>> +            dma_fence_put(fence);
>>> +        }
>>> +        return ERR_PTR(err);
>>> +    }
>>> +
>>> +    return fence;
>>> +}
>>> +
>>> +static void test_clear(struct xe_device *xe, struct xe_tile *tile,
>>> +            struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct kunit *test)
>>> +{
>>> +    struct dma_fence *fence;
>>> +    u64 expected, retval;
>>> +
>>> +    expected = 0xd0d0d0d0d0d0d0d0;
>>> +    xe_map_memset(xe, &sys_bo->vmap, 0, 0xd0, sys_bo->size);
>>> +
>>> +    fence = blt_copy(tile, sys_bo, vram_bo, false, "Blit copy from sysmem to vram", test);
>>> +    if (!sanity_fence_failed(xe, fence, "Blit copy from sysmem to vram", test)) {
>>> +        retval = xe_map_rd(xe, &vram_bo->vmap, 0, u64);
>>> +        if (retval == expected)
>>> +            KUNIT_FAIL(test, "Sanity check failed: VRAM must have compressed value\n");
>>> +    }
>>> +    dma_fence_put(fence);
>>> +
>>> +    fence = blt_copy(tile, vram_bo, sys_bo, false, "Blit copy from vram to sysmem", test);
>>> +    if (!sanity_fence_failed(xe, fence, "Blit copy from vram to sysmem", test)) {
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
>>> +        check(retval, expected, "Decompressed value must be equal to initial value", test);
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
>>> +        check(retval, expected, "Decompressed value must be equal to initial value", test);
>>> +    }
>>> +    dma_fence_put(fence);
>>> +
>>> +    kunit_info(test, "Clear vram buffer object\n");
>>> +    expected = 0x0000000000000000;
>>> +    fence = xe_migrate_clear(tile->migrate, vram_bo, vram_bo->ttm.resource);
>>> +
>>> +    fence = blt_copy(tile, vram_bo, sys_bo,
>>> +                false, "Blit copy from vram to sysmem", test);
>>> +    if (!sanity_fence_failed(xe, fence, "Clear main buffer data", test)) {
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
>>> +        check(retval, expected, "Clear main buffer first value", test);
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
>>> +        check(retval, expected, "Clear main buffer last value", test);
>>> +    }
>>> +    dma_fence_put(fence);
>>> +
>>> +    fence = blt_copy(tile, vram_bo, sys_bo,
>>> +            true, "Blit surf copy from vram to sysmem", test);
>>> +    if (!sanity_fence_failed(xe, fence, "Clear ccs buffer data", test)) {
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
>>> +        check(retval, expected, "Clear ccs data first value", test);
>>> +        retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
>>> +        check(retval, expected, "Clear ccs data last value", test);
>>> +    }
>>> +    dma_fence_put(fence);
>>> +}
>>> +
>>> +static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struct kunit *test)
>>> +{
>>> +    struct xe_bo *sys_bo, *vram_bo;
>>> +    unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
>>> +    long ret;
>>> +
>>> +    sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC,
>>> +            ttm_bo_type_device, XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS);
>>> +
>>> +    if (IS_ERR(sys_bo)) {
>>> +        KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
>>> +               PTR_ERR(sys_bo));
>>> +        return;
>>> +    }
>>> +
>>> +    xe_bo_lock(sys_bo, false);
>>> +    ret = xe_bo_validate(sys_bo, NULL, false);
>>> +    if (ret) {
>>> +        KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
>>> +        goto out_unlock;
>>
>> vram_bo is not initialized here. Also I guess needs an unlock. Below also.
>>
>>> +    }
>>> +
>>> +    ret = xe_bo_vmap(sys_bo);
>>> +    if (ret) {
>>> +        KUNIT_FAIL(test, "Failed to vmap system bo: %li\n", ret);
>>> +        goto out_unlock;
>>> +    }
>>> +    xe_bo_unlock(sys_bo);
>>> +
>>> +    vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC,
>>> +                   ttm_bo_type_device, bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS);
>>> +    if (IS_ERR(vram_bo)) {
>>> +        KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
>>> +               PTR_ERR(vram_bo));
>>> +        return;
>>
>> Missing bo_put?
>>
> Since bo creation failed, we don't call bo_put() again.
> This is taken care of by the xe_bo_create_user() function.

The sys_bo needs bo_put() here, or maybe better jump to onion unwind.
>>
>>> +    }
>>> +
>>> +    xe_bo_lock(vram_bo, false);
>>> +    ret = xe_bo_validate(vram_bo, NULL, false);
>>> +    if (ret) {
>>> +        KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    ret = xe_bo_vmap(vram_bo);
>>> +    if (ret) {
>>> +        KUNIT_FAIL(test, "Failed to vmap vram bo: %li\n", ret);
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    test_clear(xe, tile, sys_bo, vram_bo, test);
>>> +    xe_bo_unlock(vram_bo);
>>> +
>>> +    xe_bo_lock(vram_bo, false);
>>> +    xe_bo_vunmap(vram_bo);
>>> +    xe_bo_unlock(vram_bo);
>>> +
>>> +    xe_bo_lock(sys_bo, false);
>>> +    xe_bo_vunmap(sys_bo);
>>> +    xe_bo_unlock(sys_bo);
>>> +out_unlock:
>>
>> There is no unlock here.
> Will change it to out_put here.
>>> +    xe_bo_put(vram_bo);
>>> +    xe_bo_put(sys_bo);
>>> +}
>>> +
>>> +static int validate_ccs_test_run_device(struct xe_device *xe)
>>> +{
>>> +    struct kunit *test = xe_cur_kunit();
>>> +    struct xe_tile *tile;
>>> +    int id;
>>> +
>>> +    if (!xe_device_has_flat_ccs(xe)) {
>>> +        kunit_info(test, "Skipping non-flat-ccs device.\n");
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (!(GRAPHICS_VER(xe) >= 20 && IS_DGFX(xe))) {
>>> +        kunit_info(test, "Skipping non-xe2 discrete device %s.\n",
>>> +               dev_name(xe->drm.dev));
>>> +        return 0;
>>> +    }
>>
>> So is there something else for xe2 igpu and ccs?
> Yes, so there is another kunit test xe_ccs_migrate_kunit for xe2 igpu
> that does data validation for ccs as well.
>>> +
>>> +    xe_pm_runtime_get(xe);
>>> +
>>> +    for_each_tile(tile, xe, id)
>>> +        validate_ccs_test_run_tile(xe, tile, test);
>>> +
>>> +    xe_pm_runtime_put(xe);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +void xe_validate_ccs_kunit(struct kunit *test)
>>> +{
>>> +    xe_call_for_each_device(validate_ccs_test_run_device);
>>> +}
>>> +EXPORT_SYMBOL_IF_KUNIT(xe_validate_ccs_kunit);
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate_test.c b/drivers/gpu/drm/xe/tests/xe_migrate_test.c
>>> index eb0d8963419c..49b9ef060ad3 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_migrate_test.c
>>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate_test.c
>>> @@ -9,6 +9,7 @@
>>>
>>>   static struct kunit_case xe_migrate_tests[] = {
>>>       KUNIT_CASE(xe_migrate_sanity_kunit),
>>> +    KUNIT_CASE(xe_validate_ccs_kunit),
>>>       {}
>>>   };
>>>
>>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate_test.h b/drivers/gpu/drm/xe/tests/xe_migrate_test.h
>>> index 7c645c66824f..dc0ecfdfad39 100644
>>> --- a/drivers/gpu/drm/xe/tests/xe_migrate_test.h
>>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate_test.h
>>> @@ -9,5 +9,6 @@
>>>   struct kunit;
>>>
>>>   void xe_migrate_sanity_kunit(struct kunit *test);
>>> +void xe_validate_ccs_kunit(struct kunit *test);
>>>
>>>   #endif