Date: Fri, 20 Dec 2024 09:19:24 -0800
Message-ID: <85ikre1dcz.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh"
To: "Souza, Jose"
Cc: "intel-xe@lists.freedesktop.org", "Nerlige Ramappa, Umesh"
Subject: Re: [PATCH v4 0/2] Fixes for MI_REPORT_PERF_COUNT
References: <20241220002234.554135-1-umesh.nerlige.ramappa@intel.com>
List-Id: Intel Xe graphics driver

On Fri, 20 Dec 2024 08:16:45 -0800, Souza, Jose wrote:
>

Hi Jose,

> On Thu, 2024-12-19 at 16:22 -0800, Umesh Nerlige Ramappa wrote:
> > OA programming sequence for query mode or MI_REPORT_PERF_COUNT requires
> > modifying some HW registers in the same hw context as the user exec
> > queue. User passes the exec_queue to the OA interface and OA
> > implementation submits an MI_LOAD_REGISTER_IMM to this queue to modify
> > the registers.
> >
> > The OA implementation submits a batch mapped in GGTT to the user exec
> > queue and hence, some plumbing is added into relevant code to enable
> > that (as per suggestions from Matthew Brost).
> >
> > v2: review rework
> > v3:
> > - review rework
> > - original patches squashed for porting to stable
> > - code cleanup
> >
> > v4:
> > - review rework/fixes
>
> Got this oops with this version:
>
> [ 176.066578] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.068577] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.072629] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.078117] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.081285] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.093564] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.102886] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 194.119229] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ba3: 0000 [#1] PREEMPT SMP
> [ 194.130187] CPU: 3 UID: 1000 PID: 2240 Comm: ReplayManager Not tainted 6.13.0-rc3-zeh-xe+ #1454
> [ 194.138931] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3152.D83.2404190622 04/19/2024
> [ 194.151258] RIP: 0010:xe_sync_entry_add_deps+0x1c/0x60 [xe]
> [ 194.157013] Code: c7 43 18 f4 ff ff ff e9 9b fe ff ff 66 90 55 53 48 8b 5f 08 48 85 db 75 05 31 c0 5b 5d c3 48 89 f5 48 8d 7b 38 b8 01 00 00 00
> 0f c1 43 38 85 c0 74 20 8d 50 01 09 c2 78 0d 48 89 de 48 89 ef
> [ 194.175863] RSP: 0018:ffffc90001f93de8 EFLAGS: 00010202
> [ 194.181136] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
> [ 194.188331] RDX: ffff88815ee8edc0 RSI: ffff88814ebb0840 RDI: 6b6b6b6b6b6b6ba3
> [ 194.195520] RBP: ffff88814ebb0840 R08: 0000000000000001 R09: 0000000000000000
> [ 194.202707] R10: 0000000000000001 R11: 0000000000000003 R12: ffff88814ebb0840
> [ 194.209889] R13: ffff8881457f9900 R14: ffff888173075800 R15: 0000000000000000
> [ 194.217071] FS: 00007f6c80db9640(0000) GS:ffff88885e580000(0000) knlGS:0000000000000000
> [ 194.225216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 194.231014] CR2: 00007f6bdb33a000 CR3: 0000000144f44001 CR4: 0000000000772ef0
> [ 194.238201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 194.245386] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [ 194.252575] PKRU: 55555554
> [ 194.255315] Call Trace:
> [ 194.257794]
> [ 194.259932] ? __die_body.cold+0x19/0x21
> [ 194.263899] ? die_addr+0x33/0x50
> [ 194.267256] ? exc_general_protection+0x19e/0x450
> [ 194.272002] ? asm_exc_general_protection+0x22/0x30
> [ 194.276930] ? xe_sync_entry_add_deps+0x1c/0x60 [xe]

This looks related to an inadvertent change I noticed yesterday and pointed
out in the thread:

>> static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri)
>> {
>> ...
>> - fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
>> + fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_ADD_DEPS, bb);
>
> This looks like a copy-paste error, could you please change this back to
> XE_OA_SUBMIT_NO_DEPS as it used to be.

Sorry you ran into this. We'll fix this and then ask for your help to test
again.
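
A note on reading the fault address in the oops above. 0x6b is the kernel's
POISON_FREE byte (include/linux/poison.h), which slab poisoning writes over
freed objects when that debugging is enabled. Assuming poisoning was enabled
on this kernel (the log does not say so), RBX = 6b6b6b6b6b6b6b6b suggests a
pointer that was loaded from freed memory, and the faulting address in RDI is
that value plus 0x38. That would be consistent with the ADD_DEPS path walking
sync entries that are no longer valid at stream destroy time, but it is an
inference from the register dump, not something confirmed in this thread. The
sketch below is plain userspace C that only does the arithmetic; the helper
names are hypothetical and are not part of the xe driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* POISON_FREE in include/linux/poison.h: byte written over freed slab objects. */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: true if all eight bytes of v equal the poison byte. */
static bool is_poison_free(uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		if (((v >> (8 * i)) & 0xffu) != POISON_FREE_BYTE)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: recover the base pointer from a faulting address,
 * given the member offset that was being accessed (0x38 in the dump above,
 * i.e. RDI - RBX).
 */
static uint64_t poison_base(uint64_t fault_addr, uint64_t member_off)
{
	return fault_addr - member_off;
}
```

With the values from the dump, poison_base(0x6b6b6b6b6b6b6ba3, 0x38) gives
0x6b6b6b6b6b6b6b6b, and is_poison_free() returns true for that result.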

> [ 194.282052] xe_oa_submit_bb.constprop.0+0x9d/0x1c0 [xe]
> [ 194.287517] xe_oa_load_with_lri.constprop.0+0xc4/0x130 [xe]
> [ 194.293313] xe_oa_configure_oa_context+0x1fd/0x210 [xe]
> [ 194.298770] xe_oa_disable_metric_set+0x4b/0xc0 [xe]
> [ 194.303857] xe_oa_stream_destroy+0x3a/0x140 [xe]
> [ 194.308698] xe_oa_release+0x3a/0xe0 [xe]
> [ 194.312833] __fput+0xee/0x2a0
> [ 194.315934] __x64_sys_close+0x49/0xb0
> [ 194.319722] do_syscall_64+0x64/0x130
> [ 194.323417] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [ 194.328511] RIP: 0033:0x7f6ca8b14f8b

Thanks.
--
Ashutosh