From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5E2EBC4345F for ; Mon, 29 Apr 2024 21:19:09 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1FA6910E651; Mon, 29 Apr 2024 21:19:09 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="KlFTPDJx"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.15]) by gabe.freedesktop.org (Postfix) with ESMTPS id EDE0810E392; Mon, 29 Apr 2024 21:19:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1714425548; x=1745961548; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=GDlV2jxTfSdygwd7f1NNUDctpUkMo7rplw2hJwUWVDI=; b=KlFTPDJxlk8KgXen4kHrSoBG8IoBx70OZUicN6JEiQGXobjtXrI0CygS 53FESyEG+5qX68uKtmoRrWqUE/yR2hl9hYa50JAASD5CTEXpFPX/zgmo4 lziZnjEvTMnnU/mXtEtES/lRGnZyY9Rb6KkESPVI5/SoG6RZ2UwyHt76F WILzab1oT8/dfF35LxKr/b5y6qRujeh+5Q+d3kePQXPeyq1R3OJAxgD5R MQkc0Y018cHqGZSNmXbhmUHhfDChKsZTTb4T6nFPSqvcb19BH3oWBZT5w 3obLrUZAMT1K4Q+NvixfepS8tEQRT+pHydIhwG9megbGzDBskuw/cmFm3 A==; X-CSE-ConnectionGUID: 0HfCEqzMQsmaNSt8Y9GAiw== X-CSE-MsgGUID: B/SQzAIiRg+RV0jH3MtFCQ== X-IronPort-AV: E=McAfee;i="6600,9927,11059"; a="13887011" X-IronPort-AV: E=Sophos;i="6.07,240,1708416000"; d="scan'208";a="13887011" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa107.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2024 14:19:07 -0700 X-CSE-ConnectionGUID: mbCW/8IuSRu3DCUoJLPqnA== X-CSE-MsgGUID: pG/J5qolRL+m+bSMGiZ21w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,240,1708416000"; d="scan'208";a="26216726" Received: from orsosgc001.jf.intel.com (HELO orsosgc001.intel.com) ([10.165.21.138]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Apr 2024 14:19:07 -0700 Date: Mon, 29 Apr 2024 14:19:06 -0700 Message-ID: <85plu7yiid.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Rodrigo Vivi Cc: intel-xe@lists.freedesktop.org, Matthew Auld , intel-gfx@lists.freedesktop.org, Francois Dugast Subject: Re: [PATCH] drm/xe/xe_ggtt: No need to use xe_pm_runtime_get_noresume In-Reply-To: References: <20240429162915.1831945-1-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-redhat-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Mon, 29 Apr 2024 11:24:27 -0700, Rodrigo Vivi wrote: > > On Mon, Apr 29, 2024 at 09:29:15AM -0700, Ashutosh Dixit wrote: > > Switching from xe_device_mem_access_get/put to xe_pm_runtime_get/put > > results in the following WARNING in xe_oa: > > > > [11614.356168] xe 0000:00:02.0: Missing outer runtime PM protection > > [11614.356187] WARNING: CPU: 1 PID: 13075 at drivers/gpu/drm/xe/xe_pm.c:549 xe_pm_runtime_get_noresume+0x60/0x80 [xe] > > ... > > [11614.356377] Call Trace: > > [11614.356379] > > [11614.356381] ? __warn+0x7e/0x180 > > [11614.356387] ? xe_pm_runtime_get_noresume+0x60/0x80 [xe] > > [11614.356507] xe_ggtt_remove_node+0x22/0x80 [xe] > > [11614.356546] xe_ttm_bo_destroy+0xea/0xf0 [xe] > > [11614.356579] xe_oa_stream_destroy+0xf7/0x120 [xe] > > [11614.356627] xe_oa_release+0x35/0xc0 [xe] > > [11614.356673] __fput+0xa1/0x2d0 > > [11614.356679] __x64_sys_close+0x37/0x80 > > [11614.356697] do_syscall_64+0x6d/0x140 > > [11614.356700] entry_SYSCALL_64_after_hwframe+0x71/0x79 > > [11614.356702] RIP: 0033:0x7f2b37314f67 > > > > There seems to be no reason to use xe_pm_runtime_get_noresume in xe_ggtt > > functions. Just use xe_pm_runtime_get. > > > > Signed-off-by: Ashutosh Dixit > > --- > > drivers/gpu/drm/xe/xe_ggtt.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c > > index 0d541f55b4fc..8548a2eb3b32 100644 > > --- a/drivers/gpu/drm/xe/xe_ggtt.c > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c > > @@ -404,7 +404,7 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo, > > if (err) > > return err; > > > > - xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile)); > > + xe_pm_runtime_get(tile_to_xe(ggtt->tile)); > > mutex_lock(&ggtt->lock); > > err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node, bo->size, > > alignment, 0, start, end, 0); > > @@ -433,7 +433,7 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo) > > void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node, > > bool invalidate) > > { > > - xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile)); > > + xe_pm_runtime_get(tile_to_xe(ggtt->tile)); > > we cannot do this as this place gets called from locked places. > This is a deadlock risk. > We need to ensure to have an outer caller of the xe_pm_runtime_get that will > ensure to get the device waked first, then then we continue with the _noresume > variant here that only ensures that we have an extra reference. > > These warnings are indeed poping up in multiple places, and this is a good > thing since we killed the mem_access... at least now we know and have a > backtrace of the places that are putting our device at risk of deadlock > and can use this information to now find the right outer place protections. > > https://gitlab.freedesktop.org/drm/xe/kernel/issues/1705 OK Rodrigo, thanks for the explanation. I wasn't sure, so I thought I'll send the patch. Anyway, I'll add an outer call for xe_pm_runtime_get. Thanks. > > > > > mutex_lock(&ggtt->lock); > > xe_ggtt_clear(ggtt, node->start, node->size); > > -- > > 2.41.0 > >