From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.24]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7ACFF10E4C9 for ; Thu, 12 Oct 2023 12:11:08 +0000 (UTC)
Message-ID: <09ef7bc6-6ab5-9b05-ab6f-9a8eee083c0a@linux.intel.com>
Date: Thu, 12 Oct 2023 13:11:03 +0100
MIME-Version: 1.0
Content-Language: en-US
To: imre.deak@intel.com
References: <20231011084222.226352-1-shawn.c.lee@intel.com>
From: Tvrtko Ursulin
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [igt-dev] [PATCH] tests: read engine name again before restore timeout value
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: igt-dev@lists.freedesktop.org, Lee Shawn C
Errors-To: igt-dev-bounces@lists.freedesktop.org
Sender: "igt-dev"
List-ID:

On 12/10/2023 12:33, Imre Deak wrote:
> On Thu, Oct 12, 2023 at 09:53:44AM +0100, Tvrtko Ursulin wrote:
>>
>> On 11/10/2023 09:42, Lee Shawn C wrote:
>>> We encounter a unexpected error on chrome book device while
>>> running this test. The tool will restore GPU engine's timeout
>>> value but open incorrect file name (XR24 in below). This is
>>> a workaround patch to avoid this problem before we got the
>>> root cause.
>>>
>>> openat(AT_FDCWD, "/sys/dev/char/226:0", O_RDONLY) = 12
>>> openat(12, "dev", O_RDONLY) = 13
>>> read(13, "226:0\n", 1023) = 6
>>> close(13) = 0
>>> openat(12, "engine", O_RDONLY) = 13
>>> close(12) = 0
>>> openat(13, "XR24", O_RDONLY) = -1 ENOENT (No such file or directory)
>>>
>>> Signed-off-by: Lee Shawn C
>>> Issue: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/147
>>> ---
>>> tests/intel/kms_busy.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tests/intel/kms_busy.c b/tests/intel/kms_busy.c
>>> index 5b620658fb18..119e6f1652ce 100644
>>> --- a/tests/intel/kms_busy.c
>>> +++ b/tests/intel/kms_busy.c
>>> @@ -414,9 +414,15 @@ static void gpu_engines_init_timeouts(int fd, int max_engines,
>>>  	}
>>>  }
>>> -static void gpu_engines_restore_timeouts(int fd, int num_engines, const struct gem_engine_properties *props)
>>> +static void gpu_engines_restore_timeouts(int fd, int num_engines, struct gem_engine_properties *props)
>>>  {
>>> -	int i;
>>> +	const struct intel_execution_engine2 *e;
>>> +	int i = 0;
>>> +
>>> +	for_each_physical_engine(fd, e) {
>>> +		props[i].engine = e;
>>> +		i++;
>>> +	}
>>>  	for (i = 0; i < num_engines; i++)
>>>  		gem_engine_properties_restore(fd, &props[i]);
>>
>> By the look of it bug is in gpu_engines_init_timeouts(). This pointer
>> assignment:
>>
>> 	for_each_physical_engine(fd, e) {
>> 		igt_assert(*num_engines < max_engines);
>>
>> 		props[*num_engines].engine = e;
>>
>> ^^^ e is on stack, in scope of for_each_physical_engine, so by the time
>> gpu_engines_restore_timeouts() runs it can legitimately point to garbage,
>> like XR24 in your example.
>>
>> Your workaround works, although strictly don't think the order of engines is
>> guaranteed. Which is also moot since same preempt_timeout and
>> hearbeat_interval is used for all.
>>
>> Nevertheless, proper fix would be to allocate a make a copy of each engine
>> and store a pointer to that.
>> It might be an overkill but, up for discussion
>> I guess.
>>
>> Fixes: 9e635a1c5029 ("tests/kms_busy: Ensure GPU reset when waiting for a
>> new FB during modeset")
>>
>> So I'll be cheeky and add Imre and Juha-Pekka too.
>
> ugh, thanks for catching this.
>
> Would it work to save the engine class/instance instead in
> gpu_engines_init_timeouts(), and look up the engines using these in
> gpu_engines_restore_timeouts() ?

Not sure exactly what you have in mind.

Modify struct gem_engine_properties to not store the pointer to the
engine? But e->name is what it needs to restore. Storing class:instance
and then on restore iterate all engines again to find the class:instance
and use the name from local copy? Hm yes, that would work.

Also, on a deeper look gem_exec_capture appears to have the same bug:

find_first_available_engine
  for_each_ctx_engine
    configure_hangs
      props.engine = e;

And i915_hangman AFAICT. Unless I am super confused.. I tried running it
under Valgrind but it is not detecting anything, which I guess is because
it is stack and not heap.

Hm maybe more elegant is to change the struct to:

 struct gem_engine_properties {
-	const struct intel_execution_engine2 *engine;
+	const struct intel_execution_engine2 engine;
 	int preempt_timeout;
 	int heartbeat_interval;
 };

So instead of storing a pointer a copy is made, which will include a copy
of the name. (Since it is embedded in struct intel_execution_engine2.)

Then places which record engines would just need to:

-	saved_params[num_engines].engine = e;
+	saved_params[num_engines].engine = *e;

No further churn then, I think..

Regards,

Tvrtko
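
P.S. A minimal standalone sketch of the copy-instead-of-pointer idea, outside of IGT. Everything in it is hypothetical: struct engine stands in for struct intel_execution_engine2, record_engines() mimics a loop-scoped iterator, and the engine names and timeout values are made up for illustration. It also assumes the embedded member is not const-qualified, since a const member probably could not be assigned to after declaration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct intel_execution_engine2. The name is
 * embedded in the struct, so copying the struct copies the name too. */
struct engine {
	char name[16];
	int engine_class;
	int instance;
};

/* Copy-based variant of the properties struct. The member is left
 * non-const here (an assumption) so that plain assignment compiles. */
struct engine_props {
	struct engine engine;
	int preempt_timeout;
	int heartbeat_interval;
};

/* Hypothetical recorder: the iterator object lives only inside the loop
 * body, mimicking an engine that is scoped to the iteration macro. */
static int record_engines(struct engine_props *props, int max_engines)
{
	static const char *const names[] = { "rcs0", "bcs0", "vcs0" };
	int num_engines = 0;

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		struct engine e = { .engine_class = (int)i, .instance = 0 };

		snprintf(e.name, sizeof(e.name), "%s", names[i]);
		assert(num_engines < max_engines);

		/* Struct assignment copies e, name included. Storing &e
		 * instead would dangle once this iteration ends. */
		props[num_engines].engine = e;
		props[num_engines].preempt_timeout = 0;      /* made-up value */
		props[num_engines].heartbeat_interval = 250; /* made-up value */
		num_engines++;
	}

	return num_engines;
}

/* Restore-side lookup: reads only the stored copy, never the iterator. */
static const char *recorded_name(const struct engine_props *props, int idx)
{
	return props[idx].engine.name;
}
```

A caller would fill an array with record_engines() and later read the names back through recorded_name(), the way a restore path would. The names stay valid because they live inside the array elements rather than in the loop's stack slot.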