Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH i-g-t] lib/intel_device_info: Use per-thread device info cache
@ 2024-11-20 22:35 Matt Roper
  2024-11-21  0:30 ` ✗ GitLab.Pipeline: warning for " Patchwork
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Matt Roper @ 2024-11-20 22:35 UTC (permalink / raw)
  To: igt-dev; +Cc: matthew.d.roper

Relying on static variables to act as a cache inside the device info
lookup function is racy for multi-threaded tests since there's only one
copy of the static variables shared by all threads.  E.g.,

    Thread 1                Thread 2
    ========                ========
    cached_devid = devid;
                            if (cached_devid == devid)
                            return cache;
    cache = ...

Indeed, running xe_exec_threads repeatedly turns up cases where the test
fails with "Platform is missing PAT settings for uc/wt/wb" because a
racing thread wound up getting 'intel_generic_info' rather than a proper
device info structure (although it can take a couple thousand tries to
hit this specific race condition).  CI also sees this from time to time,
but the bugs often get closed as "cannot reproduce" because the failure
rate is so low.

We're probably going to be reworking this code completely in the
not-too-distant future since we really need to be looking things up from
the GMD_ID query rather than using hardcoded device ID table lookups,
but for now we can apply a quick fix of just making 'cached_devid' and
'cache' thread-local so that we're not racing on a  shared copy.  Simply
swapping the order of the assignment of 'cache' and 'cached_devid'
probably would have also worked around the specific race condition noted
above, but it would have left tests susceptible to other race conditions
in cases where a multi-threaded test is actually looking up different
device IDs / GPUs in its multiple threads.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1600
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1535
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3236
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
---
 lib/intel_device_info.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/intel_device_info.c b/lib/intel_device_info.c
index 546b9c65a..d38dc415d 100644
--- a/lib/intel_device_info.c
+++ b/lib/intel_device_info.c
@@ -658,8 +658,8 @@ static const struct pci_id_match intel_device_match[] = {
  */
 const struct intel_device_info *intel_get_device_info(uint16_t devid)
 {
-	static const struct intel_device_info *cache = &intel_generic_info;
-	static uint16_t cached_devid;
+	static __thread const struct intel_device_info *cache = &intel_generic_info;
+	static __thread uint16_t cached_devid;
 	int i;
 
 	if (cached_devid == devid)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-11-21 18:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-20 22:35 [PATCH i-g-t] lib/intel_device_info: Use per-thread device info cache Matt Roper
2024-11-21  0:30 ` ✗ GitLab.Pipeline: warning for " Patchwork
2024-11-21  0:49 ` ✗ Fi.CI.BAT: failure " Patchwork
2024-11-21 18:24   ` Matt Roper
2024-11-21 11:47 ` ✗ Xe.CI.Full: " Patchwork
2024-11-21 18:36   ` Matt Roper
2024-11-21 17:13 ` [PATCH i-g-t] " Zbigniew Kempczyński

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox