From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com>
To: "igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
"Thomas, Sobin" <sobin.thomas@intel.com>
Cc: "Sharma, Nishit" <nishit.sharma@intel.com>
Subject: Re: [PATCH i-g-t v5] test/intel/xe_vm:Add oversubscribe concurrent bind stress subtest
Date: Mon, 27 Apr 2026 09:07:37 +0000 [thread overview]
Message-ID: <931c9a225b944d772bf1e752ad0c5744265037a6.camel@intel.com> (raw)
In-Reply-To: <20260427024004.662789-1-sobin.thomas@intel.com>
Hi,
Attaching an review from Claude with review-prompts:
Thanks,
Thomas
commit e43b1bbe229b268bc5e5f3e1213e067d4ac8c4b2
Author: Sobin Thomas <sobin.thomas@intel.com>
test/intel/xe_vm:Add oversubscribe concurrent bind stress subtest
Adds a multi-process stress subtest that oversubscribes VRAM, binds
large BOs from each child process simultaneously, and verifies GPU
workloads complete correctly under eviction pressure.
> diff --git a/tests/intel/xe_vm.c b/tests/intel/xe_vm.c
> index d75b0730d..5246a854a 100644
> --- a/tests/intel/xe_vm.c
> +++ b/tests/intel/xe_vm.c
[ ... ]
> +static uint64_t *
> +vm_bind_bo_batch(int fd, struct xe_test_ctx *ctx, struct gem_bo
*bos, int size)
> +{
> + uint64_t *ufence;
> + struct drm_xe_sync bind_sync;
> + struct drm_xe_vm_bind_op binds[size];
> + int i;
> +
> + ufence = calloc(1, sizeof(uint64_t));
> + *ufence = 0;
> + igt_assert(ufence != MAP_FAILED);
Does this dereference ufence before checking it? calloc() returns NULL
on failure, not MAP_FAILED. The dereference (*ufence = 0) happens
before the assert, and the assert checks for the wrong sentinel value.
If calloc() fails, *ufence = 0 crashes before the assert is reached.
And even if the order were swapped, NULL != MAP_FAILED would cause the
assert to pass with a NULL pointer.
Should this be igt_assert(ufence) placed before the first dereference?
> + *ufence = 0;
[ ... ]
> +static void create_test_bos(int fd, struct xe_test_ctx *ctx,
> + struct mem_bind_sync *bind,
> + uint32_t placement, uint64_t *addr)
> +{
> + ret = __xe_bo_create_caching(fd, ctx->vm_id, bo->size,
> + placement, 0,
> +
DRM_XE_GEM_CPU_CACHING_WC,
> + &bo->handle);
> + if (ret == -ENOMEM || ret == -ENOSPC) {
> + bind->n_bufs = i;/* stop creating more */
Can this OOM path ever be reached?
__xe_bo_create_caching() calls ___xe_bo_create() which returns the raw
result of igt_ioctl():
err = igt_ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create);
if (err)
return err;
igt_ioctl() is drmIoctl(), which returns -1 on ioctl failure, not
-errno. So ret will be -1 on any allocation failure, never -ENOMEM
(-12) or -ENOSPC (-28).
Compare with ___xe_vm_bind() which does this correctly:
if (igt_ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind))
return -errno;
With ___xe_bo_create() missing the -errno conversion, the ENOMEM/ENOSPC
check here is dead code. Any BO allocation failure silently continues
the loop with bo->handle left at 0 (from calloc zero-fill), and
bind->n_bufs is never reduced to reflect the actual number of
successfully created BOs.
[ ... ]
> + if (n_vram_bufs)
> + vram_bind.binds_ufence = vm_bind_bo_batch(fd,
&ctx, vram_bufs,
> +
n_vram_bufs);
> +
> + if (n_sram_bufs)
> + sram_bind.binds_ufence = vm_bind_bo_batch(fd,
&ctx, sram_bufs,
> +
n_sram_bufs);
Once the OOM detection in create_test_bos() is fixed, this becomes a
problem. create_test_bos() may reduce vram_bind.n_bufs when allocation
fails partway through, but vm_bind_bo_batch() is called with the
original n_vram_bufs count.
vm_bind_bo_batch() then builds bind ops for BOs that were never
created, with handle=0. xe_vm_bind_array() asserts that the ioctl
succeeds, so a bind with an invalid handle aborts the child process
rather than being handled gracefully.
Should the count argument use vram_bind.n_bufs and sram_bind.n_bufs
after create_test_bos() updates them?
next prev parent reply other threads:[~2026-04-27 9:07 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-27 2:40 [PATCH i-g-t v5] test/intel/xe_vm:Add oversubscribe concurrent bind stress subtest Sobin Thomas
2026-04-27 3:54 ` ✓ Xe.CI.BAT: success for " Patchwork
2026-04-27 3:54 ` ✓ i915.CI.BAT: " Patchwork
2026-04-27 5:01 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-04-27 5:41 ` ✗ i915.CI.Full: " Patchwork
2026-04-27 9:07 ` Hellstrom, Thomas [this message]
2026-04-28 8:55 ` [PATCH i-g-t v5] " Kamil Konieczny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=931c9a225b944d772bf1e752ad0c5744265037a6.camel@intel.com \
--to=thomas.hellstrom@intel.com \
--cc=igt-dev@lists.freedesktop.org \
--cc=nishit.sharma@intel.com \
--cc=sobin.thomas@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox