From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C4D97CDB470 for ; Tue, 23 Jun 2026 20:20:51 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 35C8410EC91; Tue, 23 Jun 2026 20:20:51 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.b="hET9RZbT"; dkim-atps=neutral Received: from sea.source.kernel.org (sea.source.kernel.org [172.234.252.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id BED9810EC8E for ; Tue, 23 Jun 2026 20:20:49 +0000 (UTC) Received: from smtp.kernel.org (quasi.space.kernel.org [100.103.45.18]) by sea.source.kernel.org (Postfix) with ESMTP id 8856C4091C; Tue, 23 Jun 2026 20:20:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4ED321F000E9; Tue, 23 Jun 2026 20:20:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782246049; bh=c8nseoNIV5gZ3hdWNPvbbFq+a0wMTbH76ovYSYd8KH4=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=hET9RZbTzGsltX/blv+JB6AW/VzZlX/qZjqQOTxGS3bNq+PoOj9ZK6kIF4fmz6ip8 lWl1rxAtuWDFcSjuEvispZ0dbRYv5fiXsAgtYQfFWOuTJVHiVNscQgvkAoORsIGH/V rLpDMyKkeocYR3PvtP6CSV9hmY4iCn4okqdaZas/nr6OTRA+PfhmLJwknjXSB2sR+M U6OzDApZTGfCtVcXqSZK4nLikl6UCvq9vD2ZgcN3okMNBeKml2+4SNe62a6VqAB5bK PNgYPjv/ksVp89iZI2Rob3gzcTcUXUhZBBcyjvxwQfe2ameyNnbEYOXVPswC/yQhoE FKirUBFJpJtAQ== From: sashiko-bot@kernel.org Subject: Re: [PATCH] drm/virtio: warn when virtqueue has no free space for too long To: "Ryosuke Yasuoka" Cc: dri-devel@lists.freedesktop.org In-Reply-To: <20260618-virtiogpu_add_timeout-v1-1-dc36cef609d9@redhat.com> References: <20260618-virtiogpu_add_timeout-v1-1-dc36cef609d9@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 23 Jun 2026 20:20:48 +0000 Message-Id: <20260623202049.4ED321F000E9@smtp.kernel.org> X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: sashiko-reviews@lists.linux.dev Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: New issues: - [High] Replacing `wait_event()` with a 5-second `wait_event_timeout()` lo= op defeats the kernel's hung task detector for this wait, breaking automate= d recovery mechanisms. - [Medium] The `wait_event_timeout()` loop will generate severe log spam if= multiple userspace threads block concurrently. Pre-existing issues: - [High] The `virtio_gpu_object_array` (`vbuf->objs`) and its contained GEM= objects are leaked on error paths where `virtio_gpu_queue_ctrl_sgs()` fail= s to queue the buffer. - [High] The virtio-gpu driver sleeps in plane atomic update callbacks, vio= lating DRM subsystem guidelines against sleeping in atomic context. -- commit 736ae781cca53398969e63f0d5a23d6e3f738db0 Author: Ryosuke Yasuoka drm/virtio: warn when virtqueue has no free space for too long This commit replaces the bare wait_event() in virtqueue command submission with a wait_event_timeout() loop. The loop prints a warning every 5 seconds while the virtqueue remains full to help identify unresponsive host devices. > diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio= /virtgpu_vq.c > index 68d097ad9d1d0..a546130d3b6a5 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_vq.c > +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c > @@ -410,8 +410,13 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_g= pu_device *vgdev, > if (vq->num_free < elemcnt) { > spin_unlock(&vgdev->ctrlq.qlock); > virtio_gpu_notify(vgdev); > - wait_event(vgdev->ctrlq.ack_queue, > - vq->num_free >=3D elemcnt || vgdev->vqs_released); > + while (!wait_event_timeout(vgdev->ctrlq.ack_queue, > + vq->num_free >=3D elemcnt || > + vgdev->vqs_released, > + 5 * HZ) && !vgdev->vqs_released) [Severity: High] Does this 5-second polling loop evade the kernel's hung task detector? The kernel's khungtaskd detects tasks stuck in TASK_UNINTERRUPTIBLE by verifying their context switch count does not change over the configured timeout. By waking the task up every 5 seconds, it will continually context switch, causing khungtaskd to see a changing context switch count. Will this prevent the system from ever flagging these tasks as hung or triggering automatic recovery panics? > + DRM_WARN("ctrlq waiting for host: no free space for %d secs\n", > + 5); [Severity: Medium] Could this cause severe kernel log spam if multiple userspace threads are blocked concurrently? Since the 5-second timeout and warning are local to each thread's wait loop, if multiple threads attempt to submit commands while the host is unresponsi= ve, will each thread emit a DRM_WARN message every 5 seconds? > + > /* > * Set by virtio_gpu_release_vqs() to unblock > * synchronize_srcu() wait in drm_dev_unplug(). [Severity: High] This is a pre-existing issue, but does this error path permanently leak the virtio_gpu_object_array and its pinned GEM objects? When vgdev->vqs_released is true, the code below this comment calls free_vbuf(vgdev, vbuf) and returns -ENODEV. However, free_vbuf() only frees the buffer and does not unreference or free vbuf->objs. Since the buffer was never queued, the completion callback that normally cleans up vbuf->objs wi= ll never run. [ ... ] > @@ -592,8 +597,13 @@ static void virtio_gpu_queue_cursor(struct virtio_gp= u_device *vgdev, > ret =3D virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC); > if (ret =3D=3D -ENOSPC) { > spin_unlock(&vgdev->cursorq.qlock); > - wait_event(vgdev->cursorq.ack_queue, > - vq->num_free >=3D outcnt || vgdev->vqs_released); > + while (!wait_event_timeout(vgdev->cursorq.ack_queue, > + vq->num_free >=3D outcnt || > + vgdev->vqs_released, > + 5 * HZ) && !vgdev->vqs_released) [Severity: High] This isn't a bug introduced by this patch, but does waiting here violate DRM subsystem atomic context rules? This function is reached via the plane atomic update callback: virtio_gpu_cursor_plane_update() -> virtio_gpu_cursor_ping() -> virtio_gpu_queue_cursor() DRM plane atomic update callbacks execute in atomic context where sleeping functions are forbidden. Using wait_event_timeout() (or previously wait_event()) sets the task state to TASK_UNINTERRUPTIBLE and schedules out. Could this scheduling while atomic cause system instability? > + DRM_WARN("cursorq waiting for host: no free space for %d secs\n", > + 5); > + > /* See comment in virtio_gpu_queue_ctrl_sgs(). */ > if (vgdev->vqs_released) { > free_vbuf(vgdev, vbuf); --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260618-virtiogpu_= add_timeout-v1-1-dc36cef609d9@redhat.com?part=3D1