From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B098A23C4F9 for ; Wed, 21 May 2025 09:19:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747819167; cv=none; b=lentTn4/WGnfENoT4o8q6npaLZHZlIsWP1mIK8gmgPWh1CaGeGR6HCQ/TMk0MJr9IKfa6YpHio8573qCr2UHPn2rfJmXl5GwcMDEKf2qmJ0ILcb//lxnZvO+6XUgQ8VvIyiInInykvWbwBfU5kR+HJYba9PPHCKdiOdKu/EllO0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747819167; c=relaxed/simple; bh=B7wXPyXUP2w07U8Up6D4wr9emBg7GtP6vgGU4Rn9vb4=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=LYuAI96l8qbuw0B0WOWheCdJjCsu+o3baJY3HRfef40m3/8wx3a9Hfzd1kh/sxgGnnaTx+iIvXdUt4r7tClcCSJ6tcC53exvSlLu9If5V47YSNXpKjHMgcK6i5KwL8EwlqbY4PM/qPIv4JcaXd+LrEMdPGypwDZ/G6MTyDjp4Fk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=IewRUBxP; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IewRUBxP" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1747819164; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Z/o09y9SC9Ywr/vW/q2zNIFJBJF7X0kmooxajrWqv/s=; b=IewRUBxPcrP8Z2LV9yz7vevwl7FMJXgYSNWgfs9L/wJWmF32yBcyWfbAl/rS5cRL0KdsgM 9WWLZG2j20Fk8XTi3yTi+KFHf6fWFMfsLgEFx5kp4LKCwE4m3h/UTE2VNDNKPCpLgNZbCn Vo+zFxjCtuAyeqSDB6xic7lvQZSBq8U= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-206-aoh3WExKO3ySvf0nbXV6pw-1; Wed, 21 May 2025 05:19:23 -0400 X-MC-Unique: aoh3WExKO3ySvf0nbXV6pw-1 X-Mimecast-MFC-AGG-ID: aoh3WExKO3ySvf0nbXV6pw_1747819162 Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-3a371fb826cso1478213f8f.1 for ; Wed, 21 May 2025 02:19:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747819162; x=1748423962; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Z/o09y9SC9Ywr/vW/q2zNIFJBJF7X0kmooxajrWqv/s=; b=VQrDllBOr26OUJaEK4BuXn2i5Q0fTrrz9LNwPS2n3dJUXGvRJ3NCfg9vK0fV3RK1wz 4WIjqeVSkQ26256G7Hs7meGqzrgJwyDghWu/msVhVNI5tty4fGxNBrURAw7RFtyj7+zf 2g6JMjPv2Yb1kt23FmqXjmU0bJZuK5L9RGbKYoVkSPE22q/s9e3DAUnFZuRlUwFVkCEU jVvGmmrhKOqrF/e5vYV4fpzg2SUoACpOAXLSNWWy45xmMuJtS66WzGTj0Z32YK4DGOoL sK/nAOf4GDksKnwcMNZJnqPsWwdBX1yYAsn0eFoc2WD0YVIqjYcf3cXtvZw/S2EN8tV4 f09w== X-Forwarded-Encrypted: i=1; AJvYcCVQjtqKGqyRJA1Wf2rwioQ/E4cU4CCBo9vM/BSL+zJHAIzbcdV8Cj7PVKDQhocrPW8wyxHiV98=@vger.kernel.org X-Gm-Message-State: AOJu0Yz7XrfhTblwB1mxk0QxibvT+kIGm7rQ6E1e2qftBHrG0eGIKDuk 7gff89E8ld3Qk4YqvlvHP2FrAJ7F1/NyrN6QOvxllODyK4Pu8R7MotnQoZ22S/m+4o4E/GiNasd p7o14jdIcl8x69z+oLlrpeOieuDT0TxEAU39oKKjEpaeHTDTfQDEd4Q/2gQ== X-Gm-Gg: ASbGncuNDOxjMbZBvShGVJA5GG8c3xefBlGVCSytuI7xUWTth/J00x3519l6Bh53n1c FUmKf/KeWpi6wjowc380YDRr+p/tgg/v53BabPtHzIIS5nQ7X4xcjoyps7tp6wsiAE1VnPCL5nL qU4VD1sctIbdhRjYh1avhSwnXA0Q11BI43nqhRhJbRcO6RkKi0B4Ivtl1WPyELJ/fGJu5pwwqoX yRjO7+Wv/W/eU6LzJTit0yTRAYo4uxqJYbFFNCGqrKa6WG9/T8hiagRpIi4B1m+6oJ6dLgcXuam 0Or+QA== X-Received: by 2002:a05:6000:3103:b0:3a3:6415:96c8 with SMTP id ffacd0b85a97d-3a364159712mr14162627f8f.41.1747819161456; Wed, 21 May 2025 02:19:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHLmTcHyyqMuzgg9O5ETbsMsgVAw3A3cvgRdzmCanUIuc6/dYk73VWGritQ97Q4VSUoXhdZCA== X-Received: by 2002:a05:6000:3103:b0:3a3:6415:96c8 with SMTP id ffacd0b85a97d-3a364159712mr14162590f8f.41.1747819161026; Wed, 21 May 2025 02:19:21 -0700 (PDT) Received: from redhat.com ([2a0d:6fc0:1517:1000:ea83:8e5f:3302:3575]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-447f6ffaa6esm63402155e9.16.2025.05.21.02.19.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 21 May 2025 02:19:20 -0700 (PDT) Date: Wed, 21 May 2025 05:19:17 -0400 From: "Michael S. Tsirkin" To: Parav Pandit Cc: "stefanha@redhat.com" , "axboe@kernel.dk" , "virtualization@lists.linux.dev" , "linux-block@vger.kernel.or" , "stable@vger.kernel.org" , "NBU-Contact-Li Rongqing (EXTERNAL)" , Chaitanya Kulkarni , "xuanzhuo@linux.alibaba.com" , "pbonzini@redhat.com" , "jasowang@redhat.com" , Max Gurtovoy , Israel Rukshin Subject: Re: [PATCH v1] virtio_blk: Fix disk deletion hang on device surprise removal Message-ID: <20250521051556-mutt-send-email-mst@kernel.org> References: <20250521062744.1361774-1-parav@nvidia.com> <20250521041506-mutt-send-email-mst@kernel.org> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed, May 21, 2025 at 09:14:31AM +0000, Parav Pandit wrote: > > From: Michael S. Tsirkin > > Sent: Wednesday, May 21, 2025 1:48 PM > > > > On Wed, May 21, 2025 at 06:37:41AM +0000, Parav Pandit wrote: > > > When the PCI device is surprise removed, requests may not complete the > > > device as the VQ is marked as broken. Due to this, the disk deletion > > > hangs. > > > > > > Fix it by aborting the requests when the VQ is broken. > > > > > > With this fix now fio completes swiftly. > > > An alternative of IO timeout has been considered, however when the > > > driver knows about unresponsive block device, swiftly clearing them > > > enables users and upper layers to react quickly. > > > > > > Verified with multiple device unplug iterations with pending requests > > > in virtio used ring and some pending with the device. > > > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio > > > pci device") > > > Cc: stable@vger.kernel.org > > > Reported-by: lirongqing@baidu.com > > > Closes: > > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474 > > > 1@baidu.com/ > > > Reviewed-by: Max Gurtovoy > > > Reviewed-by: Israel Rukshin > > > Signed-off-by: Parav Pandit > > > --- > > > changelog: > > > v0->v1: > > > - Fixed comments from Stefan to rename a cleanup function > > > - Improved logic for handling any outstanding requests > > > in bio layer > > > - improved cancel callback to sync with ongoing done() > > > > thanks for the patch! > > questions: > > > > > > > --- > > > drivers/block/virtio_blk.c | 95 > > > ++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 95 insertions(+) > > > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > > index 7cffea01d868..5212afdbd3c7 100644 > > > --- a/drivers/block/virtio_blk.c > > > +++ b/drivers/block/virtio_blk.c > > > @@ -435,6 +435,13 @@ static blk_status_t virtio_queue_rq(struct > > blk_mq_hw_ctx *hctx, > > > blk_status_t status; > > > int err; > > > > > > + /* Immediately fail all incoming requests if the vq is broken. > > > + * Once the queue is unquiesced, upper block layer flushes any > > pending > > > + * queued requests; fail them right away. > > > + */ > > > + if (unlikely(virtqueue_is_broken(vblk->vqs[qid].vq))) > > > + return BLK_STS_IOERR; > > > + > > > status = virtblk_prep_rq(hctx, vblk, req, vbr); > > > if (unlikely(status)) > > > return status; > > > > just below this: > > spin_lock_irqsave(&vblk->vqs[qid].lock, flags); > > err = virtblk_add_req(vblk->vqs[qid].vq, vbr); > > if (err) { > > > > > > and virtblk_add_req calls virtqueue_add_sgs, so it will fail on a broken vq. > > > > Why do we need to check it one extra time here? > > > It may work, but for some reason if the hw queue is stopped in this flow, it can hang the IOs flushing. > I considered it risky to rely on the error code ENOSPC returned by non virtio-blk driver. > In other words, if lower layer changed for some reason, we may end up in stopping the hw queue when broken, and requests would hang. > > Compared to that one-time entry check seems more robust. I don't get it. Checking twice in a row is more robust? What am I missing? Can you describe the scenario in more detail? > > > > > > > > @@ -508,6 +515,11 @@ static void virtio_queue_rqs(struct rq_list *rqlist) > > > while ((req = rq_list_pop(rqlist))) { > > > struct virtio_blk_vq *this_vq = get_virtio_blk_vq(req- > > >mq_hctx); > > > > > > + if (unlikely(virtqueue_is_broken(this_vq->vq))) { > > > + rq_list_add_tail(&requeue_list, req); > > > + continue; > > > + } > > > + > > > if (vq && vq != this_vq) > > > virtblk_add_req_batch(vq, &submit_list); > > > vq = this_vq; > > > > similarly > > > The error code is not surfacing up here from virtblk_add_req(). but wait a sec: static void virtblk_add_req_batch(struct virtio_blk_vq *vq, struct rq_list *rqlist) { struct request *req; unsigned long flags; bool kick; spin_lock_irqsave(&vq->lock, flags); while ((req = rq_list_pop(rqlist))) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); int err; err = virtblk_add_req(vq->vq, vbr); if (err) { virtblk_unmap_data(req, vbr); virtblk_cleanup_cmd(req); blk_mq_requeue_request(req, true); } } kick = virtqueue_kick_prepare(vq->vq); spin_unlock_irqrestore(&vq->lock, flags); if (kick) virtqueue_notify(vq->vq); } it actually handles the error internally? > It would end up adding checking for special error code here as well to abort by translating broken VQ -> EIO to break the loop in virtblk_add_req_batch(). > > Weighing on specific error code-based data path that may require audit from lower layers now and future, an explicit check of broken in this layer could be better. > > [..] Checking add was successful is preferred because it has to be done *anyway* - device can get broken after you check before add. So I would like to understand why are we also checking explicitly and I do not get it so far.