From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Wolf
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, hreitz@redhat.com, xeor@yandex-team.ru,
 vsementsov@yandex-team.ru, pkrempa@redhat.com, qemu-devel@nongnu.org,
 qemu-stable@nongnu.org
Subject: [PATCH v2] block: Never drop BLOCK_IO_ERROR with action=stop for rate limiting
Date: Wed, 4 Mar 2026 13:28:00 +0100
Message-ID: <20260304122800.51923-1-kwolf@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Commit 2155d2dd introduced rate limiting for BLOCK_IO_ERROR to emit an
event only once a second. This makes sense for cases in which the guest
keeps running and can submit more requests that would possibly also fail
because there is a problem with the backend.

However, if the error policy is configured so that the VM is stopped on
errors, this is both unnecessary because stopping the VM means that the
guest can't issue more requests and in fact harmful because stopping the
VM is an important state change that management tools need to keep track
of even if it happens more than once in a given second. If an event is
dropped, the management tool would see a VM randomly going to paused
state without an associated error, so it has a hard time deciding how to
handle the situation.

This patch disables rate limiting for action=stop by not relying on the
event type alone any more in monitor_qapi_event_queue_no_reenter(), but
checking action for BLOCK_IO_ERROR, too. If the error is reported to the
guest or ignored, the rate limiting stays in place.

Fixes: 2155d2dd7f73 ('block-backend: per-device throttling of BLOCK_IO_ERROR reports')
Signed-off-by: Kevin Wolf
---
 qapi/block-core.json |  2 +-
 monitor/monitor.c    | 21 ++++++++++++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b66bf316e2f..da0b36a3751 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5794,7 +5794,7 @@
 # .. note:: If action is "stop", a `STOP` event will eventually follow
 #    the `BLOCK_IO_ERROR` event.
 #
-# .. note:: This event is rate-limited.
+# .. note:: This event is rate-limited, except if action is "stop".
 #
 # Since: 0.13
 #
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 1273eb72605..37fa674cfe6 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -367,14 +367,33 @@ monitor_qapi_event_queue_no_reenter(QAPIEvent event, QDict *qdict)
 {
     MonitorQAPIEventConf *evconf;
     MonitorQAPIEventState *evstate;
+    bool throttled;
 
     assert(event < QAPI_EVENT__MAX);
     evconf = &monitor_qapi_event_conf[event];
     trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
+    throttled = evconf->rate;
+
+    /*
+     * Rate limit BLOCK_IO_ERROR only for action != "stop".
+     *
+     * If the VM is stopped after an I/O error, this is important information
+     * for the management tool to keep track of the state of QEMU and we can't
+     * merge any events. At the same time, stopping the VM means that the guest
+     * can't send additional requests and the number of events is already
+     * limited, so we can do without rate limiting.
+     */
+    if (event == QAPI_EVENT_BLOCK_IO_ERROR) {
+        QDict *data = qobject_to(QDict, qdict_get(qdict, "data"));
+        const char *action = qdict_get_str(data, "action");
+        if (!strcmp(action, "stop")) {
+            throttled = false;
+        }
+    }
 
     QEMU_LOCK_GUARD(&monitor_lock);
 
-    if (!evconf->rate) {
+    if (!throttled) {
         /* Unthrottled event */
         monitor_qapi_event_emit(event, qdict);
     } else {
-- 
2.53.0
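
For readers who want the throttling decision in isolation, here is a
minimal standalone sketch of the rule the hunk above implements. It is
not QEMU code: SketchEvent, sketch_event_is_throttled() and the rate_ns
parameter are hypothetical names made up for illustration, and the NULL
check on action is an extra assumption that the patch itself does not
make.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's QAPIEvent; not the real enum. */
typedef enum {
    SKETCH_EVENT_BLOCK_IO_ERROR,
    SKETCH_EVENT_OTHER,
} SketchEvent;

/*
 * Decide whether an event goes through the rate limiter.
 *
 * rate_ns: configured minimum interval for the event type (assumed
 *          convention: 0 means the event type is never throttled).
 * action:  value of the event's "action" field; only consulted for
 *          BLOCK_IO_ERROR.
 */
static bool sketch_event_is_throttled(SketchEvent event, int64_t rate_ns,
                                      const char *action)
{
    /* Start from the per-event-type configuration. */
    bool throttled = rate_ns != 0;

    /* BLOCK_IO_ERROR with action "stop" must never be dropped or merged. */
    if (event == SKETCH_EVENT_BLOCK_IO_ERROR && action != NULL &&
        strcmp(action, "stop") == 0) {
        throttled = false;
    }

    return throttled;
}
```

With a non-zero rate, BLOCK_IO_ERROR stays throttled for the "report"
and "ignore" actions and is exempt only for "stop". Other event types
are unaffected by the action string.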