From: Markus Armbruster
To: Roman Kagan
Cc: Konstantin Khlebnikov, qemu-devel@nongnu.org, yc-core@yandex-team.ru, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Eric Blake
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Mon, 30 May 2022 13:28:17 +0200
Message-ID: <87sforb6pa.fsf@pond.sub.org>
In-Reply-To: (Roman Kagan's message of "Fri, 27 May 2022 15:49:40 +0300")
References: <165296995578.196133.16183155555450040914.stgit@buzz> <87zgj5hog8.fsf@pond.sub.org>

Roman Kagan writes:

> On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster
> wrote:
>> Konstantin Khlebnikov writes:
>>
>> > This event represents device runtime errors to give time and
>> > reason why device is broken.
>>
>> Can you give one or more examples of the "device runtime errors" you have
>> in mind?
>
> Initially we wanted to address a situation when a vhost device
> discovered an inconsistency during virtqueue processing and silently
> stopped the virtqueue. This resulted in device stall (partial for
> multiqueue devices) and we were the last to notice that.
>
> The solution appeared to be to employ errfd and, upon receiving a
> notification through it, to emit a QMP event which is actionable in the
> management layer or further up the stack.
>
> Then we observed that virtio (non-vhost) devices suffer from the same
> issue: they only log the error but don't signal it to the management
> layer. The case was very similar so we thought it would make sense to
> share the infrastructure and the QMP event between virtio and vhost.
>
> Then Konstantin went a bit further and generalized the concept into
> generic "device runtime error". I'm personally not completely convinced
> this generalization is appropriate here; we'd appreciate the opinions
> from the community on the matter.

"Device emulation sending an event on entering certain error states, so
that a management application can do something about it" feels
reasonable enough to me as a general concept.

The key point is of course "can do something": the event needs to be
actionable. Can you describe possible actions for the cases you
implement?

Once we all have a better idea of the event's purpose, usage, and
limitations, we should revisit its documentation.