Date: Fri, 07 May 2021 18:36:23 +0100
Message-ID: <874kfepht4.wl-maz@kernel.org>
From: Marc Zyngier
To: Zhu Lingshan, Shaokun Zhang
Cc: Alex Williamson, Cornelia Huck, Nianyao Tang, Bjorn Helgaas, Eric Auger, Jason Wang, Michael S. Tsirkin
Subject: Re: Question on guest enable msi fail when using GICv4/4.1
In-Reply-To: <878s4qq00u.wl-maz@kernel.org>
References: <3a2c66d6-6ca0-8478-d24b-61e8e3241b20@hisilicon.com> <87k0oaq5jf.wl-maz@kernel.org> <878s4qq00u.wl-maz@kernel.org>

On Fri, 07 May 2021 12:02:57 +0100,
Marc Zyngier wrote:
>
> On Fri, 07 May 2021 10:58:23 +0100,
> Shaokun Zhang wrote:
> >
> > Hi Marc,
> >
> > Thanks for your quick reply.
> >
> > On 2021/5/7 17:03, Marc Zyngier wrote:
> > > On Fri, 07 May 2021 06:57:04 +0100,
> > > Shaokun Zhang wrote:
> > >>
> > >> [This letter comes from Nianyao Tang]
> > >>
> > >> Hi,
> > >>
> > >> Using GICv4/4.1 and msi capability, guest vf driver requires 3
> > >> vectors and enable msi, will lead to guest stuck.
> > >
> > > Stuck how?
> >
> > Guest serial does not response anymore and guest network shutdown.
> >
> > >
> > >> Qemu gets number of interrupts from Multiple Message Capable field
> > >> set by guest. This field is aligned to a power of 2(if a function
> > >> requires 3 vectors, it initializes it to 2).
> > >
> > > So I guess this is a MultiMSI device with 4 vectors, right?
> > >
> >
> > Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
> >
> > >> However, guest driver just sends 3 mapi-cmd to vits and 3 ite
> > >> entries is recorded in host. Vfio initializes msi interrupts using
> > >> the number of interrupts 4 provide by qemu. When it comes to the
> > >> 4th msi without ite in vits, in irq_bypass_register_producer,
> > >> producer and consumer will __connect fail, due to find_ite fail, and
> > >> do not resume guest.
> > >
> > > Let me rephrase this to check that I understand it:
> > > - The device has 4 vectors
> > > - The guest only create mappings for 3 of them
> > > - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> > > - KVM doesn't have a mapping for the 4th vector and returns an error
> > > - VFIO disable this 4th vector
> > >
> > > Is that correct? If yes, I don't understand why that impacts the guest
> > > at all. From what I can see, vfio_msi_set_vector_signal() just prints
> > > a message on the console and carries on.
> > >
> >
> > function calls:
> > --> vfio_msi_set_vector_signal
> > --> irq_bypass_register_producer
> > -->__connect
> >
> > in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
> > and fails to get the 4th mapping. When add_producer fail, it does
> > not call cons->start, calls kvm_arch_irq_bypass_start and then
> > kvm_arm_resume_guest.
>
> [+Eric, who wrote the irq_bypass infrastructure.]
>
> Ah, so the guest is actually paused, not in a livelock situation
> (which is how I interpreted "stuck").
>
> I think we should handle this case gracefully, as there should be no
> expectation that the guest will be using this interrupt. Given that
> VFIO seems to be pretty unfazed when a producer fails, I'm temped to
> do the same thing and restart the guest.
>
> Also, __disconnect doesn't care about errors, so why should __connect
> have this odd behaviour?
>
> Can you please try this? It is completely untested (and I think the
> del_consumer call is odd, which is why I've also dropped it).
>
> Eric, what do you think?

Adding Zhu, Jason, MST to the party.

It all seems to be caused by this commit:

commit a979a6aa009f3c99689432e0cdb5402a4463fb88
Author: Zhu Lingshan
Date:   Fri Jul 31 14:55:33 2020 +0800

    irqbypass: do not start cons/prod when failed connect

    If failed to connect, there is no need to start consumer nor
    producer.

    Signed-off-by: Zhu Lingshan
    Suggested-by: Jason Wang
    Link: https://lore.kernel.org/r/20200731065533.4144-7-lingshan.zhu@intel.com
    Signed-off-by: Michael S. Tsirkin

Zhu, I'd really like to understand why you think it is OK not to
restart consumers and producers when a connection has failed to be
established between the two?

In the case of KVM/arm64, this results in the guest being forever
suspended and never resumed. That's obviously not an acceptable
regression, as there are a number of benign reasons for a connect to
fail.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
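
For anyone following along, the failure mode described in this thread
can be modelled in a few lines of userspace C. This is only a sketch
under stated assumptions, not the kernel code: struct guest, struct
consumer, connect_model() and guest_paused_after_setup() are
hypothetical names made up for illustration. The model assumes that
stopping the consumer pauses the guest, that starting it resumes the
guest, and that the caller ignores connect errors the way VFIO is
described as doing above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a guest that can be paused and resumed. */
struct guest {
	bool paused;
};

/* Hypothetical, stripped-down consumer: stop pauses, start resumes. */
struct consumer {
	struct guest *guest;
	int mapped_vectors;	/* vectors the guest created mappings for */
};

static void cons_stop(struct consumer *c)  { c->guest->paused = true; }
static void cons_start(struct consumer *c) { c->guest->paused = false; }

/* Fails when the guest never mapped this vector (models the missing ITE). */
static int cons_add_producer(struct consumer *c, int vector)
{
	return vector < c->mapped_vectors ? 0 : -1;
}

/*
 * Model of one connect attempt. With start_on_failure == false the
 * consumer is only restarted on success, so a failed connect leaves
 * the guest paused. With start_on_failure == true it is always restarted.
 */
static int connect_model(struct consumer *c, int vector, bool start_on_failure)
{
	int ret;

	cons_stop(c);
	ret = cons_add_producer(c, vector);
	if (!ret || start_on_failure)
		cons_start(c);
	return ret;
}

/*
 * Connect vectors 0..total_vectors-1, ignoring errors, and report
 * whether the guest is left paused at the end.
 */
static bool guest_paused_after_setup(int total_vectors, int mapped_vectors,
				     bool start_on_failure)
{
	struct guest g = { .paused = false };
	struct consumer c = { .guest = &g, .mapped_vectors = mapped_vectors };

	assert(total_vectors >= 0 && mapped_vectors >= 0);
	for (int v = 0; v < total_vectors; v++)
		(void)connect_model(&c, v, start_on_failure);
	return g.paused;
}
```

Calling guest_paused_after_setup(4, 3, false) corresponds to the
reported case (4 vectors advertised, 3 mapped, no restart on failure),
while passing true as the last argument models restarting the consumer
regardless of the connect result.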