From mboxrd@z Thu Jan 1 00:00:00 1970
List-Id: Xen developer discussion
Date: Sat, 31 Oct 2020 05:08:17 +0100
From: "marmarek@invisiblethingslab.com"
To: Dario Faggioli
Cc: Juergen Gross, "frederic.pierret@qubes-os.org", "George.Dunlap@citrix.com", "xen-devel@lists.xenproject.org", "andrew.cooper3@citrix.com"
Subject: Re: Recent upgrade of 4.13 -> 4.14 issue
Message-ID: <20201031040817.GG1447@mail-itl>
References: <30452e9c-bf27-fce2-cc20-4ce91018a15a@citrix.com> <533ce2f2-f268-a70b-fad7-d8f3f4033209@suse.com> <182a90a89cc02beec9760559799e74572e18ce49.camel@suse.com> <9632dc14-46d5-83c0-7e44-0c3bd4f5154a@qubes-os.org> <20201031025442.GF1447@mail-itl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="TXtjXAp+Y+VkYOQB"
Content-Disposition: inline
In-Reply-To:

--TXtjXAp+Y+VkYOQB
Content-Type: text/plain; protected-headers=v1; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Subject: Re: Recent upgrade of 4.13 -> 4.14 issue

On Sat, Oct 31, 2020 at 04:27:58AM +0100, Dario Faggioli wrote:
> On Sat, 2020-10-31 at 03:54 +0100, marmarek@invisiblethingslab.com
> wrote:
> > On Sat, Oct 31, 2020 at 02:34:32AM +0000, Dario Faggioli wrote:
> > (XEN) *** Dumping CPU7 host state: ***
> > (XEN) Xen call trace:
> > (XEN)    [] R _spin_lock+0x35/0x40
> > (XEN)    [] S on_selected_cpus+0x1d/0xc0
> > (XEN)    [] S vmx_do_resume+0xba/0x1b0
> > (XEN)    [] S context_switch+0x110/0xa60
> > (XEN)    [] S core.c#schedule+0x1aa/0x250
> > (XEN)    [] S softirq.c#__do_softirq+0x5a/0xa0
> > (XEN)    [] S vmx_asm_do_vmentry+0x2b/0x30
> >
> > And so on, for (almost?) all CPUs.
>
> Right. So, it seems like a live (I would say) lock. It might happen on
> some resource which is shared among domains. And introduced (the
> livelock, not the resource or the sharing) in 4.14.
>
> Just giving a quick look, I see that vmx_do_resume() calls
> vmx_clear_vmcs() which calls on_selected_cpus() which takes the
> call_lock spinlock.
>
> And none of these seems to have received much attention recently.
>
> But this is just a really basic analysis!

I've looked at on_selected_cpus() and my understanding is this:
1. take the call_lock spinlock
2. set function + args + which CPUs are to be called in a global
   "call_data" variable
3. ask the CPUs to execute that function (smp_send_call_function_mask() call)
4. wait for all requested CPUs to execute the function, still holding
   the spinlock
5. only then - release the spinlock

So, if any CPU does not execute the requested function for any reason,
the caller will keep call_lock locked forever.

I don't see any CPU waiting on step 4, but also I don't see call traces
from CPU3 and CPU8 in the log - that's because they are in guest (dom0
here) context, right?
I do see "guest state" dumps from them.

The only three CPUs that did log Xen call traces and are not waiting on
that spin lock are:

CPU0:
(XEN) Xen call trace:
(XEN)    [] R vcpu_unblock+0x9/0x50
(XEN)    [] S vcpu_kick+0x11/0x60
(XEN)    [] S tasklet.c#do_tasklet_work+0x68/0xc0
(XEN)    [] S tasklet.c#tasklet_softirq_action+0x39/0x60
(XEN)    [] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [] S vmx_asm_do_vmentry+0x2b/0x30

CPU4:
(XEN) Xen call trace:
(XEN)    [] R set_timer+0x133/0x220
(XEN)    [] S credit.c#csched_tick+0/0x3a0
(XEN)    [] S timer.c#timer_softirq_action+0x9f/0x300
(XEN)    [] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [] S x86_64/entry.S#process_softirqs+0x6/0x20

CPU14:
(XEN) Xen call trace:
(XEN)    [] R do_softirq+0/0x10
(XEN)    [] S x86_64/entry.S#process_softirqs+0x6/0x20

I'm not sure if any of those is related to that spin lock, the
on_selected_cpus() call, or anything like that...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

--TXtjXAp+Y+VkYOQB
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl+c4zEACgkQ24/THMrX
1yxW9wf/ZPjzMYLiq0CsKNmHRuOGrKJyIwcynFReZ2Fe7UmppKsmw9DF7j15m/kQ
mHcS024GreWDMyNuNkgJTMcpaVSpxXa1khFDnBt3Dp3VA9mrdrTRrY8kio2cQkdJ
Wt2Vn/dHxAjFCmKsidEBij+3BzVDxkH6vOxT6+XPe4aLOMY4xGTSg8BI0YNi+IT6
C9srC8rHWqgfd4k2DdWX6iNbKlxl591Cshb8Sh0RfIjRFdEALM+PAmhmk6A8x7db
We/fh3cwI8UqeRqImVSvgPwdCCLPaTKt20p14roJ194DMtxrWLtH4zoDiWWAS2hi
9RmhiOviKbJQwpyEtkexAGWHPVlcAg==
=L++n
-----END PGP SIGNATURE-----

--TXtjXAp+Y+VkYOQB--
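
For illustration only, the five on_selected_cpus() steps listed in the message can be sketched as a small single-threaded C model. This is not the Xen source: the names on_selected_cpus_model(), cpu_handle_call(), struct call_data_model, unresponsive_cpus and count_call() are hypothetical, the lock is a plain flag, and the "wait forever" cases are reported as return values so the model terminates. It only shows why one CPU that never runs the requested function would leave call_lock held and every later caller stuck in _spin_lock(), as in the CPU7 trace.

```c
#include <assert.h>

#define NR_CPUS 16u

/* Possible outcomes of one modelled on_selected_cpus() call. */
enum call_result {
    CALL_DONE,           /* every selected CPU ran func; lock released   */
    CALL_STUCK_ON_LOCK,  /* real code would spin in _spin_lock() here    */
    CALL_STUCK_WAITING   /* real code would wait forever, lock held      */
};

/* Hypothetical, simplified stand-in for the global "call_data" variable. */
struct call_data_model {
    void (*func)(void *info);
    void *info;
    unsigned int pending;  /* bitmask of CPUs that still have to run func */
};

static struct call_data_model call_data;
static int call_lock_held;              /* models the call_lock spinlock */
static unsigned int unresponsive_cpus;  /* CPUs that never service a call */

/* Models one CPU handling the call-function request. */
static void cpu_handle_call(unsigned int cpu)
{
    unsigned int bit;

    assert(cpu < NR_CPUS);
    bit = 1u << cpu;
    if ((unresponsive_cpus & bit) || !(call_data.pending & bit))
        return;                         /* not selected, or never responds */
    call_data.func(call_data.info);
    call_data.pending &= ~bit;          /* acknowledge completion */
}

static enum call_result on_selected_cpus_model(unsigned int mask,
                                               void (*func)(void *),
                                               void *info)
{
    unsigned int cpu;

    if (call_lock_held)
        return CALL_STUCK_ON_LOCK;
    call_lock_held = 1;                              /* step 1: take lock   */

    call_data.func = func;                           /* step 2: publish     */
    call_data.info = info;
    call_data.pending = mask & ((1u << NR_CPUS) - 1u);

    for (cpu = 0; cpu < NR_CPUS; cpu++)              /* step 3: "send IPI"  */
        cpu_handle_call(cpu);

    if (call_data.pending != 0)                      /* step 4: wait        */
        return CALL_STUCK_WAITING;                   /* lock is never freed */

    call_lock_held = 0;                              /* step 5: release     */
    return CALL_DONE;
}

/* Example callback: counts how many CPUs ran it. */
static void count_call(void *info)
{
    ++*(int *)info;
}
```

In this model, marking a single CPU in unresponsive_cpus makes the first caller return CALL_STUCK_WAITING with call_lock_held still set, and every subsequent caller returns CALL_STUCK_ON_LOCK. That matches the reasoning in the message, but it says nothing about which CPU (if any) actually fails to respond on the affected machine.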