Date: Mon, 28 Nov 2022 17:53:55 +0000
Message-ID: <86h6yjm0cs.wl-maz@kernel.org>
From: Marc Zyngier
To: Luiz Capitulino
Subject: Re: [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances
X-Mailing-List: stable@vger.kernel.org

On Mon, 28 Nov 2022 17:08:31 +0000,
Luiz Capitulino wrote:
>
> Hi,
>
> [ Marc, can you help reviewing? Esp. the first patch? ]
>
> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
> nvme device) to the instance after offlining CPUs causes the device to take
> several minutes to show up and eventually nvme kworkers and other threads start
> getting stuck.
>
> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.

That's because x86 has a very different allocation policy compared to
what the ITS does. The x86 vector space is tiny, so vectors are only
allocated when required. In your case, that's when the CPUs are
onlined.

With the ITS, all the vectors are allocated upfront, as this is
essentially free. But in the case of managed interrupts, these vectors
are now pointing to offline CPUs.
The ITS tries to fix that, but doesn't have nearly enough information.
And the correct course of action is to keep these interrupts in the
shutdown state, which is what the series is doing.

>
> An easy reproducer is:
>
> 1. Start an ARM instance with 32 CPUs

To satisfy my own curiosity, is that in a guest or bare metal? It
shouldn't make any difference, but hey...

Anyway, patch #1 looks OK to me, but I haven't tried to dig further
into something that is "oh so last year" ;-). Especially as we're
rewriting the whole of the MSI stack!

FWIW:

Acked-by: Marc Zyngier

	M.

-- 
Without deviation from the norm, progress is not possible.
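
The policy described in the reply above (a managed interrupt whose affinity mask contains only offline CPUs stays in the shutdown state until one of those CPUs comes online) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ManagedIrq`, `activate`, and `cpu_online` are hypothetical names invented for this illustration, not kernel functions, and the code does not reproduce what the backported patches actually do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedIrq:
    """Hypothetical stand-in for a managed interrupt with a fixed affinity mask."""
    name: str
    affinity: frozenset          # CPUs this interrupt may target
    state: str = "shutdown"      # either "shutdown" or "started"
    target: Optional[int] = None


def activate(irq, online_cpus):
    """Start irq only if its affinity mask contains an online CPU.

    Assumption for this sketch: with no online CPU in the mask, the
    interrupt is left shut down instead of being pointed somewhere else.
    """
    candidates = sorted(irq.affinity & online_cpus)
    if not candidates:
        irq.state, irq.target = "shutdown", None
        return False
    irq.state, irq.target = "started", candidates[0]
    return True


def cpu_online(cpu, online_cpus, irqs):
    """Bring a CPU online and retry every interrupt that is still shut down."""
    online_cpus.add(cpu)
    for irq in irqs:
        if irq.state == "shutdown":
            activate(irq, online_cpus)
```

In this model, an interrupt whose mask is `{6, 7}` stays shut down while only CPUs 0 and 1 are online, and is started once `cpu_online(6, ...)` runs.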