From: Vitaly Kuznetsov
To: Michael Kelley
Cc: stable@vger.kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, decui@microsoft.com
Subject: Re: [PATCH 1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
In-Reply-To: <1684172191-17100-1-git-send-email-mikelley@microsoft.com>
References: <1684172191-17100-1-git-send-email-mikelley@microsoft.com>
Date: Tue, 16 May 2023 11:11:32 +0200
Message-ID: <87pm707i9n.fsf@redhat.com>

Michael Kelley writes:

> vmbus_wait_for_unload() may be called in the panic path after other
> CPUs are stopped. vmbus_wait_for_unload() currently loops through
> online CPUs looking for the UNLOAD response message. But the values of
> CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> to stop the other CPUs, and in one of the paths the stopped CPUs
> are removed from cpu_online_mask. This removal happens on both the
> x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> only checks the panicking CPU, and misses the UNLOAD response message
> except when the panicking CPU is CPU 0. vmbus_wait_for_unload()
> eventually times out, but only after waiting 100 seconds.
>
> Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> The cpu_present_mask is not modified by stopping the other CPUs in the
> panic path, nor should it be. Furthermore, the synic_message_page
> being checked in vmbus_wait_for_unload() is allocated in
> hv_synic_alloc() for all present CPUs, so looping through the
> present CPUs is more consistent.
>
> For additional safety, also add a check for the message_page being
> NULL before looking for the UNLOAD response message.
>
> Reported-by: John Starks
> Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")

I see you Cc:ed stable@ on the patch; should we also add a
Cc: stable@vger.kernel.org tag here explicitly so the patch gets picked
up by the various stable backporting scripts? I guess Wei can do that
when picking the patch up for the queue...

> Signed-off-by: Michael Kelley
> ---
>  drivers/hv/channel_mgmt.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 007f26d..df2ba20 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
>  		if (completion_done(&vmbus_connection.unload_event))
>  			goto completed;
>
> -		for_each_online_cpu(cpu) {
> +		for_each_present_cpu(cpu) {
>  			struct hv_per_cpu_context *hv_cpu
>  				= per_cpu_ptr(hv_context.cpu_context, cpu);
>
>  			page_addr = hv_cpu->synic_message_page;
> +			if (!page_addr)
> +				continue;
> +

In theory, synic_message_page for all present CPUs is permanently
assigned in hv_synic_alloc(), and we fail the whole thing if any of
those allocations fails, so page_addr == NULL should be impossible
today. But there's certainly no harm in having this extra check here;
this is not a hotpath.
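For the record, the pattern I mean looks roughly like this (a
simplified sketch of hv_synic_alloc(), not the exact upstream code):

	/*
	 * Simplified sketch: the message page is allocated for every
	 * present CPU, and a single failure aborts VMBus initialization,
	 * so a present CPU with a NULL synic_message_page should not
	 * exist once the driver is up.
	 */
	int hv_synic_alloc(void)
	{
		int cpu;

		for_each_present_cpu(cpu) {
			struct hv_per_cpu_context *hv_cpu
				= per_cpu_ptr(hv_context.cpu_context, cpu);

			hv_cpu->synic_message_page =
				(void *)get_zeroed_page(GFP_ATOMIC);
			if (!hv_cpu->synic_message_page)
				return -ENOMEM;
		}

		return 0;
	}
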
> 			msg = (struct hv_message *)page_addr
> 				+ VMBUS_MESSAGE_SINT;
>
> @@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
>  	 * maybe-pending messages on all CPUs to be able to receive new
>  	 * messages after we reconnect.
>  	 */
> -	for_each_online_cpu(cpu) {
> +	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
>
>  		page_addr = hv_cpu->synic_message_page;
> +		if (!page_addr)
> +			continue;
> +
>  		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>  		msg->header.message_type = HVMSG_NONE;
>  	}

Reviewed-by: Vitaly Kuznetsov

--
Vitaly