From: Vitaly Kuznetsov
To: Mark Rutland
Cc: "Guilherme G. Piccoli", "Michael Kelley (LINUX)", Marc Zyngier,
	Catalin Marinas, Will Deacon, Russell King, Ard Biesheuvel,
	broonie@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel, linux-hyperv@vger.kernel.org
Subject: Re: Should arm64 have a custom crash shutdown handler?
References: <427a8277-49f0-4317-d6c3-4a15d7070e55@igalia.com>
	<874k24igjf.wl-maz@kernel.org>
	<92645c41-96fd-2755-552f-133675721a24@igalia.com>
	<3bee47db-f771-b502-82a3-d6fac388aa89@igalia.com>
	<878rrg13zb.fsf@redhat.com>
Date: Thu, 05 May 2022 16:51:54 +0200
Message-ID: <87y1zgyqut.fsf@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Mark Rutland writes:

> On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
>> "Guilherme G. Piccoli" writes:
>>
>> > On 05/05/2022 09:53, Mark Rutland wrote:
>> >> [...]
>> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
>> >> need to do on arm64, and why does it need to happen at that point specifically?
>> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
>> >> clear to me whether we need anything at all.
>> >>
>> >>> Anyway, the idea here was to gather feedback on how "receptive" the arm64
>> >>> community would be to allowing such customization; your feedback is appreciated =)
>> >>
>> >> ... and are you trying to do this for Hyper-V, or just using that as an example?
>> >>
>> >> I think we're not going to be very receptive without a more concrete example of
>> >> what you want.
>> >>
>> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
>> >>
>> >> Thanks,
>> >> Mark.
>> >
>> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
>> > code, almost. For example, in hv_crash_handler() there is a stimer
>> > clean-up and the VMbus unload - my understanding is that this same code
>> > would need to run on arm64. Michael Kelley is CCed; he was discussing
>> > this with me in the panic notifiers thread and may elaborate more on the needs.
>> >
>> > But also (not related to my specific plan), I've seen KVM quiesce code
>> > on x86 as well [see kvm_crash_shutdown() in arch/x86]; I'm not sure if
>> > this is necessary on arm64 or if it already executes in some
>> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
>> > hence I've CCed him here.
>>
>> Speaking about the difference between the reboot notifiers call chain and
>> machine_ops.crash_shutdown for KVM/x86, the main difference is that a
>> reboot notifier is called on some CPU while the VM is fully functional,
>> so we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
>> on_each_cpu()). When we're in a crash situation,
>> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
>> count on IPIs still being functional, so we do the bare minimum so that
>> *this* CPU can boot the kdump kernel. There's no guarantee other CPUs can
>> still boot, but normally we do kdump with 'nprocs=1'.
>
> Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
> use a PV mechanism (and practically speaking will either be GICv2 or GICv3).

This isn't really about PV: when the kernel is crashing, you have no idea
what's going on on other CPUs; they may be crashing too, locked in a tight
loop, ... so sending an IPI there to do some work and expecting it to report
back is dangerous.

>> For Hyper-V, the situation is similar: hv_crash_handler() initiates
>> VMbus unload on the crashing CPU only; there's no mechanism to do a
>> 'global' unload, so other CPUs will likely not be able to connect VMbus
>> devices in the kdump kernel, but this should not be necessary.
>
> Given kdump is best-effort (and we can't rely on secondary CPUs even making it
> into the kdump kernel), I also don't think that should be necessary.

Yes, exactly.

>> There's a crash_kexec_post_notifiers mechanism which can be used instead,
>> but it's disabled by default, so using machine_ops.crash_shutdown is
>> better.
> Another option is to defer this to the kdump kernel. On arm64 at least, we know
> if we're in a kdump kernel early on, and can reset some state based upon that.
>
> Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
> to just before the kdump kernel detects and initializes anything relating to
> hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
> the first hypercall, and do the cleanup/reset there.

In theory, yes: it is possible to try sending CHANNELMSG_UNLOAD on kdump
kernel boot instead of upon crash; I don't remember whether this approach was
tried in the past.

> Maybe we need more data for the vmbus bits? ... if so it seems that could blow
> up anyway when the first kernel was tearing down.

I'm not sure I understood what you mean... From what I remember, there were
issues with CHANNELMSG_UNLOAD handling on the Hyper-V host side in the past
(it was taking *minutes* for the host to reply), but this is orthogonal to the
fact that we need to do this cleanup so the kdump kernel is able to connect to
VMbus devices again.

-- 
Vitaly