Date: Tue, 4 Apr 2023 12:34:57 -0700
Subject: Re: [WIP Patch v2 04/14] KVM: x86: Add KVM_CAP_X86_MEMORY_FAULT_EXIT and associated kvm_run field
From: Sean Christopherson
To: Anish Moorthy
Cc: Isaku Yamahata, Marc Zyngier, Oliver Upton, jthoughton@google.com, kvm@vger.kernel.org
References: <20230317000226.GA408922@ls.amr.corp.intel.com>

On Tue, Mar 28, 2023, Anish Moorthy wrote:
> On Tue, Mar 21, 2023 at 12:43 PM Sean Christopherson wrote:
> >
> > On Tue, Mar 21, 2023, Anish Moorthy wrote:
> > > On Tue, Mar 21, 2023 at 8:21 AM Sean Christopherson wrote:
> > > > FWIW, I completely agree that filling KVM_EXIT_MEMORY_FAULT without guaranteeing
> > > > that KVM "immediately" exits to userspace isn't ideal, but given the amount of
> > > > historical code that we need to deal with, it seems like the lesser of all evils.
> > > > Unless I'm misunderstanding the use cases, unnecessarily filling kvm_run is a far
> > > > better failure mode than KVM not filling kvm_run when it should, i.e. false
> > > > positives are ok, false negatives are fatal.
> > >
> > > Don't you have this in reverse?
> >
> > No, I don't think so.
> >
> > > False negatives will just result in userspace not having useful extra
> > > information for the -EFAULT it receives from KVM_RUN, in which case userspace
> > > can do what you mentioned all VMMs do today and just terminate the VM.
> >
> > And that is _really_ bad behavior if we have any hope of userspace actually being
> > able to rely on this functionality. E.g. any false negative when userspace is
> > trying to do postcopy demand paging will be fatal to the VM.
> >
> > > Whereas a false positive might cause a double-write to the KVM_RUN struct,
> > > either putting incorrect information in kvm_run.memory_fault or
> >
> > Recording unused information on -EFAULT in kvm_run doesn't make the information
> > incorrect.
>
> Let's say that some function (converted to annotate its EFAULTs) fills
> in kvm_run.memory_fault, but the EFAULT is suppressed from being
> returned from kvm_run. What if, later within the same kvm_run call,
> some other function (which we've completely overlooked) EFAULTs and
> that return value actually does make it out to kvm_run? Userspace
> would get stale information, which could be catastrophic.

"catastrophic" is a bit hyperbolic. Yes, it would be bad, but at _worst_ userspace
will kill the VM, which is the status quo today.
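To make the scenario above concrete, here's a self-contained toy model of it
(stub types and made-up names, not actual KVM code; the real memory_fault
layout is whatever this series defines):

#include <errno.h>
#include <stdio.h>

#define KVM_EXIT_MEMORY_FAULT_STUB 38	/* made-up value, demo only */

/* Stand-in for the memory_fault portion of struct kvm_run. */
struct kvm_run_stub {
	unsigned int exit_reason;
	unsigned long long fault_gpa;
};

/* Annotated path: fills the exit info, then returns -EFAULT. */
static int annotated_fault(struct kvm_run_stub *run, unsigned long long gpa)
{
	run->exit_reason = KVM_EXIT_MEMORY_FAULT_STUB;
	run->fault_gpa = gpa;
	return -EFAULT;
}

/* Overlooked path: returns -EFAULT without touching the exit info. */
static int overlooked_fault(void)
{
	return -EFAULT;
}

static int handler(struct kvm_run_stub *run)
{
	int r;

	r = annotated_fault(run, 0x1000);	/* fills the kvm_run-equivalent info */
	if (r == -EFAULT)
		r = 0;	/* fault resolved internally, -EFAULT suppressed */

	if (!r)
		r = overlooked_fault();	/* this -EFAULT escapes to userspace */

	return r;
}

int main(void)
{
	struct kvm_run_stub run = { 0 };

	if (handler(&run) == -EFAULT)
		printf("r=-EFAULT, exit_reason=%u, gpa=%#llx <- stale, from the first fault\n",
		       run.exit_reason, run.fault_gpa);
	return 0;
}

Userspace gets -EFAULT along with exit info describing a fault that was already
resolved, i.e. exactly the staleness in question. Bad, but again, the worst case
is that userspace kills the VM.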
> Actually even performing the annotations only in functions that
> currently always bubble EFAULTs to userspace still seems brittle: if
> new callers are ever added which don't bubble the EFAULTs, then we end
> up in the same situation.

Because of KVM's semi-magical '1 == resume, -errno/0 == exit' "design", that's
true for literally every exit to userspace in KVM and every VM-Exit handler.
E.g. see commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad
WRMSR(MCi_CTL/STATUS)"), where KVM returned '-1' instead of '1' when rejecting
MSR accesses and inadvertently killed the VM. A similar bug would be if KVM
returned EFAULT instead of -EFAULT, in which case vcpu_run() would resume the
guest instead of exiting to userspace and likely put the vCPU into an infinite
loop.

Do I want to harden KVM to make things like this less brittle? Absolutely. Do I
think we should hold up this functionality just because it doesn't solve all of
the pre-existing flaws in the related KVM code? No.
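For reference, the sign-sensitivity of the '1 == resume, -errno/0 == exit'
convention above in toy form (a simplified model of the run loop, not actual
KVM code):

#include <errno.h>
#include <stdio.h>

/* Exit handler convention: 1 == resume guest, 0/-errno == exit to userspace. */
static int handle_exit(int buggy)
{
	if (buggy)
		return EFAULT;	/* positive errno wrongly reads as "resume" */
	return -EFAULT;		/* correct: exit to userspace */
}

static int vcpu_run_model(int buggy)
{
	int r, iters = 0;

	do {
		/* the guest would actually run here */
		r = handle_exit(buggy);
	} while (r > 0 && ++iters < 5);	/* capped only so the demo halts */

	return r;
}

int main(void)
{
	printf("correct handler: r=%d (exits to userspace)\n", vcpu_run_model(0));
	printf("buggy handler:   r=%d after 5 iterations (i.e. an infinite loop)\n",
	       vcpu_run_model(1));
	return 0;
}

With the sign flipped, the loop never sees a reason to exit to userspace, which
is the infinite-loop failure mode described above.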