From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BB85B2E62B for ; Fri, 13 Dec 2024 01:58:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.9 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734055090; cv=none; b=ddmauk878TeaSBNxqI8REtg7QDOVHGruarxqFeiyzFTJbheYV591cYxh45yuM7mJjNuyKHXwxEeXxE7XAJDBE3JoP0PnaAnoV4PKp/lJMAxVZmqjx7kdvtZDpaXV/IGyHnrV1EfIGgCWkuOW+SVj63KAHsMOpO+VxAvGtkrGxXc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734055090; c=relaxed/simple; bh=rEh6nEUB6b2gWFVudTedi+7SvZpJ1yXIcaAEvQ6Ls4c=; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References: In-Reply-To:Content-Type; b=Zz9CqMWItmpfUXFfci6Wfoo5i/3c6BfYnCeFuy76GBhDcTTuZ1cN/a/bjoNTjlFtd3IeUcyYxLrvesx8SSjkLcGiIG9X7mcf8l1kXCyAbj+8FeACQF65bM51RgjaUZtx7Y32bk+rvbWd0vjBkgQM+uTlKR/FLv+F157KM7Xf/Nw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=a1q+A77R; arc=none smtp.client-ip=192.198.163.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="a1q+A77R" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1734055088; x=1765591088; h=message-id:date:mime-version:subject:from:to:cc: references:in-reply-to:content-transfer-encoding; bh=rEh6nEUB6b2gWFVudTedi+7SvZpJ1yXIcaAEvQ6Ls4c=; b=a1q+A77R+rod8tYtugwWlt2YZj1pmU2r9S0HGJTw9j2iQvwlxhGEDbK5 l6UHPUeYzkAtVrqS2SPgrYTP6giApxbxAxImCbjHnRe2SAeC/BVlvOCwb EPdCLwb+5kID/3/KERViym4WOC8hxva8bd67iycTtn4sfdyJonE7KrRBi aRHGZUSoJCCYuiHzdp6BgNiPwjL7sOoq6hI4JTP2A8j8KPz02orLxNn13 aYDDnTQ836hIa7OgxMa9IJGyZHEJ0VnYff9fyArqfum8+cJ2Zo9BC4BKy FvaJ7JK+F6Wozm1ToifCYo6N9UbMUi7IPjAHeuFfM4+6noxguqq+qiN4p A==; X-CSE-ConnectionGUID: qk6N46QpRpqpAxdgkRclvg== X-CSE-MsgGUID: ADpGYrc5TX2lzAAVdlBvIg== X-IronPort-AV: E=McAfee;i="6700,10204,11284"; a="45188813" X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="45188813" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 17:58:07 -0800 X-CSE-ConnectionGUID: 3VN4NUYtTsGLF8+vnybM8g== X-CSE-MsgGUID: 0ou1/nhkTMiLJu+On8Y/Ig== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,230,1728975600"; d="scan'208";a="101260708" Received: from unknown (HELO [10.238.9.154]) ([10.238.9.154]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Dec 2024 17:52:36 -0800 Message-ID: <2cf8bf60-b36d-47ca-9aef-d477a841cbed@linux.intel.com> Date: Fri, 13 Dec 2024 09:52:34 +0800 Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL From: Binbin Wu To: Paolo Bonzini , Sean Christopherson Cc: Zhao Liu , xiaoyao.li@intel.com, qemu-devel@nongnu.org, michael.roth@amd.com, rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, farrah.chen@intel.com, kvm@vger.kernel.org References: <20241212032628.475976-1-binbin.wu@linux.intel.com> <1a5e2988-9a7d-4415-86ad-8a7a98dbc5eb@redhat.com> <5b8f7d63-ef0a-487f-bf9d-44421691fa85@redhat.com> <745b2b6e-7dd0-4437-bbbf-673ddc0df014@linux.intel.com> Content-Language: en-US In-Reply-To: <745b2b6e-7dd0-4437-bbbf-673ddc0df014@linux.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 12/13/2024 9:46 AM, Binbin Wu wrote: > > > > On 12/13/2024 5:28 AM, Paolo Bonzini wrote: >> On 12/12/24 20:13, Sean Christopherson wrote: >>> On Thu, Dec 12, 2024, Paolo Bonzini wrote: >>>> On 12/12/24 09:07, Zhao Liu wrote: >>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800 >>>>>> From: Binbin Wu >>>>>> Subject: [PATCH] i386/kvm: Set return value after handling >>>>>>    KVM_EXIT_HYPERCALL >>>>>> X-Mailer: git-send-email 2.46.0 >>>>>> >>>>>> Userspace should set the ret field of hypercall after handling >>>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM. >>>>>> >>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>>>> Reported-by: Farrah Chen >>>>>> Signed-off-by: Binbin Wu >>>>>> Tested-by: Farrah Chen >>>>>> --- >>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>>>> otherwise, TDX guest boot could fail. >>>>>> A matching QEMU tree including this patch is here: >>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >>>>>> >>>>>> Previously, the issue was not triggered because no one would modify the ret >>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >>>>>> value could be modified. >>>>> >>>>> Could you explain the specific reasons here in detail? It would be >>>>> helpful with debugging or reproducing the issue. >>>>> >>>>>> --- >>>>>>    target/i386/kvm/kvm.c | 8 ++++++-- >>>>>>    1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>>>> index 8e17942c3b..4bcccb48d1 100644 >>>>>> --- a/target/i386/kvm/kvm.c >>>>>> +++ b/target/i386/kvm/kvm.c >>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>>>>    static int kvm_handle_hypercall(struct kvm_run *run) >>>>>>    { >>>>>> +    int ret = -EINVAL; >>>>>> + >>>>>>        if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>>>> -        return kvm_handle_hc_map_gpa_range(run); >>>>>> +        ret = kvm_handle_hc_map_gpa_range(run); >>>>>> + >>>>>> +    run->hypercall.ret = ret; >>>>> >>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to >>>>> -ret? >>>> >>>> If ret is less than zero, will stop the VM anyway as >>>> RUN_STATE_INTERNAL_ERROR. >>>> >>>> If this has to be fixed in QEMU, I think there's no need to set anything >>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and >>>> that's not how the error would be passed to the guest. >>>> >>>> However, I think the right fix should simply be this in KVM: >>>> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index 83fe0a78146f..e2118ba93ef6 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >>>>           } >>>>           vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL; >>>> +        vcpu->run->ret                = 0; >>> >>>         vcpu->run->hypercall.ret >>> >>>> vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE; >>>>           vcpu->run->hypercall.args[0]  = gpa; >>>>           vcpu->run->hypercall.args[1]  = npages; >>>> >>>> While there is arguably a change in behavior of the kernel both with >>>> the patches in kvm-coco-queue and with the above one, _in practice_ >>>> the above change is one that userspace will not notice. >>> >>> I agree that KVM should initialize "ret", but I don't think '0' is the right >>> value.  KVM shouldn't assume userspace will successfully handle the hypercall. >>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? >> >> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest. >> >> In other words, the above one-liner is pulling the "don't break userspace" card. >> >> Paolo >> >> > If the change need to be done in KVM, there are other 3 functions that use > KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue. > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 40fe7258843e..a624f7289282 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr) >         } > >         vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > +       vcpu->run->ret         = 0; >         vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >         vcpu->run->hypercall.args[0] = gpa; >         vcpu->run->hypercall.args[1] = 1; > @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc) >         case VMGEXIT_PSC_OP_PRIVATE: >         case VMGEXIT_PSC_OP_SHARED: >                 vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > +               vcpu->run->ret         = 0; >                 vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >                 vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn); >                 vcpu->run->hypercall.args[1] = npages; > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 85c8aee263c1..c50c2edc8c56 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx) >         pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size); > >         tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL; > +       tdx->vcpu->run->ret              = 0; >         tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE; >         tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); >         tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4f94b1e24eae..3f82bb2357e3 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >                 } > >                 vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL; > +               vcpu->run->ret                = 0; >                 vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE; >                 vcpu->run->hypercall.args[0]  = gpa; >                 vcpu->run->hypercall.args[1]  = npages; > Maybe we could add a helper to fill the vcpu->run?