kvm.vger.kernel.org archive mirror
* [RFC] SVM: L2 hang with fresh L1 and old L0
@ 2022-06-28 14:23 Alexander Mikhalitsyn
From: Alexander Mikhalitsyn @ 2022-06-28 14:23 UTC
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Joerg Roedel,
	babu.moger, den, ptikhomirov, alexander

Dear friends,

Recently we (at OpenVZ) noticed an interesting issue: an L2 VM
hangs on RHEL 7 based hosts with SVM (AMD).

Let me describe our test configuration:
- AMD EPYC 7443P (Milan) or AMD EPYC 7261 (Rome)
- RHEL 7 based kernel on the host node
... and, most importantly, the nesting combinations:

L0        L1              L2
RHEL 7 -> RHEL 7       -> RHEL 7          *works*
RHEL 7 -> RHEL 7       -> RHEL 8          *works*
RHEL 7 -> RHEL 7       -> recent Fedora   *works*
RHEL 7 -> RHEL 8       -> RHEL 7          *L2 hang*
RHEL 7 -> fresh Fedora -> RHEL 7          *L2 hang*

More generally:
RHEL 7 -> RHEL 7       -> *any tested Linux guest*  *works*
RHEL 7 -> RHEL 8       -> *any tested Linux guest*  *L2 hang*

But at the same time:
RHEL 8 -> RHEL 8       -> *any tested Linux guest*  *works*

This was the key observation, so I started bisecting the L1 kernel to
find a hint. The bisection pointed to commit:
c9d40913 ("KVM: x86: enable event window in inject_pending_event")

I immediately tried reverting it in the CentOS 8 kernel and re-running the test,
and it... works! To conclude: with an *old* kernel on the host (L0) and a
*sufficiently new* kernel in L1, L2 is totally broken (SVM only).
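The pattern in the matrix above can be written down as a small predicate. This is only a sketch that assumes the single relevant variable is whether each kernel carries commit c9d40913. The per-kernel flags are inferred from this report (the CentOS 8 revert and the RHEL 7 backport), not detected from real kernels, and the names HAS_C9D40913 and l2_hangs_on_svm are hypothetical.

```python
# Hypothetical model of the test matrix. It assumes the only thing that
# matters is whether a kernel contains commit c9d40913 ("KVM: x86:
# enable event window in inject_pending_event"). The flags are inferred
# from this report, not detected from real kernels.
HAS_C9D40913 = {
    "RHEL 7": False,       # unpatched RHEL 7 lacks the commit
    "RHEL 8": True,        # CentOS 8 carries it (reverting it fixed the hang)
    "fresh Fedora": True,  # recent upstream-based kernel
}


def l2_hangs_on_svm(l0: str, l1: str) -> bool:
    """Predict an L2 hang: L1 has the commit, but L0 does not.

    The L2 kernel is not a parameter because every tested guest
    behaved the same way.
    """
    return HAS_C9D40913[l1] and not HAS_C9D40913[l0]


# (L0, L1, observed hang) rows taken from the matrix above.
OBSERVED = [
    ("RHEL 7", "RHEL 7", False),
    ("RHEL 7", "RHEL 8", True),
    ("RHEL 7", "fresh Fedora", True),
    ("RHEL 8", "RHEL 8", False),
]

if __name__ == "__main__":
    for l0, l1, hung in OBSERVED:
        # The model must agree with every observed row.
        assert l2_hangs_on_svm(l0, l1) == hung, (l0, l1)
        print(f"L0={l0:<7} L1={l1:<13} -> {'L2 hang' if hung else 'works'}")
```

Under this model, giving L0 the commit (setting its flag to True) makes every row work. That matches the effect of porting the patch to the L0 kernel.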

I then tried porting this patch to the L0 kernel to check whether that fixes the issue. And yes,
it works. I hope this information is useful for KVM developers and users.

My attempt to port it to the RHEL 7 kernel:
https://lists.openvz.org/pipermail/devel/2022-June/079776.html

Should I port this patch to the stable kernels too and send it?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.9.320&qt=grep&q=enable+event+window+in+inject_pending_event
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.14.285&qt=grep&q=enable+event+window+in+inject_pending_event
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.249&qt=grep&q=enable+event+window+in+inject_pending_event
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.201&qt=grep&q=enable+event+window+in+inject_pending_event

So the 4.9, 4.14, 4.19 and 5.4 stable kernels lack this patch.

I haven't checked it yet, but it looks like, for instance, a

L0  -> L1   -> L2
5.4 -> 5.10 -> *any kernel version*

setup will also hang on SVM.

Regards,
    Alex


* Re: [RFC] SVM: L2 hang with fresh L1 and old L0
From: Vitaly Kuznetsov @ 2022-06-28 14:41 UTC
  To: Alexander Mikhalitsyn, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Joerg Roedel, babu.moger, den, ptikhomirov,
	alexander, Maxim Levitsky

Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> writes:

> Dear friends,
>
> Recently we (at OpenVZ) noticed an interesting issue: an L2 VM
> hangs on RHEL 7 based hosts with SVM (AMD).
>
> Let me describe our test configuration:
> - AMD EPYC 7443P (Milan) or AMD EPYC 7261 (Rome)
> - RHEL 7 based kernel on the host node
> ... and, most importantly, the nesting combinations:
>
> L0        L1              L2
> RHEL 7 -> RHEL 7       -> RHEL 7          *works*
> RHEL 7 -> RHEL 7       -> RHEL 8          *works*
> RHEL 7 -> RHEL 7       -> recent Fedora   *works*
> RHEL 7 -> RHEL 8       -> RHEL 7          *L2 hang*
> RHEL 7 -> fresh Fedora -> RHEL 7          *L2 hang*
>
> More generally:
> RHEL 7 -> RHEL 7       -> *any tested Linux guest*  *works*
> RHEL 7 -> RHEL 8       -> *any tested Linux guest*  *L2 hang*
>
> But at the same time:
> RHEL 8 -> RHEL 8       -> *any tested Linux guest*  *works*
>
> This was the key observation, so I started bisecting the L1 kernel to
> find a hint. The bisection pointed to commit:
> c9d40913 ("KVM: x86: enable event window in inject_pending_event")
>
> I immediately tried reverting it in the CentOS 8 kernel and re-running the test,
> and it... works! To conclude: with an *old* kernel on the host (L0) and a
> *sufficiently new* kernel in L1, L2 is totally broken (SVM only).
>
> I then tried porting this patch to the L0 kernel to check whether that fixes the issue. And yes,
> it works. I hope this information is useful for KVM developers and users.
>
> My attempt to port it to the RHEL 7 kernel:
> https://lists.openvz.org/pipermail/devel/2022-June/079776.html

Thanks for the investigation!

FWIW, nesting was never supported in RHEL7. It was disabled by default
and only worked to a certain extent on Intel. By the time we stopped
rebasing KVM in RHEL7, nested SVM was still a train wreck, even upstream.

>
> Should I port this patch to the stable kernels too and send it?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.9.320&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.14.285&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.249&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.201&qt=grep&q=enable+event+window+in+inject_pending_event
>
> So the 4.9, 4.14, 4.19 and 5.4 stable kernels lack this patch.

Personally, I wouldn't bother with anything below 5.4: nSVM is in very
poor shape there, and fixing one problem would just create the illusion
that it is 'supported'.

>
> I haven't checked it yet, but it looks like, for instance, a
>
> L0  -> L1   -> L2
> 5.4 -> 5.10 -> *any kernel version*
>
> setup will also hang on SVM.

Cc: Max, who fixed a long list of issues in nSVM.

-- 
Vitaly
