* [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter
@ 2023-09-01 18:34 Luiz Capitulino
2023-09-01 18:34 ` [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool() Luiz Capitulino
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Luiz Capitulino @ 2023-09-01 18:34 UTC (permalink / raw)
To: stable, seanjc, christophe.jaillet; +Cc: lcapitulino, Luiz Capitulino
Hi,
As part of the mitigation for the iTLB multihit vulnerability, KVM creates
a worker thread in the KVM_CREATE_VM ioctl(). This thread calls
cgroup_attach_task_all(), which takes cgroup_threadgroup_rwsem for writing
and may incur 100ms+ latency since upstream commit
6a010a49b63ac8465851a79185d8deff966f8e1a.
However, if the CPU is not vulnerable to iTLB multihit, one can simply
disable the mitigation (and with it the worker thread creation) with the
newly added KVM module parameter nx_huge_pages=never. This avoids the issue
altogether.
While 6.1-stable already supports an alternative solution for this issue
(i.e. cgroup's favordynmods), disabling the mitigation in KVM is probably
preferable if the workload is not impacted by dynamic cgroup operations:
one doesn't have to weigh the trade-offs of using favordynmods, the thread
creation code path is skipped at KVM_CREATE_VM, and no thread is created
just to do nothing.
Tests performed:
* Measured KVM_CREATE_VM latency and confirmed it goes down to less than 1ms
* We've been performing latency measurements internally w/ this parameter
for some weeks now
Christophe JAILLET (1):
KVM: x86/mmu: Use kstrtobool() instead of strtobool()
Sean Christopherson (1):
KVM: x86/mmu: Add "never" option to allow sticky disabling of
nx_huge_pages
arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 37 insertions(+), 5 deletions(-)
--
2.40.1
* [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool()
2023-09-01 18:34 [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Luiz Capitulino
@ 2023-09-01 18:34 ` Luiz Capitulino
2023-09-06 0:01 ` Sean Christopherson
2023-09-01 18:34 ` [PATH 6.1.y 2/2] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages Luiz Capitulino
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Luiz Capitulino @ 2023-09-01 18:34 UTC (permalink / raw)
To: stable, seanjc, christophe.jaillet; +Cc: lcapitulino, Luiz Capitulino
From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Commit 11b36fe7d4500c8ef73677c087f302fd713101c2 upstream.
strtobool() is the same as kstrtobool(), but the latter is more widely
used within the kernel.
In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.
While at it, include the corresponding header file (<linux/kstrtox.h>).
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index beca03556379..c089242008b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -42,6 +42,7 @@
#include <linux/uaccess.h>
#include <linux/hash.h>
#include <linux/kern_levels.h>
+#include <linux/kstrtox.h>
#include <linux/kthread.h>
#include <asm/page.h>
@@ -6667,7 +6668,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
new_val = 1;
else if (sysfs_streq(val, "auto"))
new_val = get_nx_auto_mode();
- else if (strtobool(val, &new_val) < 0)
+ else if (kstrtobool(val, &new_val) < 0)
return -EINVAL;
__set_nx_huge_pages(new_val);
--
2.40.1
* [PATH 6.1.y 2/2] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages
2023-09-01 18:34 [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Luiz Capitulino
2023-09-01 18:34 ` [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool() Luiz Capitulino
@ 2023-09-01 18:34 ` Luiz Capitulino
2023-09-06 0:02 ` Sean Christopherson
2023-09-02 7:27 ` [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Greg KH
2023-09-07 11:25 ` Greg KH
3 siblings, 1 reply; 8+ messages in thread
From: Luiz Capitulino @ 2023-09-01 18:34 UTC (permalink / raw)
To: stable, seanjc, christophe.jaillet
Cc: lcapitulino, Li RongQing, Yong He, Robert Hoo, Kai Huang,
Luiz Capitulino
From: Sean Christopherson <seanjc@google.com>
Commit 0b210faf337314e4bc88e796218bc70c72a51209 upstream.
[ Resolved a small conflict in arch/x86/kvm/mmu/mmu.c::kvm_mmu_post_init_vm()
which is due to kvm_nx_lpage_recovery_worker() being renamed in upstream
commit 55c510e26ab6181c132327a8b90c864e6193ce27 ]
Add a "never" option to the nx_huge_pages module param to allow userspace
to do a one-way hard disabling of the mitigation, and don't create the
per-VM recovery threads when the mitigation is hard disabled. Letting
userspace pinky swear that userspace doesn't want to enable NX mitigation
(without reloading KVM) allows certain use cases to avoid the latency
problems associated with spawning a kthread for each VM.
E.g. in FaaS use cases, the guest kernel is trusted and the host may
create 100+ VMs per logical CPU, which can result in 100ms+ latencies when
a burst of VMs is created.
Reported-by: Li RongQing <lirongqing@baidu.com>
Closes: https://lore.kernel.org/all/1679555884-32544-1-git-send-email-lirongqing@baidu.com
Cc: Yong He <zhuangel570@gmail.com>
Cc: Robert Hoo <robert.hoo.linux@gmail.com>
Cc: Kai Huang <kai.huang@intel.com>
Reviewed-by: Robert Hoo <robert.hoo.linux@gmail.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Luiz Capitulino <luizcap@amazon.com>
Reviewed-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20230602005859.784190-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 36 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c089242008b3..7a6df4b62c1b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -56,6 +56,8 @@
extern bool itlb_multihit_kvm_mitigation;
+static bool nx_hugepage_mitigation_hard_disabled;
+
int __read_mostly nx_huge_pages = -1;
static uint __read_mostly nx_huge_pages_recovery_period_ms;
#ifdef CONFIG_PREEMPT_RT
@@ -65,12 +67,13 @@ static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
#endif
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp);
static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops nx_huge_pages_ops = {
.set = set_nx_huge_pages,
- .get = param_get_bool,
+ .get = get_nx_huge_pages,
};
static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
@@ -6645,6 +6648,14 @@ static void mmu_destroy_caches(void)
kmem_cache_destroy(mmu_page_header_cache);
}
+static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
+{
+ if (nx_hugepage_mitigation_hard_disabled)
+ return sprintf(buffer, "never\n");
+
+ return param_get_bool(buffer, kp);
+}
+
static bool get_nx_auto_mode(void)
{
/* Return true when CPU has the bug, and mitigations are ON */
@@ -6661,15 +6672,29 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
bool old_val = nx_huge_pages;
bool new_val;
+ if (nx_hugepage_mitigation_hard_disabled)
+ return -EPERM;
+
/* In "auto" mode deploy workaround only if CPU has the bug. */
- if (sysfs_streq(val, "off"))
+ if (sysfs_streq(val, "off")) {
new_val = 0;
- else if (sysfs_streq(val, "force"))
+ } else if (sysfs_streq(val, "force")) {
new_val = 1;
- else if (sysfs_streq(val, "auto"))
+ } else if (sysfs_streq(val, "auto")) {
new_val = get_nx_auto_mode();
- else if (kstrtobool(val, &new_val) < 0)
+ } else if (sysfs_streq(val, "never")) {
+ new_val = 0;
+
+ mutex_lock(&kvm_lock);
+ if (!list_empty(&vm_list)) {
+ mutex_unlock(&kvm_lock);
+ return -EBUSY;
+ }
+ nx_hugepage_mitigation_hard_disabled = true;
+ mutex_unlock(&kvm_lock);
+ } else if (kstrtobool(val, &new_val) < 0) {
return -EINVAL;
+ }
__set_nx_huge_pages(new_val);
@@ -6800,6 +6825,9 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
uint old_period, new_period;
int err;
+ if (nx_hugepage_mitigation_hard_disabled)
+ return -EPERM;
+
was_recovery_enabled = calc_nx_huge_pages_recovery_period(&old_period);
err = param_set_uint(val, kp);
@@ -6923,6 +6951,9 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
{
int err;
+ if (nx_hugepage_mitigation_hard_disabled)
+ return 0;
+
err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0,
"kvm-nx-lpage-recovery",
&kvm->arch.nx_lpage_recovery_thread);
--
2.40.1
* Re: [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter
2023-09-01 18:34 [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Luiz Capitulino
2023-09-01 18:34 ` [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool() Luiz Capitulino
2023-09-01 18:34 ` [PATH 6.1.y 2/2] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages Luiz Capitulino
@ 2023-09-02 7:27 ` Greg KH
2023-09-03 17:28 ` Luiz Capitulino
2023-09-07 11:25 ` Greg KH
3 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2023-09-02 7:27 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: stable, seanjc, christophe.jaillet, lcapitulino
On Fri, Sep 01, 2023 at 06:34:51PM +0000, Luiz Capitulino wrote:
> Hi,
>
> As part of the mitigation for the iTLB multihit vulnerability, KVM creates
> a worker thread in KVM_CREATE_VM ioctl(). This thread calls
> cgroup_attach_task_all() which takes cgroup_threadgroup_rwsem for writing
> which may incur 100ms+ latency since upstream commit
> 6a010a49b63ac8465851a79185d8deff966f8e1a.
>
> However, if the CPU is not vulnerable to iTLB multihit one could just
> disable the mitigation (and the worker thread creation) with the
> newly added KVM module parameter nx_huge_pages=never. This avoids the issue
> altogether.
>
> While there's an alternative solution for this issue already supported
> in 6.1-stable (ie. cgroup's favordynmods), disabling the mitigation in
> KVM is probably preferable if the workload is not impacted by dynamic
> cgroup operations since one doesn't need to decide between the trade-off
> in using favordynmods, the thread creation code path is avoided at
> KVM_CREATE_VM and you avoid creating a thread which does nothing.
>
> Tests performed:
>
> * Measured KVM_CREATE_VM latency and confirmed it goes down to less than 1ms
> * We've been performing latency measurements internally w/ this parameter
> for some weeks now
What about the 6.4.y kernel for these changes? Anyone moving from 6.1
to 6.4 will have a regression, right?
Or you can wait a week or so for 6.4.y to go end-of-life, your choice :)
thanks,
greg k-h
* Re: [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter
2023-09-02 7:27 ` [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Greg KH
@ 2023-09-03 17:28 ` Luiz Capitulino
0 siblings, 0 replies; 8+ messages in thread
From: Luiz Capitulino @ 2023-09-03 17:28 UTC (permalink / raw)
To: Greg KH; +Cc: stable, seanjc, christophe.jaillet, lcapitulino
On 2023-09-02 03:27, Greg KH wrote:
> On Fri, Sep 01, 2023 at 06:34:51PM +0000, Luiz Capitulino wrote:
>> Hi,
>>
>> As part of the mitigation for the iTLB multihit vulnerability, KVM creates
>> a worker thread in KVM_CREATE_VM ioctl(). This thread calls
>> cgroup_attach_task_all() which takes cgroup_threadgroup_rwsem for writing
>> which may incur 100ms+ latency since upstream commit
>> 6a010a49b63ac8465851a79185d8deff966f8e1a.
>>
>> However, if the CPU is not vulnerable to iTLB multihit one could just
>> disable the mitigation (and the worker thread creation) with the
>> newly added KVM module parameter nx_huge_pages=never. This avoids the issue
>> altogether.
>>
>> While there's an alternative solution for this issue already supported
>> in 6.1-stable (ie. cgroup's favordynmods), disabling the mitigation in
>> KVM is probably preferable if the workload is not impacted by dynamic
>> cgroup operations since one doesn't need to decide between the trade-off
>> in using favordynmods, the thread creation code path is avoided at
>> KVM_CREATE_VM and you avoid creating a thread which does nothing.
>>
>> Tests performed:
>>
>> * Measured KVM_CREATE_VM latency and confirmed it goes down to less than 1ms
>> * We've been performing latency measurements internally w/ this parameter
>> for some weeks now
>
> What about the 6.4.y kernel for these changes? Anyone moving from 6.1
> to 6.4 will have a regression, right?
>
> Or you can wait a week or so for 6.4.y to go end-of-life, your choice :)
I can do this backport for 6.4.y if that's better for stable users. Will
submit the patches next week.
- Luiz
>
> thanks,
>
> greg k-h
* Re: [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool()
2023-09-01 18:34 ` [PATH 6.1.y 1/2] KVM: x86/mmu: Use kstrtobool() instead of strtobool() Luiz Capitulino
@ 2023-09-06 0:01 ` Sean Christopherson
0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2023-09-06 0:01 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: stable, christophe.jaillet, lcapitulino
On Fri, Sep 01, 2023, Luiz Capitulino wrote:
> From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
>
> Commit 11b36fe7d4500c8ef73677c087f302fd713101c2 upstream.
>
> strtobool() is the same as kstrtobool().
> However, the latter is more used within the kernel.
>
> In order to remove strtobool() and slightly simplify kstrtox.h, switch to
> the other function name.
>
> While at it, include the corresponding header file (<linux/kstrtox.h>)
>
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
> ---
Acked-by: Sean Christopherson <seanjc@google.com>
* Re: [PATH 6.1.y 2/2] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages
2023-09-01 18:34 ` [PATH 6.1.y 2/2] KVM: x86/mmu: Add "never" option to allow sticky disabling of nx_huge_pages Luiz Capitulino
@ 2023-09-06 0:02 ` Sean Christopherson
0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2023-09-06 0:02 UTC (permalink / raw)
To: Luiz Capitulino
Cc: stable, christophe.jaillet, lcapitulino, Li RongQing, Yong He,
Robert Hoo, Kai Huang
On Fri, Sep 01, 2023, Luiz Capitulino wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Commit 0b210faf337314e4bc88e796218bc70c72a51209 upstream.
>
> [ Resolved a small conflict in arch/x86/kvm/mmu/mmu.c::kvm_mmu_post_init_vm()
> which is due kvm_nx_lpage_recovery_worker() being renamed in upstream
> commit 55c510e26ab6181c132327a8b90c864e6193ce27 ]
>
> Add a "never" option to the nx_huge_pages module param to allow userspace
> to do a one-way hard disabling of the mitigation, and don't create the
> per-VM recovery threads when the mitigation is hard disabled. Letting
> userspace pinky swear that userspace doesn't want to enable NX mitigation
> (without reloading KVM) allows certain use cases to avoid the latency
> problems associated with spawning a kthread for each VM.
>
> E.g. in FaaS use cases, the guest kernel is trusted and the host may
> create 100+ VMs per logical CPU, which can result in 100ms+ latencies when
> a burst of VMs is created.
>
> Reported-by: Li RongQing <lirongqing@baidu.com>
> Closes: https://lore.kernel.org/all/1679555884-32544-1-git-send-email-lirongqing@baidu.com
> Cc: Yong He <zhuangel570@gmail.com>
> Cc: Robert Hoo <robert.hoo.linux@gmail.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Robert Hoo <robert.hoo.linux@gmail.com>
> Acked-by: Kai Huang <kai.huang@intel.com>
> Tested-by: Luiz Capitulino <luizcap@amazon.com>
> Reviewed-by: Li RongQing <lirongqing@baidu.com>
> Link: https://lore.kernel.org/r/20230602005859.784190-1-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
> ---
Acked-by: Sean Christopherson <seanjc@google.com>
* Re: [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter
2023-09-01 18:34 [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Luiz Capitulino
` (2 preceding siblings ...)
2023-09-02 7:27 ` [PATH 6.1.y 0/2] Backport KVM's nx_huge_pages=never module parameter Greg KH
@ 2023-09-07 11:25 ` Greg KH
3 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2023-09-07 11:25 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: stable, seanjc, christophe.jaillet, lcapitulino
On Fri, Sep 01, 2023 at 06:34:51PM +0000, Luiz Capitulino wrote:
> Hi,
>
> As part of the mitigation for the iTLB multihit vulnerability, KVM creates
> a worker thread in KVM_CREATE_VM ioctl(). This thread calls
> cgroup_attach_task_all() which takes cgroup_threadgroup_rwsem for writing
> which may incur 100ms+ latency since upstream commit
> 6a010a49b63ac8465851a79185d8deff966f8e1a.
>
> However, if the CPU is not vulnerable to iTLB multihit one could just
> disable the mitigation (and the worker thread creation) with the
> newly added KVM module parameter nx_huge_pages=never. This avoids the issue
> altogether.
>
> While there's an alternative solution for this issue already supported
> in 6.1-stable (ie. cgroup's favordynmods), disabling the mitigation in
> KVM is probably preferable if the workload is not impacted by dynamic
> cgroup operations since one doesn't need to decide between the trade-off
> in using favordynmods, the thread creation code path is avoided at
> KVM_CREATE_VM and you avoid creating a thread which does nothing.
>
> Tests performed:
>
> * Measured KVM_CREATE_VM latency and confirmed it goes down to less than 1ms
> * We've been performing latency measurements internally w/ this parameter
> for some weeks now
All now queued up, thanks.
greg k-h