* [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2021-12-22 22:53 ` Vipin Sharma
0 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2021-12-22 22:53 UTC (permalink / raw)
To: pbonzini-H+wXaHxf7aLQT0dZR+AlfA, seanjc-hpIqsD4AKlfQT0dZR+AlfA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan.x-EC8Uxl6Npydl57MIdRCFDg,
hannes-druUgvl0LCNAfugRpC6u6w
Cc: dmatlack-hpIqsD4AKlfQT0dZR+AlfA,
jiangshanlai-Re5JQEeQqe8AvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vipin Sharma
VM worker kthreads can linger in the VM process's cgroup for sometime
after KVM terminates the VM process.
KVM terminates the worker kthreads by calling kthread_stop() which waits
on the 'exited' completion, triggered by exit_mm(), via mm_release(),
during kthread's exit. However, these kthreads are removed from the
cgroup using cgroup_exit() call which happens after exit_mm(). A VM
process can terminate between the time window of exit_mm() to
cgroup_exit(), leaving only worker kthreads in the cgroup.
Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that cgroup is empty as soon as the main VM process
is terminated.
kthreadd_task is not an exported symbol which causes build errors if KVM
is built as a loadable module. Both users (kvm_main & vhost) of
cgroup_attach_task_all(), have the same issue, therefore, using
kthreadd_task as a default option is chosen when the API is called with
NULL argument.
Signed-off-by: Vipin Sharma <vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
v2:
- Use kthreadd_task in the cgroup API to avoid build issue.
v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org/
kernel/cgroup/cgroup-v1.c | 5 +++++
virt/kvm/kvm_main.c | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 81c9e0685948..81d4b2f2acf0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
* @from: attach to all cgroups of a given task
* @tsk: the task to be attached
*
+ * If @from is NULL then use kthreadd_task for finding the destination cgroups.
+ *
* Return: %0 on success or a negative errno code on failure
*/
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
@@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
struct cgroup_root *root;
int retval = 0;
+ if (!from)
+ from = kthreadd_task;
+
mutex_lock(&cgroup_mutex);
percpu_down_write(&cgroup_threadgroup_rwsem);
for_each_root(root) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..f7504578c374 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
init_context = NULL;
if (err)
- return err;
+ goto out;
/* Wait to be woken up by the spawner before proceeding. */
kthread_parkme();
@@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
if (!kthread_should_stop())
err = thread_fn(kvm, data);
+out:
+ /*
+ * We need to move the kthread back to its original cgroups, so that it
+ * doesn't linger in the cgroups of the user process after the user
+ * process has already terminated.
+ *
+ * kthread_stop() waits on 'exited' completion condition which is set
+ * in exit_mm(), via mm_release(), in do_exit(). However, kthread
+ * is removed from cgroups in the cgroup_exit() which is called after
+ * exit_mm(). This causes lingering of kthreads in cgroups after main
+ * VM process has finished.
+ */
+ WARN_ON(cgroup_attach_task_all(NULL, current));
return err;
}
base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
--
2.34.1.307.g9b7440fafd-goog
^ permalink raw reply related [flat|nested] 24+ messages in thread* [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2021-12-22 22:53 ` Vipin Sharma
0 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2021-12-22 22:53 UTC (permalink / raw)
To: pbonzini, seanjc, tj, lizefan.x, hannes
Cc: dmatlack, jiangshanlai, kvm, cgroups, linux-kernel, Vipin Sharma
VM worker kthreads can linger in the VM process's cgroup for sometime
after KVM terminates the VM process.
KVM terminates the worker kthreads by calling kthread_stop() which waits
on the 'exited' completion, triggered by exit_mm(), via mm_release(),
during kthread's exit. However, these kthreads are removed from the
cgroup using cgroup_exit() call which happens after exit_mm(). A VM
process can terminate between the time window of exit_mm() to
cgroup_exit(), leaving only worker kthreads in the cgroup.
Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that cgroup is empty as soon as the main VM process
is terminated.
kthreadd_task is not an exported symbol which causes build errors if KVM
is built as a loadable module. Both users (kvm_main & vhost) of
cgroup_attach_task_all(), have the same issue, therefore, using
kthreadd_task as a default option is chosen when the API is called with
NULL argument.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
v2:
- Use kthreadd_task in the cgroup API to avoid build issue.
v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@google.com/
kernel/cgroup/cgroup-v1.c | 5 +++++
virt/kvm/kvm_main.c | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 81c9e0685948..81d4b2f2acf0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
* @from: attach to all cgroups of a given task
* @tsk: the task to be attached
*
+ * If @from is NULL then use kthreadd_task for finding the destination cgroups.
+ *
* Return: %0 on success or a negative errno code on failure
*/
int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
@@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
struct cgroup_root *root;
int retval = 0;
+ if (!from)
+ from = kthreadd_task;
+
mutex_lock(&cgroup_mutex);
percpu_down_write(&cgroup_threadgroup_rwsem);
for_each_root(root) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..f7504578c374 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
init_context = NULL;
if (err)
- return err;
+ goto out;
/* Wait to be woken up by the spawner before proceeding. */
kthread_parkme();
@@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
if (!kthread_should_stop())
err = thread_fn(kvm, data);
+out:
+ /*
+ * We need to move the kthread back to its original cgroups, so that it
+ * doesn't linger in the cgroups of the user process after the user
+ * process has already terminated.
+ *
+ * kthread_stop() waits on 'exited' completion condition which is set
+ * in exit_mm(), via mm_release(), in do_exit(). However, kthread
+ * is removed from cgroups in the cgroup_exit() which is called after
+ * exit_mm(). This causes lingering of kthreads in cgroups after main
+ * VM process has finished.
+ */
+ WARN_ON(cgroup_attach_task_all(NULL, current));
return err;
}
base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
--
2.34.1.307.g9b7440fafd-goog
^ permalink raw reply related [flat|nested] 24+ messages in thread[parent not found: <20211222225350.1912249-1-vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2021-12-22 22:53 ` Vipin Sharma
@ 2021-12-28 17:17 ` Sean Christopherson
-1 siblings, 0 replies; 24+ messages in thread
From: Sean Christopherson @ 2021-12-28 17:17 UTC (permalink / raw)
To: Vipin Sharma
Cc: pbonzini-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
lizefan.x-EC8Uxl6Npydl57MIdRCFDg, hannes-druUgvl0LCNAfugRpC6u6w,
dmatlack-hpIqsD4AKlfQT0dZR+AlfA,
jiangshanlai-Re5JQEeQqe8AvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Wed, Dec 22, 2021, Vipin Sharma wrote:
> kthreadd_task is not an exported symbol which causes build errors if KVM
> is built as a loadable module. Both users (kvm_main & vhost) of
> cgroup_attach_task_all(), have the same issue, therefore, using
> kthreadd_task as a default option is chosen when the API is called with
> NULL argument.
...
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 81c9e0685948..81d4b2f2acf0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
> * @from: attach to all cgroups of a given task
> * @tsk: the task to be attached
> *
> + * If @from is NULL then use kthreadd_task for finding the destination cgroups.
> + *
> * Return: %0 on success or a negative errno code on failure
> */
> int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> struct cgroup_root *root;
> int retval = 0;
>
> + if (!from)
> + from = kthreadd_task;
Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do
cgroup_attach_task_all(current->real_parent, current)
since AFAICT real_parent is guaranteed to point at kthreadd_task.
> +
> mutex_lock(&cgroup_mutex);
> percpu_down_write(&cgroup_threadgroup_rwsem);
> for_each_root(root) {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b0f7e6eb00ff..f7504578c374 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
> init_context = NULL;
>
> if (err)
> - return err;
> + goto out;
>
> /* Wait to be woken up by the spawner before proceeding. */
> kthread_parkme();
> @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> if (!kthread_should_stop())
> err = thread_fn(kvm, data);
>
> +out:
> + /*
> + * We need to move the kthread back to its original cgroups, so that it
Please state what is being done, not what "needs" to be done. The need to do
something is implicit, otherwise we wouldn't be doing it.
> + * doesn't linger in the cgroups of the user process after the user
> + * process has already terminated.
> + *
> + * kthread_stop() waits on 'exited' completion condition which is set
> + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> + * is removed from cgroups in the cgroup_exit() which is called after
> + * exit_mm(). This causes lingering of kthreads in cgroups after main
> + * VM process has finished.
> + */
> + WARN_ON(cgroup_attach_task_all(NULL, current));
This should not WARN, cgroup_attach_task_all() needs to perform allocations and
will fail with -ENOMEM even in the absense of kernel bugs.
> return err;
> }
>
>
> base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
> --
> 2.34.1.307.g9b7440fafd-goog
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2021-12-28 17:17 ` Sean Christopherson
0 siblings, 0 replies; 24+ messages in thread
From: Sean Christopherson @ 2021-12-28 17:17 UTC (permalink / raw)
To: Vipin Sharma
Cc: pbonzini, tj, lizefan.x, hannes, dmatlack, jiangshanlai, kvm,
cgroups, linux-kernel
On Wed, Dec 22, 2021, Vipin Sharma wrote:
> kthreadd_task is not an exported symbol which causes build errors if KVM
> is built as a loadable module. Both users (kvm_main & vhost) of
> cgroup_attach_task_all(), have the same issue, therefore, using
> kthreadd_task as a default option is chosen when the API is called with
> NULL argument.
...
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 81c9e0685948..81d4b2f2acf0 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
> * @from: attach to all cgroups of a given task
> * @tsk: the task to be attached
> *
> + * If @from is NULL then use kthreadd_task for finding the destination cgroups.
> + *
> * Return: %0 on success or a negative errno code on failure
> */
> int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> struct cgroup_root *root;
> int retval = 0;
>
> + if (!from)
> + from = kthreadd_task;
Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do
cgroup_attach_task_all(current->real_parent, current)
since AFAICT real_parent is guaranteed to point at kthreadd_task.
> +
> mutex_lock(&cgroup_mutex);
> percpu_down_write(&cgroup_threadgroup_rwsem);
> for_each_root(root) {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b0f7e6eb00ff..f7504578c374 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
> init_context = NULL;
>
> if (err)
> - return err;
> + goto out;
>
> /* Wait to be woken up by the spawner before proceeding. */
> kthread_parkme();
> @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> if (!kthread_should_stop())
> err = thread_fn(kvm, data);
>
> +out:
> + /*
> + * We need to move the kthread back to its original cgroups, so that it
Please state what is being done, not what "needs" to be done. The need to do
something is implicit, otherwise we wouldn't be doing it.
> + * doesn't linger in the cgroups of the user process after the user
> + * process has already terminated.
> + *
> + * kthread_stop() waits on 'exited' completion condition which is set
> + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> + * is removed from cgroups in the cgroup_exit() which is called after
> + * exit_mm(). This causes lingering of kthreads in cgroups after main
> + * VM process has finished.
> + */
> + WARN_ON(cgroup_attach_task_all(NULL, current));
This should not WARN, cgroup_attach_task_all() needs to perform allocations and
will fail with -ENOMEM even in the absense of kernel bugs.
> return err;
> }
>
>
> base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
> --
> 2.34.1.307.g9b7440fafd-goog
>
^ permalink raw reply [flat|nested] 24+ messages in thread[parent not found: <YctGtWzYcNP2iTaN-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2021-12-28 17:17 ` Sean Christopherson
@ 2022-01-18 19:53 ` Vipin Sharma
-1 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2022-01-18 19:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: pbonzini-H+wXaHxf7aLQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
lizefan.x-EC8Uxl6Npydl57MIdRCFDg, hannes-druUgvl0LCNAfugRpC6u6w,
dmatlack-hpIqsD4AKlfQT0dZR+AlfA,
jiangshanlai-Re5JQEeQqe8AvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Tue, Dec 28, 2021 at 9:17 AM Sean Christopherson <seanjc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> On Wed, Dec 22, 2021, Vipin Sharma wrote:
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 81c9e0685948..81d4b2f2acf0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > struct cgroup_root *root;
> > int retval = 0;
> >
> > + if (!from)
> > + from = kthreadd_task;
>
> Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do
>
> cgroup_attach_task_all(current->real_parent, current)
>
> since AFAICT real_parent is guaranteed to point at kthreadd_task.
>
Thanks for the "real_parent" suggestion. This is much cleaner and
better than changing cgroup logic. I will make this change.
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index b0f7e6eb00ff..f7504578c374 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> > if (!kthread_should_stop())
> > err = thread_fn(kvm, data);
> >
> > +out:
> > + /*
> > + * We need to move the kthread back to its original cgroups, so that it
>
> Please state what is being done, not what "needs" to be done. The need to do
> something is implicit, otherwise we wouldn't be doing it.
>
I will update the statement. Thanks.
> > + * doesn't linger in the cgroups of the user process after the user
> > + * process has already terminated.
> > + *
> > + * kthread_stop() waits on 'exited' completion condition which is set
> > + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> > + * is removed from cgroups in the cgroup_exit() which is called after
> > + * exit_mm(). This causes lingering of kthreads in cgroups after main
> > + * VM process has finished.
> > + */
> > + WARN_ON(cgroup_attach_task_all(NULL, current));
>
> This should not WARN, cgroup_attach_task_all() needs to perform allocations and
> will fail with -ENOMEM even in the absense of kernel bugs.
>
I will remove WARN_ON and print an error using kvm_err(), it will be
similar to the earlier call of cgroup_attach_task_all() in the same
function.
Thanks
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2022-01-18 19:53 ` Vipin Sharma
0 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2022-01-18 19:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: pbonzini, tj, lizefan.x, hannes, dmatlack, jiangshanlai, kvm,
cgroups, linux-kernel
On Tue, Dec 28, 2021 at 9:17 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Dec 22, 2021, Vipin Sharma wrote:
> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index 81c9e0685948..81d4b2f2acf0 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > @@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
> > struct cgroup_root *root;
> > int retval = 0;
> >
> > + if (!from)
> > + from = kthreadd_task;
>
> Rather than sully cgroup_attach_task_all() with this behavior, can't KVM do
>
> cgroup_attach_task_all(current->real_parent, current)
>
> since AFAICT real_parent is guaranteed to point at kthreadd_task.
>
Thanks for the "real_parent" suggestion. This is much cleaner and
better than changing cgroup logic. I will make this change.
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index b0f7e6eb00ff..f7504578c374 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
> > if (!kthread_should_stop())
> > err = thread_fn(kvm, data);
> >
> > +out:
> > + /*
> > + * We need to move the kthread back to its original cgroups, so that it
>
> Please state what is being done, not what "needs" to be done. The need to do
> something is implicit, otherwise we wouldn't be doing it.
>
I will update the statement. Thanks.
> > + * doesn't linger in the cgroups of the user process after the user
> > + * process has already terminated.
> > + *
> > + * kthread_stop() waits on 'exited' completion condition which is set
> > + * in exit_mm(), via mm_release(), in do_exit(). However, kthread
> > + * is removed from cgroups in the cgroup_exit() which is called after
> > + * exit_mm(). This causes lingering of kthreads in cgroups after main
> > + * VM process has finished.
> > + */
> > + WARN_ON(cgroup_attach_task_all(NULL, current));
>
> This should not WARN, cgroup_attach_task_all() needs to perform allocations and
> will fail with -ENOMEM even in the absense of kernel bugs.
>
I will remove WARN_ON and print an error using kvm_err(), it will be
similar to the earlier call of cgroup_attach_task_all() in the same
function.
Thanks
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2021-12-22 22:53 ` Vipin Sharma
@ 2022-01-05 18:04 ` Michal Koutný
-1 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2022-01-05 18:04 UTC (permalink / raw)
To: Vipin Sharma
Cc: pbonzini-H+wXaHxf7aLQT0dZR+AlfA, seanjc-hpIqsD4AKlfQT0dZR+AlfA,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan.x-EC8Uxl6Npydl57MIdRCFDg,
hannes-druUgvl0LCNAfugRpC6u6w, dmatlack-hpIqsD4AKlfQT0dZR+AlfA,
jiangshanlai-Re5JQEeQqe8AvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hi Vipin.
On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> VM worker kthreads can linger in the VM process's cgroup for sometime
> after KVM terminates the VM process.
Why is it a problem? And how long are we talking about?
> A VM process can terminate between the time window of exit_mm() to
> cgroup_exit(), leaving only worker kthreads in the cgroup.
Even kthreads should eventually have PF_EXITING set, they shouldd be
treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
it doesn't prevent rmdir'ing the cgroup).
(And after the last task_struct reference is gone, the cgroup structs
can be released too. Maybe the cause is holding the reference to the KVM
worker thread somewhere for too long.)
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that cgroup is empty as soon as the main VM process
> is terminated.
BTW this used to be done for "user-space" tasks too (migrate to root
cgroup) but it was replaced with the less transactional "ignore zombies"
approach. So this change seems inconsistent.
Regards,
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2022-01-05 18:04 ` Michal Koutný
0 siblings, 0 replies; 24+ messages in thread
From: Michal Koutný @ 2022-01-05 18:04 UTC (permalink / raw)
To: Vipin Sharma
Cc: pbonzini, seanjc, tj, lizefan.x, hannes, dmatlack, jiangshanlai,
kvm, cgroups, linux-kernel
Hi Vipin.
On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh@google.com> wrote:
> VM worker kthreads can linger in the VM process's cgroup for sometime
> after KVM terminates the VM process.
Why is it a problem? And how long are we talking about?
> A VM process can terminate between the time window of exit_mm() to
> cgroup_exit(), leaving only worker kthreads in the cgroup.
Even kthreads should eventually have PF_EXITING set, they shouldd be
treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
it doesn't prevent rmdir'ing the cgroup).
(And after the last task_struct reference is gone, the cgroup structs
can be released too. Maybe the cause is holding the reference to the KVM
worker thread somewhere for too long.)
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that cgroup is empty as soon as the main VM process
> is terminated.
BTW this used to be done for "user-space" tasks too (migrate to root
cgroup) but it was replaced with the less transactional "ignore zombies"
approach. So this change seems inconsistent.
Regards,
Michal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2022-01-05 18:04 ` Michal Koutný
@ 2022-01-18 20:25 ` Vipin Sharma
-1 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2022-01-18 20:25 UTC (permalink / raw)
To: Michal Koutný
Cc: pbonzini, seanjc, tj, lizefan.x, hannes, dmatlack, jiangshanlai,
kvm, cgroups, linux-kernel
Hi Michal,
Thanks for looking into this patch. I will be using Sean's suggestion
and use real_parent of the task. This will avoid my ugly code in the
cgroup APIs.
On Wed, Jan 5, 2022 at 10:04 AM Michal Koutn√Ω <mkoutny@suse.com> wrote:
>
> Hi Vipin.
>
> On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh@google.com> wrote:
> > VM worker kthreads can linger in the VM process's cgroup for sometime
> > after KVM terminates the VM process.
>
> Why is it a problem? And how long are we talking about?
>
Automated tools/scripts which delete VM cgroups after the main KVM
process ends were seeing deletion errors because kernel worker threads
were still running inside those cgroups. This is not a very frequent
issue but we noticed it happens every now and then.
> > A VM process can terminate between the time window of exit_mm() to
> > cgroup_exit(), leaving only worker kthreads in the cgroup.
>
> Even kthreads should eventually have PF_EXITING set, they shouldd be
> treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
> it doesn't prevent rmdir'ing the cgroup).
>
Since that eventual time period is not known, we can either pause the
script for sometime before starting the cleanup or add some x number
of retries. Both of which are not preferable due to indeterministic
nature.
> (And after the last task_struct reference is gone, the cgroup structs
> can be released too. Maybe the cause is holding the reference to the KVM
> worker thread somewhere for too long.)
>
> > Moving worker kthreads back to the original cgroup (kthreadd_task's
> > cgroup) makes sure that cgroup is empty as soon as the main VM process
> > is terminated.
>
> BTW this used to be done for "user-space" tasks too (migrate to root
> cgroup) but it was replaced with the less transactional "ignore zombies"
> approach. So this change seems inconsistent.
>
Interesting, however, until PF_EXITING is set those threads will also
exhibit the same behavior if it comes to deletion. Thanks for the
background, good to know.
Thanks
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2022-01-18 20:25 ` Vipin Sharma
0 siblings, 0 replies; 24+ messages in thread
From: Vipin Sharma @ 2022-01-18 20:25 UTC (permalink / raw)
To: Michal Koutný
Cc: pbonzini, seanjc, tj, lizefan.x, hannes, dmatlack, jiangshanlai,
kvm, cgroups, linux-kernel
Hi Michal,
Thanks for looking into this patch. I will be using Sean's suggestion
and use real_parent of the task. This will avoid my ugly code in the
cgroup APIs.
On Wed, Jan 5, 2022 at 10:04 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hi Vipin.
>
> On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh@google.com> wrote:
> > VM worker kthreads can linger in the VM process's cgroup for sometime
> > after KVM terminates the VM process.
>
> Why is it a problem? And how long are we talking about?
>
Automated tools/scripts which delete VM cgroups after the main KVM
process ends were seeing deletion errors because kernel worker threads
were still running inside those cgroups. This is not a very frequent
issue but we noticed it happens every now and then.
> > A VM process can terminate between the time window of exit_mm() to
> > cgroup_exit(), leaving only worker kthreads in the cgroup.
>
> Even kthreads should eventually have PF_EXITING set, they shouldd be
> treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
> it doesn't prevent rmdir'ing the cgroup).
>
Since that eventual time period is not known, we can either pause the
script for sometime before starting the cleanup or add some x number
of retries. Both of which are not preferable due to indeterministic
nature.
> (And after the last task_struct reference is gone, the cgroup structs
> can be released too. Maybe the cause is holding the reference to the KVM
> worker thread somewhere for too long.)
>
> > Moving worker kthreads back to the original cgroup (kthreadd_task's
> > cgroup) makes sure that cgroup is empty as soon as the main VM process
> > is terminated.
>
> BTW this used to be done for "user-space" tasks too (migrate to root
> cgroup) but it was replaced with the less transactional "ignore zombies"
> approach. So this change seems inconsistent.
>
Interesting, however, until PF_EXITING is set those threads will also
exhibit the same behavior if it comes to deletion. Thanks for the
background, good to know.
Thanks
^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <CAHVum0e84nUcGtdPYQaJDQszKj-QVP5gM+nteBpSTaQ2sWYpmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2022-01-18 20:25 ` Vipin Sharma
@ 2022-01-18 20:39 ` Tejun Heo
-1 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2022-01-18 20:39 UTC (permalink / raw)
To: Vipin Sharma
Cc: Michal Koutný, pbonzini-H+wXaHxf7aLQT0dZR+AlfA,
seanjc-hpIqsD4AKlfQT0dZR+AlfA, lizefan.x-EC8Uxl6Npydl57MIdRCFDg,
hannes-druUgvl0LCNAfugRpC6u6w, dmatlack-hpIqsD4AKlfQT0dZR+AlfA,
jiangshanlai-Re5JQEeQqe8AvxtiuMwx3w, kvm-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hello,
On Tue, Jan 18, 2022 at 12:25:48PM -0800, Vipin Sharma wrote:
> Automated tools/scripts which delete VM cgroups after the main KVM
> process ends were seeing deletion errors because kernel worker threads
> were still running inside those cgroups. This is not a very frequent
> issue but we noticed it happens every now and then.
So, these are normally driven by the !populated events. That's how everyone
else is doing it. If you want to tie the kvm workers lifetimes to kvm
process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
exit wait for the workers to be cleaned up.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
@ 2022-01-18 20:39 ` Tejun Heo
0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2022-01-18 20:39 UTC (permalink / raw)
To: Vipin Sharma
Cc: Michal Koutný, pbonzini, seanjc, lizefan.x, hannes, dmatlack,
jiangshanlai, kvm, cgroups, linux-kernel
Hello,
On Tue, Jan 18, 2022 at 12:25:48PM -0800, Vipin Sharma wrote:
> Automated tools/scripts which delete VM cgroups after the main KVM
> process ends were seeing deletion errors because kernel worker threads
> were still running inside those cgroups. This is not a very frequent
> issue but we noticed it happens every now and then.
So, these are normally driven by the !populated events. That's how everyone
else is doing it. If you want to tie the kvm workers lifetimes to kvm
process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
exit wait for the workers to be cleaned up.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.
2022-01-18 20:39 ` Tejun Heo
(?)
@ 2022-01-19 18:02 ` Paolo Bonzini
[not found] ` <7a0bc562-9f25-392d-5c05-9dbcd350d002-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
-1 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2022-01-19 18:02 UTC (permalink / raw)
To: Tejun Heo, Vipin Sharma
Cc: Michal Koutný, seanjc, lizefan.x, hannes, dmatlack,
jiangshanlai, kvm, cgroups, linux-kernel
On 1/18/22 21:39, Tejun Heo wrote:
> So, these are normally driven by the !populated events. That's how everyone
> else is doing it. If you want to tie the kvm workers lifetimes to kvm
> process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> exit wait for the workers to be cleaned up.
It does. For example kvm_mmu_post_init_vm's call to
kvm_vm_create_worker_thread is matched with the call to
kthread_stop in kvm_mmu_pre_destroy_vm.
According to Vpin, the problem is that there's a small amount of time
between the return from kthread_stop and the point where the cgroup
can be removed. My understanding of the race is the following:
user process kthread management
------------ ------- ----------
wait4()
exit_task_work()
____fput()
kvm_mmu_pre_destroy_vm()
kthread_stop();
wait_for_completion();
exit_signals()
/* set PF_EXITING */
exit_mm()
exit_mm_release()
complete_vfork_done()
complete();
cgroup_exit()
cgroup_set_move_task()
css_set_update_populated()
exit_notify()
do_notify_parent()
<wakeup>
rmdir()
cgroup_destroy_locked()
cgroup_is_populated()
return -EBUSY
cgroup_exit()
cgroup_set_move_task()
css_set_update_populated()
I cannot find the code that makes it possible to rmdir a cgroup
if PF_EXITING is set.
Paolo
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2022-02-25 17:37 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-22 22:53 [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting Vipin Sharma
2021-12-22 22:53 ` Vipin Sharma
[not found] ` <20211222225350.1912249-1-vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2021-12-28 17:17 ` Sean Christopherson
2021-12-28 17:17 ` Sean Christopherson
[not found] ` <YctGtWzYcNP2iTaN-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-01-18 19:53 ` Vipin Sharma
2022-01-18 19:53 ` Vipin Sharma
2022-01-05 18:04 ` Michal Koutný
2022-01-05 18:04 ` Michal Koutný
2022-01-18 20:25 ` Vipin Sharma
2022-01-18 20:25 ` Vipin Sharma
[not found] ` <CAHVum0e84nUcGtdPYQaJDQszKj-QVP5gM+nteBpSTaQ2sWYpmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-01-18 20:39 ` Tejun Heo
2022-01-18 20:39 ` Tejun Heo
2022-01-19 18:02 ` Paolo Bonzini
[not found] ` <7a0bc562-9f25-392d-5c05-9dbcd350d002-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-01-19 18:30 ` Tejun Heo
2022-01-19 18:30 ` Tejun Heo
[not found] ` <YehY0z2vHYVZk52J-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2022-01-19 18:49 ` Vipin Sharma
2022-01-19 18:49 ` Vipin Sharma
2022-01-19 19:05 ` Tejun Heo
2022-01-20 15:05 ` Michal Koutný
[not found] ` <20220120150502.GC27269-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-02-16 17:37 ` Vipin Sharma
2022-02-16 17:37 ` Vipin Sharma
[not found] ` <CAHVum0fOP-2XcUcG3PqW08DY7CmpDroG6Fcv9KoD1FqLmGpB8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-02-16 19:48 ` Paolo Bonzini
2022-02-16 19:48 ` Paolo Bonzini
2022-02-25 17:37 ` Michal Koutný
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.