From: "Michael S. Tsirkin" <mst@redhat.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com,
sgarzare@redhat.com, virtualization@lists.linux-foundation.org,
brauner@kernel.org, ebiederm@xmission.com,
torvalds@linux-foundation.org, konrad.wilk@oracle.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads
Date: Sun, 13 Aug 2023 15:01:24 -0400
Message-ID: <20230813145936-mutt-send-email-mst@kernel.org>
In-Reply-To: <b2b02526-913d-42a9-9d23-59badf5b96db@oracle.com>
On Fri, Aug 11, 2023 at 01:51:36PM -0500, Mike Christie wrote:
> On 8/10/23 1:57 PM, Michael S. Tsirkin wrote:
> > On Sat, Jul 22, 2023 at 11:03:29PM -0500, michael.christie@oracle.com wrote:
> >> On 7/20/23 8:06 AM, Michael S. Tsirkin wrote:
> >>> On Thu, Feb 02, 2023 at 05:25:17PM -0600, Mike Christie wrote:
> >>>> For vhost workers we use the kthread API, which inherits its values
> >>>> from, and checks against, the kthreadd thread. This results in the
> >>>> wrong RLIMITs being checked, so while tools like libvirt try to
> >>>> control the number of threads based on the nproc rlimit setting, we
> >>>> can end up creating more threads than the user wanted.
> >>>>
> >>>> This patch has us use the vhost_task helpers, which inherit their
> >>>> values/checks from the thread that owns the device, similar to if we
> >>>> did a clone in userspace. The vhost threads will now be counted in
> >>>> the nproc rlimit, and we get features like cgroup and mm sharing
> >>>> automatically, so we can remove those calls.
> >>>>
> >>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> >>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>>
> >>> Hi Mike,
> >>> So this seems to have caused a measurable regression in networking
> >>> performance (about 30%). Take a look here; there's a zip file
> >>> with detailed measurements attached:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=2222603
> >>>
> >>>
> >>> Could you take a look please?
> >>> You can also ask the reporter questions there, assuming you
> >>> have or can create a (free) account.
> >>>
> >>
> >> Sorry for the late reply. I just got home from vacation.
> >>
> >> The account creation link seems to be down. I keep getting an
> >> "unable to establish SMTP connection to bz-exim-prod port 25" error.
> >>
> >> Can you give me Quan's email?
> >>
> >> I think I can replicate the problem. I just need some extra info from Quan:
> >>
> >> 1. Just double-check that they are using RHEL 9 on the host running the VMs.
> >> 2. The kernel config.
> >> 3. Any tuning that was done. Is tuned running in the guest and/or on the
> >> host running the VMs, and what profile is being used in each?
> >> 4. Number of vCPUs and virtqueues being used.
> >> 5. Can they dump the contents of:
> >>
> >> /sys/kernel/debug/sched
> >>
> >> and the output of:
> >>
> >> sysctl -a
> >>
> >> on the host running the VMs?
> >>
> >> 6. With the 6.4 kernel, can they also run a quick test and tell me
> >> what happens if they set the scheduler to batch? List the qemu
> >> threads with:
> >>
> >> ps -T -o comm,pid,tid $QEMU_THREAD
> >>
> >> then for each vhost thread do:
> >>
> >> chrt -b -p 0 $VHOST_THREAD
> >>
> >> Does that end up increasing perf? When I do this I see throughput go
> >> up by around 50% vs 6.3 when the session count was 16 or more (16 was
> >> the number of vCPUs and virtqueues per net device in the VM). Note
> >> that I'm not saying that is a fix; it's just a difference I noticed
> >> when running some other tests.
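> >>
> >> Something like this untested sketch automates that test (it assumes
> >> the worker threads show up under the qemu pid with a "vhost" comm
> >> prefix, and $QEMU_PID is the qemu process from the ps above):
> >>
> >> # set every vhost worker of a given qemu process to SCHED_BATCH
> >> for tid in $(ps -T -o comm,tid --no-headers $QEMU_PID | awk '/^vhost/ {print $2}'); do
> >> 	chrt -b -p 0 $tid
> >> done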
> >
> >
> > Mike, I'm unsure what to do at this point. Regressions are not nice,
> > but if the kernel is released with the new userspace API we won't
> > be able to revert. So what's the plan?
> >
>
> I'm sort of stumped. I still can't replicate the problem out of the box;
> 6.3 and 6.4 perform the same for me. I've tried your setup and settings,
> and different combos of things like tuned and irqbalance.
>
> I can sort of force the issue. In 6.4, the vhost thread inherits its
> settings from the parent thread. In 6.3, the vhost thread inherited from
> kthreadd and we would then reset the sched settings. So in 6.4, if I just
> tune the parent differently, I can cause different performance. If we want
> the 6.3 behavior we can do the patch below.
>
> However, I don't think you guys are hitting this, because you are just
> running qemu from the normal shell and aren't doing anything fancy with
> the sched settings.
>
>
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index da35e5b7f047..f2c2638d1106 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -2,6 +2,7 @@
>  /*
>   * Copyright (C) 2021 Oracle Corporation
>   */
> +#include <uapi/linux/sched/types.h>
>  #include <linux/slab.h>
>  #include <linux/completion.h>
>  #include <linux/sched/task.h>
> @@ -22,9 +23,16 @@ struct vhost_task {
> 
>  static int vhost_task_fn(void *data)
>  {
> +	static const struct sched_param param = { .sched_priority = 0 };
>  	struct vhost_task *vtsk = data;
>  	bool dead = false;
> 
> +	/*
> +	 * Don't inherit the parent's sched info, so we maintain compat from
> +	 * when we used kthreads and it reset this info.
> +	 */
> +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> +
>  	for (;;) {
>  		bool did_work;
>
>
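> To double-check the patch took effect, the worker's policy should now
> be reset no matter how the parent was tuned (tid found via the ps
> command above):
>
> chrt -p $VHOST_THREAD	# expect SCHED_OTHER, priority 0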
>
Yes, seems unlikely. Still, can you attach this to bugzilla so it can
be tested?
And what will help you debug? Any traces to enable?
Also, wasn't there another issue with a non-standard config?
Maybe if we fix that it will by chance fix this one too?
>
>